KeyView XML Export SDK 10.23 C Programming Guide

KeyView™ XML Export SDK C Programming Guide
Version 10.23
Document Revision 0
20 January 2015
Copyright Notice
Notice
This documentation is a proprietary product of Autonomy and is protected by copyright laws and international treaty. Information in this
documentation is subject to change without notice and does not represent a commitment on the part of Autonomy. While reasonable efforts have
been made to ensure the accuracy of the information contained herein, Autonomy assumes no liability for errors or omissions. No liability is
assumed for direct, incidental, or consequential damages resulting from the use of the information contained in this documentation.
The copyrighted software that accompanies this documentation is licensed to the End User for use only in strict accordance with the End User
License Agreement, which the Licensee should read carefully before commencing use of the software. No part of this publication may be
reproduced, transmitted, stored in a retrieval system, nor translated into any human or computer language, in any form or by any means,
electronic, mechanical, magnetic, optical, chemical, manual or otherwise, without the prior written permission of the copyright owner.
This documentation may use fictitious names for purposes of demonstration; references to actual persons, companies, or organizations are
strictly coincidental.
Trademarks and Copyrights
Copyright © 2015 Hewlett-Packard Development Company, L.P. ACI API, Alfresco Connector, Arcpliance, Autonomy Process Automation,
Autonomy Fetch for Siebel eBusiness Applications, Autonomy, Business Objects Connector, Cognos Connector, Confluence Connector,
ControlPoint, DAH, Digital Safe Connector, DIH, DiSH, DLH, Documentum Connector, DOH, EAS Connector, Ektron Connector, Enterprise
AWE, eRoom Connector, Exchange Connector, FatWire Connector, File System Connector for Netware, File System Connector, FileNet
Connector, FileNet P8 Connector, FTP Fetch, HTTP Connector, Hummingbird DM Connector, IAS, IBM Content Manager Connector, IBM
Seedlist Connector, IBM Workplace Fetch, IDOL Server, IDOL, IDOLme, iManage Fetch, IMAP Connector, Import Module, iPlanet Connector,
KeyView, KVS Connector, Legato Connector, LiquidOffice, LiquidPDF, LiveLink Web Content Management Connector, MCMS Connector,
MediClaim, Meridio Connector, Meridio, Moreover Fetch, NNTP Connector, Notes Connector, Objective Connector, OCS Connector, ODBC
Connector, Omni Fetch SDK, Open Text Connector, Oracle Connector, PCDocs Fetch, PLC Connector, POP3 Fetch, Portal-in-a-Box, RecoFlex,
Retina, SAP Fetch, Schlumberger Fetch, SharePoint 2003 Connector, SharePoint 2007 Connector, SharePoint 2010 Connector, SharePoint
Fetch, SpeechPlugin, Stellent Fetch, TeleForm, Tri-CR, Ultraseek, Verity Profiler, Verity, VersiForm, WebDAV Connector, WorkSite Connector,
and all related titles and logos are trademarks of Hewlett-Packard Development Company, L.P. and its affiliates, which may be registered in
certain jurisdictions.
Microsoft is a registered trademark, and MS-DOS, Windows, Windows 95, Windows NT, SharePoint, and other Microsoft products referenced
herein are trademarks of Microsoft Corporation.
UNIX is a registered trademark of The Open Group.
AvantGo is a trademark of AvantGo, Inc.
Epicentric Foundation Server is a trademark of Epicentric, Inc.
Documentum and eRoom are trademarks of Documentum, a division of EMC Corp.
FileNet is a trademark of FileNet Corporation.
Lotus Notes is a trademark of Lotus Development Corporation.
mySAP Enterprise Portal is a trademark of SAP AG.
Oracle is a trademark of Oracle Corporation.
Adobe is a trademark of Adobe Systems Incorporated.
Novell is a trademark of Novell, Inc.
Stellent is a trademark of Stellent, Inc.
All other trademarks are the property of their respective owners.
Notice to Government End Users
If this product is acquired under the terms of a DoD contract: Use, duplication, or disclosure by the Government is subject to restrictions as set
forth in subparagraph (c)(1)(ii) of 252.227-7013. Civilian agency contract: Use, reproduction or disclosure is subject to 52.227-19 (a) through
(d) and restrictions set forth in the accompanying end user agreement. Unpublished-rights reserved under the copyright laws of the United States.
Autonomy, Inc., One Market Plaza, Spear Tower, Suite 1900, San Francisco, CA. 94105, US.
Contents
Tables ............................................................................................................................................. 13
Figures ........................................................................................................................................... 15
About This Document .............................................................................................................. 17
Part 1 Overview of XML Export
Chapter 1
Introducing XML Export ......................................................................................................... 27
Overview ..................................................................................................................................... 27
Features ..................................................................................................................................... 28
Platforms, Compilers and Dependencies ................................................................................... 29
Package Contents ...................................................................................................................... 31
License Information .................................................................................................................... 32
Enable Advanced Document Readers ................................................................................. 33
Update License Information ................................................................................................. 33
Directory Structure ..................................................................................................................... 34
Definition of Terms ..................................................................................................................... 36
Chapter 2
Getting Started .......................................................................................................................... 39
Architectural Overview ................................................................................................................ 40
Memory Abstraction .................................................................................................................... 42
Enhance Performance ................................................................................................................ 42
File Caching ......................................................................................................................... 42
Convert Files Out of Process ...................................................................................................... 43
Configure Out-of-Process Conversions ................................................................................ 44
Run Export Out of Process—Overview ................................................................................ 46
Run Export Out of Process in the C API ............................................................................... 47
Convert Files .............................................................................................................................. 50
Sub File Extraction ...................................................................................................................... 51
Convert Outlook Email without Using the Extraction API ...................................................... 52
Set Conversion Options .............................................................................................................. 52
Set Conversion Options Using the API ................................................................................. 53
Set Conversion Options Using the Template Files ............................................................... 53
Templates ...................................................................................................................... 53
Use the Export Demo Program ................................................................................................... 56
Change Input/Output Directories .......................................................................................... 57
Set Configuration Options .................................................................................................... 57
Suppress Images .......................................................................................................... 58
Using PDF Position Information ..................................................................................... 58
Convert Files ........................................................................................................................ 58
Use the C-Language Implementation of the API ......................................................................... 59
Input/Output Operations ....................................................................................................... 60
Convert Files ........................................................................................................................ 60
Multi-threaded Conversions ................................................................................................. 62
Use the Verity Document Type Definition (DTD) ......................................................................... 63
Use XML Style Language Transformation (XSLT) ................................................................ 63
Add Elements and Attributes to the DTD .............................................................................. 64
Move the DTD ...................................................................................................................... 64
Part 2 Use the Export API
Chapter 3
Use the File Extraction API ................................................................................................... 67
Introduction ................................................................................................................................. 68
Extract Sub Files ........................................................................................................................ 69
Recreate a File’s Hierarchy ........................................................................................................ 70
Create a Root Node ............................................................................................................. 70
Recreate a File’s Hierarchy—Example ................................................................................. 71
Extract Mail Metadata ................................................................................................................. 72
Default Metadata Set ............................................................................................................ 72
Extract the Default Metadata Set ................................................................................... 73
Microsoft Outlook (MSG) Metadata ...................................................................................... 74
Extract MSG-Specific Metadata ..................................................................................... 75
Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata .......................................... 76
Extract EML- or MBX-Specific Metadata ........................................................................ 76
Lotus Notes Database (NSF) Metadata ............................................................................... 77
Extract NSF-Specific Metadata ...................................................................................... 77
Microsoft Personal Folders File (PST) Metadata .................................................................. 78
MAPI Properties ............................................................................................................ 78
Extract PST-Specific Metadata ...................................................................................... 79
Exclude Metadata from the Extracted Text File .................................................................... 80
Extract Sub Files from Outlook Files ........................................................................................... 80
Extract Sub Files from Outlook Express Files ............................................................................. 80
Extract Sub Files from Mailbox Files ........................................................................................... 81
Extract Sub Files from Outlook Personal Folders Files .............................................................. 81
Use the Native or MAPI-based Reader ................................................................................ 82
Use the Native PST Reader (pstnsr) ............................................................................. 83
Use the MAPI Reader (pstsr) ......................................................................................... 83
MAPI Attachment Methods .................................................................................................. 84
Open Secured PST Files ..................................................................................................... 85
Detect PST Files While the Outlook Client is Running ......................................................... 85
Extract Sub Files from Lotus Domino XML Language Files ........................................................ 85
Extract Sub Files from Lotus Notes Database Files ................................................................... 86
System Requirements .......................................................................................................... 87
Installation and Configuration ............................................................................................... 87
Open Secured NSF Files ..................................................................................................... 89
Format Note Sub Files ......................................................................................................... 89
Extract Sub Files from PDF Files ................................................................................................ 90
Extract Embedded OLE Objects ................................................................................................. 90
Extract Sub Files from ZIP Files ................................................................................................. 91
Default Filenames for Extracted Sub Files .................................................................................. 91
Default Filename for Mail Formats ....................................................................................... 91
Default Filename for Embedded OLE Objects ..................................................................... 92
Chapter 4
Use the XML Export API ......................................................................................................... 95
Extract Metadata ......................................................................................................................... 96
Extract Metadata Using the API ........................................................................................... 96
Extract Metadata Using a Template File .............................................................................. 96
Extract File Format Information .................................................................................................. 99
Convert Character Sets .............................................................................................................. 99
Determine the Character Set of the Output Text .................................................................. 99
Guidelines for Character Set Conversion .................................................................... 100
Examples of Character Set Conversion ............................................................................. 101
Document Character Set Can be Determined ............................................................. 102
Document Character Set Cannot be Determined ......................................................... 103
Set the Character Set During Conversion .......................................................................... 103
Set the Character Set During File Extraction from a Container .......................................... 104
Map Styles ................................................................................................................................ 104
Use Style Sheets ...................................................................................................................... 108
Use Extensible Style Sheet Language (XSL) ..................................................................... 108
Use Cascading Style Sheets (CSS) ................................................................................... 108
Display Vector Graphics on UNIX and Linux ............................................................................ 109
Convert Revision Tracking Information ..................................................................................... 110
Convert PDF Files .................................................................................................................... 112
Convert PDF Files to a Logical Reading Order .................................................................. 112
Logical Reading Order and Paragraph Direction .......................................................... 112
Enable Logical Reading Order ..................................................................................... 113
Control Hyphenation .......................................................................................................... 115
Improve Performance for PDFs with Many Small Images .................................................. 116
Extract Custom Metadata from PDF Files .......................................................................... 116
Convert Spreadsheet Files ....................................................................................................... 117
Convert Hidden Text in Microsoft Excel Files ..................................................................... 117
Convert Headers and Footers in Microsoft Excel 2003 Files .............................................. 117
Specify Date and Time Format on UNIX Systems .............................................................. 118
Extract Microsoft Excel Formulas ....................................................................................... 118
Convert XML Files .................................................................................................................... 120
Configure Element Extraction for XML Documents ............................................................ 121
Modify Element Extraction Settings ............................................................................. 122
Modify Element Extraction Settings in the kvxconfig.ini File ......................................... 123
Specify an Element’s Namespace and Attribute .......................................................... 125
Add Configuration Settings for Custom XML Document Types .................................... 125
Show Hidden Data .................................................................................................................... 126
Hidden Data in Microsoft Documents ................................................................................. 126
Toggle Word Comment Settings in the formats_e.ini File ............................................ 127
Toggle PowerPoint Slide Note Settings in the formats_e.ini File .................................. 128
Show Hidden Data .................................................................................................................... 129
Hidden Data in Microsoft Documents ................................................................................. 129
Toggle Word Comment Settings in the formats_e.ini File ............................................ 130
Toggle PowerPoint Slide Note Settings in the formats_e.ini File .................................. 130
Chapter 5
Sample Programs ................................................................................................................... 133
Introduction ............................................................................................................................... 133
C Sample Programs ........................................................................................................... 134
Compile the Visual Basic Sample Program ........................................................................ 135
tstxtract .................................................................................................................................... 135
cnv2xml .................................................................................................................................... 136
cnv2xmloop .............................................................................................................................. 137
metadata .................................................................................................................................. 138
xmlindex ................................................................................................................................... 138
xmlini ........................................................................................................................................ 139
Use Style Sheets with xmlini .............................................................................................. 140
xmlcallback ............................................................................................................................... 141
xmlonefile ................................................................................................................................. 141
xmlmulti .................................................................................................................................... 142
Export Demo ............................................................................................................................ 142
Part 3 C API Reference
Chapter 6
File Extraction API Functions ............................................................................................ 145
KVGetExtractInterface() ........................................................................................................... 146
fpCloseFile() ............................................................................................................................. 147
fpExtractSubFile() .................................................................................................................... 148
fpFreeStruct() ........................................................................................................................... 150
fpGetMainFileInfo() .................................................................................................................. 151
fpGetSubFileInfo() .................................................................................................................... 153
fpGetSubFileMetaData() .......................................................................................................... 155
fpOpenFile() ............................................................................................................................. 157
Chapter 7
File Extraction API Structures ........................................................................................... 159
KVCredential ............................................................................................................................ 160
KVCredentialComponent .......................................................................................................... 161
KVExtractInterface ................................................................................................................... 162
KVExtractSubFileArg ................................................................................................................ 163
KVGetSubFileMetaArg ............................................................................................................. 167
KVMainFileInfo ......................................................................................................................... 169
KVMetadataElem ..................................................................................................................... 171
KVMetaName ........................................................................................................................... 172
KVOpenFileArg ........................................................................................................................ 173
KVOutputStream ...................................................................................................................... 175
KVSubFileExtractInfo ............................................................................................................... 176
KVSubFileInfo .......................................................................................................................... 178
KVSubFileMetaData ................................................................................................................. 181
Chapter 8
XML Export API Functions .................................................................................................. 183
KVXMLGetInterface() ............................................................................................................... 185
fpConvertStream() .................................................................................................................... 186
fpFileToInputStreamCreate() .................................................................................................... 189
fpFileToInputStreamFree() ....................................................................................................... 190
fpFileToOutputStreamCreate() ................................................................................................. 191
fpFileToOutputStreamFree() ..................................................................................................... 192
fpGetAnchor() ........................................................................................................................... 193
fpGetConvertFileList() ............................................................................................................... 195
fpGetStreamInfo() ..................................................................................................................... 196
fpGetSummaryInfo() ................................................................................................................. 197
fpInit() ....................................................................................................................................... 199
fpSetStyleMapping() ................................................................................................................. 201
fpShutDown() ............................................................................................................................ 203
fpValidateTemplate() ................................................................................................................ 204
KVXMLConfig() ......................................................................................................................... 205
Configuration Flags ............................................................................................................ 207
KVXMLConvertFile() ................................................................................................................. 214
KVXMLEndOOPSession() ........................................................................................................ 217
KVXMLSetStyleSheet() ............................................................................................................ 219
KVXMLStartOOPSession() ....................................................................................................... 221
Chapter 9
XML Export API Callback Functions ............................................................................... 225
Introduction ............................................................................................................................... 226
Continue() ................................................................................................................................. 227
GetAnchor() .............................................................................................................................. 228
GetAuxOutput() ........................................................................................................................ 230
UserCB() .................................................................................................................................. 232
Chapter 10
XML Export API Structures ................................................................................................. 233
ADDOCINFO ............................................................................................................................ 234
KVInputStream ......................................................................................................................... 235
KVMemoryStream .................................................................................................................... 236
KVOutputStream ...................................................................................................................... 237
KVSTR ..................................................................................................................................... 238
KVStreamInfo ........................................................................................................................... 239
KVStructHead .......................................................................................................................... 240
KVStyle .................................................................................................................................... 241
KVSumInfoElemEx ................................................................................................................... 243
KVSummaryInfoEx ................................................................................................................... 244
KVXConfigInfo .......................................................................................................................... 245
KVXMLCallbacks ..................................................................................................................... 247
KVXMLHeadingInfo .................................................................................................................. 248
KVXMLInterface ....................................................................................................................... 251
KVXMLOptions ......................................................................................................................... 253
KVXMLTemplate ...................................................................................................................... 262
KVXMLTOCOptions ................................................................................................................. 267
Chapter 11
Enumerated Types ................................................................................................................. 269
Introduction ............................................................................................................................... 270
ENSATableBorder .................................................................................................................... 271
KVCredKeyType ...................................................................................................................... 271
KVErrorCode ............................................................................................................................ 272
KVErrorCodeEx ........................................................................................................................ 274
KVXMLStyleSheetType ............................................................................................................ 277
KVXMLAnchorType .................................................................................................................. 278
KVXMLGraphicType ................................................................................................................. 279
KVHeadingCreateOptions ........................................................................................................ 280
KVXMLEmptyParaType ........................................................................................................... 281
KVXMLHardPageBreakType .................................................................................................... 282
KVMetadataType ..................................................................................................................... 283
KVMetaNameType ................................................................................................................... 285
KVSumInfoType ....................................................................................................................... 285
KVSumType ............................................................................................................................. 286
LPDF_DIRECTION .................................................................................................................. 290
Appendixes
Appendix A
Supported Formats ................................................................................................................. 293
Supported Formats .................................................................................................................. 294
Archive Formats ................................................................................................................. 295
Binary Format .................................................................................................................... 296
Computer-Aided Design Formats ....................................................................................... 297
Database Formats ............................................................................................................. 298
Desktop Publishing ............................................................................................................ 298
Display Formats ................................................................................................................. 299
Graphic Formats ................................................................................................................ 299
Mail Formats ...................................................................................................................... 302
Multimedia Formats ............................................................................................................ 304
Presentation Formats ......................................................................................................... 305
Spreadsheet Formats ......................................................................................................... 307
Text and Markup Formats .................................................................................................. 309
Word Processing Formats .................................................................................................. 310
Supported Formats (Detected) ................................................................................................. 313
Appendix B
Files Required for Redistribution ...................................................................................... 319
Core Files ................................................................................................................................. 320
Support Files ............................................................................................................................ 321
Document Readers and Writers ................................................................................................ 322
Document Type Definition Files ................................................................................................ 328
Appendix C
Export Tokens ........................................................................................................................... 329
Appendix D
Character Sets........................................................................................................................... 333
Multi-Byte and Bi-Directional Support ....................................................................................... 333
Coded Character Sets .............................................................................................................. 341
Appendix E
File Format Detection ............................................................................................................. 347
Introduction................................................................................................................................ 347
Extract Format Information ........................................................................................................ 348
Determine Format Support ....................................................................................................... 348
Refine Detection of Text Files ............................................................................................ 349
Change the Amount of File Data to Read .................................................................... 349
Change the Percentage of Allowed Non-ASCII Characters ......................................... 349
Use the File Extension for Detection ............................................................................ 350
Translate Format Information .................................................................................................... 350
Distinguish Between Formats ............................................................................................. 351
Determine a Document Reader ................................................................................................. 352
Category Values in formats_e.ini .............................................................................................. 352
Appendix F
File Formats and Extensions .............................................................................................. 371
File Format and Extension Table .............................................................................................. 371
Appendix G
Extract and Format Lotus Notes Sub Files.................................................................... 393
Overview ................................................................................................................................... 393
Customize XML Templates ...................................................................................................... 394
Use Demo Templates ........................................................................................................ 394
Use Old Templates ............................................................................................................ 395
Disable XML Templates ..................................................................................................... 395
Template Elements and Attributes ........................................................................................... 395
Conditional Elements ......................................................................................................... 396
Control Elements ............................................................................................................... 398
Data Elements ................................................................................................................... 399
Date and Time Formats ............................................................................................................ 401
Lotus Notes Date and Time Formats ................................................................................. 402
KeyView Date and Time Formats ...................................................................................... 403
Appendix H
Password Protected Files .................................................................................................... 409
Supported Password Protected File Types .............................................................................. 409
Open Password Protected Container Files ............................................................................... 410
Export Password Protected Files ..............................................................................................411
Index ............................................................................................................................................. 413
Tables
Table 1 Supported Compilers .................... 30
Table 2 Supported Compilers for Java and .NET Components .................... 31
Table 3 XML Export Installed Directory Structure .................... 34
Table 4 Architectural Components .................... 41
Table 5 Parameters for Out-of-Process Conversion .................... 45
Table 6 Default Mail Metadata List .................... 73
Table 7 MSG-specific Metadata List .................... 74
Table 8 Document Character Set Can be Determined .................... 102
Table 9 Document Character Set Cannot be Determined .................... 103
Table 10 Flags for Defining Styles .................... 107
Table 11 Supported Microsoft Excel Functions .................... 119
Table 12 Hidden data settings .................... 126
Table 13 Hidden data settings .................... 129
Table 14 Options for the cnv2xml Sample Program .................... 137
Table 15 Options for the cnv2xmloop Sample Program .................... 138
Table 16 Options for the xmlini Sample Program .................... 140
Table 17 Key to Support Tables .................... 294
Table 18 Supported Archive Formats .................... 295
Table 19 Supported Binary Formats .................... 296
Table 20 Supported CAD Formats .................... 297
Table 21 Supported Database Formats .................... 298
Table 22 Supported Desktop Publishing Formats .................... 298
Table 23 Supported Display Formats .................... 299
Table 24 Supported Graphic Formats .................... 299
Table 25 Supported Mail Formats .................... 302
Table 26 Supported Multimedia Formats .................... 304
Table 27 Supported Presentation Formats .................... 305
Table 28 Supported Spreadsheet Formats .................... 307
Table 29 Supported Text and Markup Formats .................... 309
Table 30 Supported Word Processing Formats .................... 310
Table 31 Export Tokens .................... 329
Table 32 Multi-byte and bi-directional support .................... 333
Table 33 Code Character Sets .................... 341
Table 34 Major Formats .................... 352
Table 35 File Classes .................... 368
Table 36 Minor Formats .................... 369
Table 37 KeyView file formats and extensions .................... 372
Table 38 Conditional elements .................... 396
Table 39 Control Elements .................... 398
Table 40 Data elements .................... 399
Table 41 Lotus Notes date and time formats .................... 402
Table 42 KeyView date and time formats .................... 403
Table 43 Key to support table .................... 409
Table 44 Supported password-protected file types .................... 410
Figures
Figure 1 XML Export Architecture .................... 40
Figure 2 Export Demo: Launching .................... 56
Figure 3 Export Demo: Setting Directories .................... 57
Figure 4 Export Demo: Converting Files .................... 59
Figure 5 Example Container File Tree Structure .................... 69
Figure 6 Extracted PST File .................... 71
Figure 7 Recreated File Hierarchy .................... 72
Figure 8 Document Character Set Can Be Determined .................... 100
Figure 9 Document Character Set Cannot Be Determined .................... 101
About This Document
This guide is for developers who incorporate KeyView XML conversion
technology into their custom Web applications using a C development
environment. It is intended for readers who are familiar with XML and C.
• Documentation Updates
• Related Documentation
• Conventions
• Autonomy Customer Support
• Contact Autonomy
Documentation Updates
The information in this document is current as of XML Export SDK version 10.23.
The content was last modified 20 January 2015.
You can retrieve the most current product documentation from the HP Autonomy
Knowledge Base on the Customer Support Site.
A document in the Knowledge Base displays a version number in its name, such
as IDOL Server 7.5 Administration Guide. The version number applies to the
product that the document describes. The document may also have a revision
number in its name, such as IDOL Server 7.5 Administration Guide Revision 6.
The revision number applies to the document and indicates that there were
revisions to the document since its original release.
Autonomy recommends that you periodically check the Knowledge Base for
revisions to documents for the products your enterprise is using.
To access Autonomy documentation
1. Go to the Autonomy Customer Support site:
https://customers.autonomy.com/
2. Click Login.
3. Type the login credentials that you were given, and then click Login.
The Customer Support Site opens.
4. Click Knowledge Base.
The Knowledge Base Search page opens.
5. Search or browse the Knowledge Base.
To search the knowledge base:
a. In the Search box, type a search term or phrase and click Search.
Documents that match the query display in a results list.
To browse the knowledge base:
a. Select one or more of the categories in the Browse list. You can browse
by:
• Repository. Filters the list by Documentation produced by technical
publications, or Solutions to Technical Support cases.
• Product Family. Filters the list by product suite or division. For
example, you could retrieve documents related to the iManage, IDOL,
Virage or KeyView product suites.
• Product. Filters the list by product. For example, you could retrieve
documents related to IDOL Server, Virage Videologger, or KeyView
Filter.
• Version. Filters the list by product or component version number.
• Type. Filters the list by document type. For example, you could
retrieve Guides, Help, Packages (ZIP files), or Release Notes.
• Format. Filters the list by document format. For example, you could
retrieve documents in PDF or HTML format. Guides are typically
provided in both PDF and HTML format.
6. To open a document, click its title in the results list.
To download a PDF version of a guide, open the PDF version, click the
Download icon in the PDF reader, and save the PDF to another location.
To download a documentation ZIP package, click Get Documentation
Package under the document title in the results list. Alternatively, browse to
the desired ZIP package by selecting either the Packages document Type or
the ZIP document Format from the Browse list.
Autonomy welcomes your comments.
To send feedback on Autonomy documentation
• send e-mail to [email protected]
• provide:
   • full document title with version and revision number
   • location: heading, a snippet of text or screen capture
   • your comments
   • your contact information in the event we need clarification
Related Documentation
The following documents provide more details on XML Export.
• XML Export Release Notes
• XML Export SDK Java Programming Guide
Conventions
The following conventions are used in this document.
Notational Conventions
This document uses the following conventions.
Convention          Usage
Bold                User-interface elements such as a menu item or button.
                    For example:
                    Click Cancel to halt the operation.
Italics             Document titles and new terms. For example:
                    • For more information, see the IDOL Server
                      Administration Guide.
                    • An action command is a request, such as a query or
                      indexing instruction, sent to IDOL Server.
monospace font      File names, paths, and code. For example:
                    The FileSystemConnector.cfg file is installed in
                    C:\Program Files\FileSystemConnector\.
monospace bold      Data typed by the user. For example:
                    • Type run at the command prompt.
                    • In the User Name field, type Admin.
monospace italics   Replaceable strings in file paths and code. For
                    example:
                    user UserName
Command-Line Syntax Conventions
This document uses the following command-line syntax conventions.
Convention          Usage
[ optional ]        Brackets describe optional syntax. For example:
                    [ -create ]
|                   Bars indicate “either | or” choices. For example:
                    [ option1 ] | [ option2 ]
                    In this example, you must choose between option1
                    and option2.
{ required }        Braces describe required syntax in which you have a
                    choice and at least one choice is required. For
                    example:
                    { [ option1 ] [ option2 ] }
                    In this example, you must choose option1, option2,
                    or both options.
required            Absence of braces or brackets indicates required
                    syntax in which there is no choice; you must type the
                    required syntax element.
variable            Italics specify items to be replaced by actual values. For
<variable>          example:
                    -merge filename1
                    (In some documents, angle brackets are used to denote
                    these items.)
...                 Ellipses indicate repetition of the same pattern. For
                    example:
                    -merge filename1, filename2 [, filename3
                    ... ]
                    where the ellipsis stands for filename4, and so on.
The use of punctuation—such as single and double quotes, commas, periods—
indicates actual syntax; it is not part of the syntax definition.
Notices
This document uses the following notices:
CAUTION A caution indicates an action can result in the loss
of data.
IMPORTANT An important note provides information that is
essential to completing a task.
NOTE A note provides information that emphasizes or
supplements important points of the main text. A note supplies
information that may apply only in special cases—for example,
memory limitations, equipment configurations, or details that
apply to specific versions of the software.
TIP A tip provides additional information that makes a task
easier or more productive.
Autonomy Customer Support
Autonomy Customer Support provides prompt and accurate support to help you
quickly and effectively resolve any issue you may encounter while using
Autonomy products. Support services include access to the Customer Support
Site (CSS) for online answers, expertise-based service by Autonomy support
engineers, and software maintenance to ensure you have the most up-to-date
technology.
To access the Customer Support Site
• go to https://customers.autonomy.com
The Customer Support Site includes:
• Knowledge Base: The CSS contains an extensive library of end user
documentation, FAQs, and technical articles that is easy to navigate and
search.
• Case Center: The Case Center is a central location to create, monitor, and
manage all your cases that are open with technical support.
• Download Center: Products and product updates can be downloaded and
requested from the Download Center.
• Resource Center: Other helpful resources appropriate for your product.
To contact Autonomy Customer Support by e-mail or phone
• go to http://www.autonomy.com/work/services/customer-support
Contact Autonomy
For general information about Autonomy, contact one of the following locations:
Europe and Worldwide
E-mail: [email protected]
Telephone: +44 (0) 1223 448 000
Fax: +44 (0) 1223 448 001
Autonomy Corporation plc
Cambridge Business Park
Cowley Rd.
Cambridge CB4 0WZ
United Kingdom

North and South America
E-mail: [email protected]
Telephone: +1.415.243.9955
Fax: +1.415.243.9984
Autonomy, Inc.
One Market Plaza
Spear Tower, Suite 1900
San Francisco CA 94105
USA
PART 1
Overview of XML Export
This section provides an overview of the Export SDK and
describes how to use the C implementation of the API. It
contains the following chapters:
• Introducing XML Export
• Getting Started
CHAPTER 1
Introducing XML Export
This section describes the KeyView Export SDK package. It contains the following
topics:
• Overview
• Features
• Platforms, Compilers and Dependencies
• Package Contents
• License Information
• Directory Structure
• Definition of Terms
Overview
XML Export is part of the KeyView Export SDK. It enables you to convert virtually
any document, spreadsheet, presentation, or graphic into well-formed, valid XML
which is validated against a predefined Document Type Definition (DTD). With
XML Export, you control the content, structure, and format of the XML output
using either easily customized templates, or the flexible and robust APIs.
The main purpose of XML Export is to apply an XML vocabulary to the data
structures in a document so that content and metadata can be indexed and
subsequently searched in context.
Data structures in a source document can be:
• metadata (title, author, subject, and so on)
• document components (headers, footers, footnotes, endnotes, captions,
bookmarks, and so on)
• tagged text (chapters, sections, bulleted lists, and so on)
• table components (sheet names, rows, columns, cell ranges, and so on)
• presentation components (notes, slide titles, slide descriptions, and so on)
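As a rough illustration of applying an XML vocabulary to such data structures, the following C sketch wraps a metadata value in an element and escapes the characters that would otherwise break well-formedness. It does not use the Export API: the function names xml_escape and xml_element are hypothetical, the element name in the usage note is not taken from Verity.dtd, and the code assumes a C99 library that provides snprintf.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the Export API.
   Copies src into dst (capacity cap), replacing the five XML special
   characters with their predefined entities.
   Returns 0 on success, -1 if the result does not fit. */
static int xml_escape(const char *src, char *dst, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    for (; *src != '\0'; ++src) {
        char single[2];
        const char *rep;
        size_t n;

        single[0] = *src;
        single[1] = '\0';
        switch (*src) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:   rep = single;   break;
        }
        n = strlen(rep);
        if (used + n + 1 > cap)   /* keep room for the terminator */
            return -1;
        memcpy(dst + used, rep, n);
        used += n;
    }
    dst[used] = '\0';
    return 0;
}

/* Hypothetical helper: writes <name>escaped value</name> into out.
   Returns 0 on success, -1 if a buffer is too small. */
static int xml_element(const char *name, const char *value,
                       char *out, size_t cap)
{
    char escaped[256];
    int n;

    if (xml_escape(value, escaped, sizeof escaped) != 0)
        return -1;
    n = snprintf(out, cap, "<%s>%s</%s>", name, escaped, name);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

For example, calling xml_element("title", "R&D <draft>", buf, sizeof buf) fills buf with <title>R&amp;D &lt;draft&gt;</title>, which an indexing engine can then search in the context of the title element.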
Although viewing is not the main purpose of XML Export, Extensible Stylesheet
Language (XSL) style sheets or Cascading Style Sheets (CSS) can be used to
display the XML data.
Export SDK supports a number of programming environments, such as Visual
Basic, Java, and Delphi, and runs on all popular operating system platforms,
including Windows, Solaris, HP-UX, IBM AIX, and Linux.
Export SDK is part of the KeyView suite of products. KeyView provides
high-speed text extraction, conversion to Web-ready HTML and well-formed XML,
and high-fidelity document viewing.
Features

• Dynamically convert word processing, spreadsheet, presentation, and
graphics files into well-formed, valid, and 1.0-compliant XML. The XML output
is validated against a predefined DTD named Verity.dtd.
• Export supports over 300 formats in 70 languages.
• Convert files either in-process or out-of-process. Out-of-process conversion
ensures the stability and robustness of the calling application if a corrupt
document causes an exception or causes the conversion process to fail.
• Files embedded within files can be extracted, using the File Extraction API,
and then converted, using the Export API.
• Use redirected input/output. You can provide an input stream that is not
restricted to file system access.
• Export automatically recognizes the file format being converted and uses the
appropriate reader. Your application does not need to rely on filename
extensions to determine the file format.
• Create heading levels in the output file by either using the structure in the
source document or by allowing Export to automatically generate a structure
based on document properties, such as font or font attributes.
• Use callbacks to control such aspects of the conversion process as file
naming and the insertion of scripts.
• Manage memory allocation to optimize the speed and performance of your
application.
• Insert predefined XML markup at specific points in the output stream.
• Apply XSL or Cascading Style Sheets (CSS) to improve the fidelity of the
output.
• Map paragraph and character styles in word processing documents to any
markup you specify in the output.
• Control the resolution of rasterized vector graphics to optimize storage
requirements or image quality.
• Select the target format for converted graphics, including GIF, JPEG, CGM,
PNG, WMF, and Java on Windows, and Java and JPEG on Unix and Linux.
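To show why an application does not need filename extensions to identify a file, the following C sketch checks the leading bytes of a buffer against a few widely documented file signatures. It is only an illustration of the general technique and is not how Export performs detection: the DemoFormat type and the demo_detect_format function are hypothetical, and the handful of signatures covers a tiny fraction of the formats that Export recognizes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical format identifiers for this sketch only. */
typedef enum {
    FMT_UNKNOWN,
    FMT_PDF,
    FMT_ZIP,    /* ZIP archives and ZIP-based containers */
    FMT_OLE2,   /* OLE2 compound files, used by older Office formats */
    FMT_PNG,
    FMT_GIF
} DemoFormat;

/* Identify a format from the first bytes of the data, ignoring any
   filename. Returns FMT_UNKNOWN when no signature matches. */
static DemoFormat demo_detect_format(const unsigned char *buf, size_t len)
{
    static const unsigned char ole2[] =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    static const unsigned char png[] =
        { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };

    if (len >= 5 && memcmp(buf, "%PDF-", 5) == 0)
        return FMT_PDF;
    if (len >= 4 && memcmp(buf, "PK\x03\x04", 4) == 0)
        return FMT_ZIP;
    if (len >= sizeof ole2 && memcmp(buf, ole2, sizeof ole2) == 0)
        return FMT_OLE2;
    if (len >= sizeof png && memcmp(buf, png, sizeof png) == 0)
        return FMT_PNG;
    if (len >= 6 && (memcmp(buf, "GIF87a", 6) == 0 ||
                     memcmp(buf, "GIF89a", 6) == 0))
        return FMT_GIF;
    return FMT_UNKNOWN;
}
```

Because the function works on a byte buffer rather than a path, the same check applies to data that arrives through a redirected input stream. Plain text has no signature, so a sketch like this reports it as FMT_UNKNOWN.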
Platforms, Compilers and Dependencies
This section lists the supported platforms, supported compilers, and software
dependencies for the KeyView software.
Supported Platforms

• FreeBSD 8.1 x86
• HP HP-UX 11i and 11i v2 PA-RISC
• Mac OS X Mountain Lion 10.8 or higher on 32- and 64-bit Apple-Intel
architecture
• Microsoft Windows 2003 Server x86 and x64
• Microsoft Windows Vista Business Edition x86 and x64. Other editions of Vista
have not been tested, but are likely supported.
• Microsoft Windows 2008 Server Enterprise Edition x86 and x64
• Microsoft Windows 2008 Server R2
• Microsoft Windows XP x86 (Service Pack 2)
• Microsoft Windows 7 x86 and x64
• Microsoft Windows 8 x86 and x64
• Red Hat Enterprise Linux AS 4.0 x86
• Red Hat Enterprise Linux AS 4.0 x64
• Red Hat Enterprise Linux 5.0 x86 and x64
• Red Hat Enterprise Linux 6.0 x86 and x64
• Sun Solaris 9.0 and 10 SPARC
• Sun Solaris 10 x64
• SuSE Linux Enterprise Server 10, 10.1, 11 x86
• SuSE Linux Enterprise Server 10, 10.1 x64
• SuSE Linux Enterprise Server 11 x64
Supported Compilers
Table 1 Supported Compilers
Platform            Architecture                    Compiler Name   Compiler Version
Microsoft Windows   x86                             cl              Microsoft 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for x86
Microsoft Windows   x64                             cl              Microsoft C/C++ Optimizing Compiler Version 16.00.30319.01 for x64
Sun Solaris         x86 64-bit                      Sun Studio 12   Sun C 5.9 SunOS_i386 Patch 124868-01 2007/07/12
Sun Solaris         SPARC 64-bit                    Sun Studio 11   Sun C 5.8 Patch 121015-06 2007/10/03
Linux               x86                             gcc / g++       3.4.3 (Redhat 4), 4.1.0 (SuSE Linux 10)
Linux               x64                             gcc / g++       4.1.0 (Redhat 4), 4.1.0 (SuSE Linux 10)
HP HP-UX            PA-RISC                         cc / aCC        aCC: HP ANSI C++ B3910B A.03.70 for 32 bit (1)
Mac OSX             Apple-Intel 32-bit and 64-bit   LLVM            Apple LLVM 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
FreeBSD             BSD x86                         gcc / g++       4.2.1 [FreeBSD] 20070719
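When you build against the SDK, it can help to confirm which of the compiler families in Table 1 is in use. The following C sketch is one way to do that with the predefined macros that these compilers set. It is an assumption-laden illustration rather than part of the SDK: the function name demo_compiler_name is hypothetical, and the macro names come from the compiler vendors, not from the KeyView headers.

```c
#include <stdio.h>

/* Hypothetical helper: report the compiler family at build time by
   testing vendor-defined macros. clang is tested before gcc and cl
   because clang can also define __GNUC__ or _MSC_VER. */
static const char *demo_compiler_name(void)
{
#if defined(__clang__)
    return "LLVM (clang)";
#elif defined(__SUNPRO_C)
    return "Sun Studio C";
#elif defined(__HP_aCC) || defined(__HP_cc)
    return "HP cc / aCC";
#elif defined(_MSC_VER)
    /* For cl, _MSC_VER carries the version; cl 16.00 reports 1600. */
    return "Microsoft cl";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "unknown";
#endif
}
```

A build script or a diagnostic routine could print the returned string, for example with printf("%s\n", demo_compiler_name()), and compare it with the supported compilers before linking the Export libraries.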
Table 2 Supported Compilers for Java and .NET Components
Component           Compiler
Java components     Java 1.5
.NET components     Microsoft Visual J# 2005 Compiler 8.00.50727.42
Software Dependencies
Some KeyView components require that you have installed specific third-party
software:

Java Runtime Environment (JRE) or Java Software Developer Kit (JDK)
version 1.5. Required for Java API and graphics conversion in Export SDK.

Outlook 2002 client or later versions. Required when processing Microsoft
Outlook Personal Folders (PST) files using the MAPI-based reader (pstsr).
The native PST reader (pstnsr) does not require an Outlook client.

Lotus Notes or Lotus Domino (minimum requirement is 6.5.1, but version 8.5
is recommended). Required for Lotus Notes database (NSF) file processing.

Microsoft .NET Framework SDK version 2.0 and Microsoft .NET Framework
version 2.0 Redistributable Package. Required if you are programming in a
.NET environment.
Package Contents
The Export installation contains:

Libraries and executable files necessary for converting source documents into
high-quality, well-formed XML (see “Files Required for Redistribution” on
page 319).

The include files that define the functions and structures used by the
application to establish an interface with Export:
adinfo.h
kvxml.h
kvtypes.h
kvxtract.h

The Java API implemented in the package com.verity.api.export
contained in the file KeyView.jar.

Several sample programs that demonstrate Export’s functionality.

Sample images that can be used as navigation buttons and background
textures in your output.

Template files that let you set conversion options without setting them at the
API level. They can be used to generate a wide range of output, from
highly-stylized user-defined XML to stripped-down, text-only output suitable
for use with an indexing engine.

The predefined DTD, Verity.dtd, used to validate all XML output.

Sample style sheets: wp.xsl (for word processing documents), ss.xsl (for
spreadsheets), and pg.xsl (for presentation graphics).
License Information
During installation, the installation program validates the organization name and
license key you enter and generates the install/OS/bin/kv.lic file, where
install is the directory in which you installed KeyView, and OS is the operating
system. This file is opened and validated when the KeyView API is used.
The kv.lic file contains the organization name and the 28-digit license key you
specified during installation. The contents of a kv.lic file look similar to the
following:
Company Name
XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX
The license key controls whether the following are enabled:

full version of the KeyView SDK

trial version of the KeyView SDK

language detection and advanced document readers. The following
components are considered advanced features, and are licensed separately:
• Microsoft Outlook Personal Folders (PST) reader (pstsr and pstnsr)
• Lotus Notes database (NSF) reader (nsfsr)
• Mailbox (MBX) reader (mbxsr)
• Character set detection library (kvlangdetect)
If you change the license key at any time, you must update the licensing
information in the kv.lic file. See “Update License Information” on page 33.
Enable Advanced Document Readers
To enable advanced readers in one of the KeyView SDKs, you must obtain an
appropriate license key from Autonomy and update the installed license key with
the new information as described in “Update License Information” on page 33.
If you are enabling the MBX reader in an existing installation of Export, in addition
to updating the license key, change the parameter 208=eml to 208=mbx in the
formats_e.ini file.
Update License Information
If you currently have an evaluation version of KeyView and have purchased a full
version of the SDK, or you are adding a document reader (for example, the PST
reader), you must update the license information that was installed with the
original version of the KeyView SDK.
If you installed a full version of KeyView, but did not enter licensing information at
the time of installation, you must also update the license information.
To update the information, do one of the following:

Manually update the license information that is stored in the text file named
kv.lic.

Re-install the product and enter the new license information when prompted.
To update the KeyView license information:
1. Open the license key file, kv.lic, in a text editor. The file is in the
install\OS\bin directory, where install is the directory in which you installed
KeyView, and OS is the operating system. The file contains the following text:
COMPANY NAME
XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX
2. Replace the text COMPANY NAME with the company name that appears at the
top of the License Key Sheet provided by Autonomy. Enter the text exactly as
it appears in the document.
3. Replace the characters XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX with the
appropriate license key from the License Key Sheet provided by Autonomy.
The license key is listed in the Key column in the Standalone Products table.
The key is a string containing 31 characters, for example,
2TQD22D-2M6FV66-2KPF23S-2GEM5AB. Enter the characters exactly as
they appear in the document, and do not include a leading or trailing space.
4. The finished kv.lic file looks similar to the following:
Autonomy
24QD22D-2M6FV66-2KPF23S-2G8M59B
5. Save the kv.lic file.
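Before saving a hand-edited kv.lic file, it can help to check that the key has the expected shape. The following is a minimal sketch, not part of the KeyView API: the function name license_key_shape_ok is hypothetical, and the assumption that each of the four groups holds seven alphanumeric characters is inferred from the sample keys in this section. It checks only the layout of the key, not whether the key is valid.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a KeyView function).
 * Returns 1 if key has the shape XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX
 * (31 characters: four groups of seven, separated by hyphens), else 0.
 * Assumption: group characters are alphanumeric, as in the sample keys. */
int license_key_shape_ok(const char *key)
{
    size_t i;

    if (key == NULL || strlen(key) != 31) {
        return 0;   /* wrong length, e.g. a leading or trailing space */
    }
    for (i = 0; i < 31; i++) {
        if (i % 8 == 7) {
            /* Positions 7, 15, and 23 must hold the separators. */
            if (key[i] != '-') {
                return 0;
            }
        } else if (!isalnum((unsigned char)key[i])) {
            return 0;
        }
    }
    return 1;
}
```

For example, license_key_shape_ok("2TQD22D-2M6FV66-2KPF23S-2GEM5AB") returns 1, while the same string with a leading space returns 0.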
Directory Structure
Table 3 describes the directories created during the XML Export installation. The
variable install is the pathname of the Export installation directory (for
example, /usr/autonomy/KeyviewExportSDK on UNIX, or C:\Program
Files\Autonomy\KeyviewExportSDK on Windows). On UNIX, the XML
Export directory is named /xmlexpt.
The variable OS is the operating system for which the SDK is installed. For
example, the bin directory on a standard 32-bit Windows installation would be
located at C:\Program Files\Autonomy\KeyviewExportSDK\WINDOWS\bin.
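An application that needs to locate files such as kv.lic or formats_e.ini can assemble the path from the install and OS components described above. The sketch below is illustrative only: build_bin_path is a hypothetical helper, and the caller is assumed to supply the separator that suits the platform.

```c
#include <stdio.h>

/* Hypothetical helper (not a KeyView function).
 * Writes "<install><sep><os_dir><sep>bin<sep><file>" into out.
 * Returns 0 on success, or -1 if the result does not fit in out. */
int build_bin_path(char *out, size_t out_size, const char *install,
                   const char *os_dir, char sep, const char *file)
{
    int n = snprintf(out, out_size, "%s%c%s%cbin%c%s",
                     install, sep, os_dir, sep, sep, file);

    /* snprintf returns the length it wanted to write; compare it
     * with the buffer size to detect truncation. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Called with the Windows example values above, '\\' as the separator, and "kv.lic" as the file name, the helper produces the path of the license file in the WINDOWS\bin directory.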
Table 3 XML Export Installed Directory Structure
Directory
Contents
install\OS\bin
Contains the libraries, executables for the sample
programs Export Demo and cnv2xml, the Java
program (kvraster.class), the Java applet
(kvvector.jar), the format detection file,
formats_e.ini, the license key file (kv.lic), and a
number of other supporting files.
install\javaapi\ini
Contains the template files used with the Java API.
install\javaapi\javadoc
Contains the Javadoc for the Java API.
install\javaapi\sample
Contains the source files and sample programs for the
Java API.
install\testdocs
Contains sample word processing, spreadsheet, and
presentation graphics files that can be used to test
XML Export’s options. You may also find this directory
useful when testing your own applications.
install\XML Export\guide
Contains the XML Export C Programming Guide and
XML Export Java Programming Guide in HTML and
PDF format.
install\XML Export\include
Contains the header files (adinfo.h, kvxml.h and
kvtypes.h) for the C API.
install\XML Export\programs\bin
Contains the executable files for the sample Visual
Basic program called Export Demo.
install\XML Export\programs\cnv2xml
Contains the C source code files for a sample program
that creates a single XML file. The executable for this
sample program is in the bin directory.
install\XML Export\programs\cnv2xmloop
Contains the C source code for a sample program that
creates a single XML file out of process.
install\XML Export\programs\ExportDemo
Contains the source code for a sample Visual Basic
program. The executable for this sample program is in
the bin directory. Export Demo is available through the
Start menu.
install\XML Export\programs\ini
Contains the template files used to set the conversion
options in the C API.
install\XML Export\programs\metadata
Contains the C source code and supporting files for a
sample program that creates a valid XML file
containing only the document’s metadata.
install\XML Export\programs\pdfini
Contains the template file used to extract custom
metadata from PDF documents.
install\XML Export\programs\tempout
The default output directory for converted files.
Contains the KeyView DTD, sample style sheets, and
character entity files. These files are required for
viewing the converted XML files.
install\XML Export\programs\tstxtract
Contains the C source code and supporting files for a
sample program that demonstrates the File Extraction
interface.
install\XML Export\programs\xmlcallback
Contains the C source code and supporting files for a
sample program that demonstrates how user callbacks
can dynamically shape the XML conversion.
install\XML Export\programs\xmlindex
Contains the C source code and supporting files for a
sample program that produces text-only XML.
install\XML Export\programs\xmlini
Contains the C source code and supporting files for a
sample program that uses template files to set the
conversion options.
install\XML Export\programs\xmlmulti
Contains the C source code and supporting files for a
sample program that creates multiple XML files from a
source document. The main file contains the table of
contents. Each H1 heading is contained within its own
file.
install\XML Export\programs\xmlonefile
Contains the C source code and supporting files for a
sample program that converts a source document into
a single, formatted XML file.
install\XML Export\rel_notes
Contains the XML Export Release Notes in HTML and
PDF format.
Definition of Terms
The following are specialized terms used throughout the guide.
anchor
XML markup that defines both anchors and hyperlinks. An anchor is
a named place in a document to which other documents can form a
link. Anchors use the XML anchor tags (<a xmlns:xlink= xlink
href=> </a>) to facilitate navigation within a document.
The major browsers do not currently support linking in XML
documents.
block
All source document content (including sub-headings) associated
with Heading Level 1. Export identifies and/or generates blocks from
the input stream for the implementation of your XML markup.
block chunk or
chunk
All source document content associated with Heading Levels 2
through 6. Chunks are subdivisions of blocks. You may supply
specific XML markup for the different levels of block chunks.
callback
A function optionally supplied by your application and called from
within the Export API. For example, callbacks allow your application
to monitor the progress of the conversion process dynamically.
stream
Transmission of a file’s content between memory and disk in a
continuous flow.
token
The vehicle for conveying specific types of information to and from the
API during the conversion process. Tokens are placeholders for
markup that appears in the output. See “Export Tokens” on page 329.
CHAPTER 2
Getting Started
This section provides an overview of XML Export and describes how to use the C
implementation of the API. It contains the following topics:

Architectural Overview

Memory Abstraction

Enhance Performance

Convert Files Out of Process

Convert Files

Sub File Extraction

Set Conversion Options

Use the Export Demo Program

Use the C-Language Implementation of the API

Use the Verity Document Type Definition (DTD)
Architectural Overview
The general architecture of the KeyView XML conversion technology is the same
across all supported platforms and is illustrated in Figure 1.
Figure 1 XML Export Architecture
Each component is described in Table 4.
Table 4 Architectural Components
Component
Description
Developer’s Application
The developer’s application interfaces directly with the XML Export API
through either a C-language or Java implementation.
File Extraction API
The File Extraction API opens a file and extracts the file’s sub files so
that they are available for conversion. See “Use the File Extraction API”
on page 67.
XML Export API
The XML Export API exposes the functionality of XML Export and
controls all other XML Export modules during the conversion process.
Format Detection Module
The format detection module determines the file type of the source file,
which enables the XML Export interface to load the appropriate
structured access layer module and document reader. See “File Format
Detection” on page 347.
Structured Access Layer
The structured access layer contains three modules: one for word
processing, one for spreadsheets, and one for presentations and
graphics. Information from the format detection module determines
which access layer module operates at this stage of the conversion. The
structured access layer performs the following:
1. Loads the appropriate document reader.
2. Processes the data stream from the document reader.
3. Determines table of contents entries.
4. Sends the stream to the appropriate XML writer.
5. Accepts the XML stream from the XML writer.
6. Generates the XML output file with a table of contents, metadata,
and the document’s contents, and sends it to the XML Export
interface.
Document Reader
Each document reader reads a specific file format and sends a text
stream of the document to the structured access layer. Word processing
readers return a token stream to the structured access layer. A token
stream contains the document contents and messages (tokens) that
precede the content and identify the type of information that follows
them. Each reader is loaded as required by the structured access layer.
See “Document Readers and Writers” on page 322 for a complete list of
document readers.
XML Writers
Each XML writer accepts a text stream or token stream from the
structured access layer and generates an equivalent XML stream that is
sent back to the structured access layer. The structured access layer
then generates the output file. See “Document Readers and Writers” on
page 322 for a list of format writers.
Memory Abstraction
All dynamic memory allocations in Export modules are abstracted through a C
interface. This memory allocation interface is defined in the KVMemoryStream
structure in kvtypes.h. See “KVMemoryStream” on page 236. You may override
all memory allocations by providing a C structure that contains pointers to
functions equivalent to their standard ANSI C counterparts. The xmlcallback
sample program demonstrates Export memory management features. See
“xmlcallback” on page 141.
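The idea of routing allocations through a structure of function pointers can be sketched as follows. This is not the KVMemoryStream definition: the type DemoMemoryHooks, its member names, and the counting functions are hypothetical and exist only to illustrate the pattern. Refer to kvtypes.h for the actual structure and function signatures.

```c
#include <stdlib.h>

/* Hypothetical allocator table for illustration only.
 * The real layout is KVMemoryStream in kvtypes.h. */
typedef struct DemoMemoryHooks {
    void *pUserData;                                  /* passed back to each hook */
    void *(*fpMalloc)(void *pUserData, size_t size);  /* mirrors malloc() */
    void  (*fpFree)(void *pUserData, void *block);    /* mirrors free()   */
} DemoMemoryHooks;

/* Application-side bookkeeping reached through pUserData. */
typedef struct DemoStats {
    long live_blocks;   /* allocations not yet freed */
} DemoStats;

static void *counting_malloc(void *pUserData, size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        ((DemoStats *)pUserData)->live_blocks++;
    }
    return p;
}

static void counting_free(void *pUserData, void *block)
{
    if (block != NULL) {
        ((DemoStats *)pUserData)->live_blocks--;
        free(block);
    }
}

/* Fills hooks so that every allocation is counted in stats. */
void demo_hooks_init(DemoMemoryHooks *hooks, DemoStats *stats)
{
    stats->live_blocks = 0;
    hooks->pUserData = stats;
    hooks->fpMalloc = counting_malloc;
    hooks->fpFree = counting_free;
}
```

A library that allocates only through such a table lets the application track, pool, or limit memory without changes to the library itself. A non-zero live_blocks count after shutdown would point to a leak.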
Enhance Performance
KeyView is designed for optimal performance out of the box. However, there are
some parameters that you can adjust to improve system performance according
to your needs.
File Caching
To reduce the frequency of I/O operations, and consequently improve
performance, the KeyView readers load file data into memory. The readers then
read the data from the cache rather than the physical disk. You can configure the
amount of memory used for file caching through the formats_e.ini file.
Generally, increasing the memory available for file caching improves performance.
By default, KeyView uses a maximum of 1MB of memory for each thread—
assuming a thread contains only one instance of pContext that is returned from
the session initialization (see “fpInit()” on page 199). If the file data is larger than
1MB, up to 1MB of data is cached and the data beyond 1MB is read from disk.
The minimum amount of memory that can be used for file caching is 64KB.
To determine a reasonable value, divide the maximum amount of memory you
want KeyView to use for file caching by the total number of threads. For example,
if you want KeyView to use a maximum of 50MB of memory and have 10 threads,
set the value to 5MB.
To modify the memory allocated for file caching, change the value for the following
parameter in the [DiskCache] section of the formats_e.ini file:
DiskCacheSize=1024
The value is in kilobytes. If this parameter is not set or is set to 0 (zero), the
minimum value of 64KB is used.
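The sizing rule above (total cache budget divided by the number of threads, expressed in kilobytes, with a 64KB floor) can be written as a small calculation. This is a sketch only; disk_cache_size_kb is a hypothetical helper and is not part of the SDK.

```c
/* Minimum cache size stated in this guide, in kilobytes. */
#define MIN_DISK_CACHE_KB 64L

/* Hypothetical helper (not a KeyView function).
 * Returns a per-thread DiskCacheSize value in kilobytes, given the total
 * memory budget in megabytes and the number of threads. */
long disk_cache_size_kb(long total_budget_mb, long thread_count)
{
    long per_thread_kb;

    if (thread_count <= 0) {
        return MIN_DISK_CACHE_KB;   /* no sensible division: use the floor */
    }
    per_thread_kb = (total_budget_mb * 1024L) / thread_count;
    return per_thread_kb < MIN_DISK_CACHE_KB ? MIN_DISK_CACHE_KB
                                             : per_thread_kb;
}
```

For the example in the text, disk_cache_size_kb(50, 10) gives 5120, which corresponds to DiskCacheSize=5120 in the [DiskCache] section.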
The formats_e.ini file is in the directory install\OS\bin, where install is
the pathname of the Export installation directory and OS is the name of the
operating system.
Convert Files Out of Process
Export can run independently from the calling application. This is called out of
process. Out-of-process conversions protect the stability of the calling application
in the rare case when a malformed document causes Export to fail. You can also
run Export in the same process as the calling application. This is called
in-process. However, it is strongly recommended you convert documents out of
process whenever possible.
The Export out-of-process framework uses a client-server architecture. The
calling application sends an out-of-process conversion request to the Service
Request Broker in the main Export process. The Broker then creates, monitors,
and manages a Servant process for the request—each request is handled by one
independent Servant process. Data is exchanged between the application thread
and the Servant through TCP/IP sockets. The source data is sent to the Servant
process as a data stream or file, converted in the Servant, and then returned to
the application thread. At that point, the application can either terminate the
Servant process or send more data for conversion.
Multiple conversion requests can be sent from multiple threads in the calling
application simultaneously. All requests sent from one thread are processed by
the Servant mapped to that thread. In other words, each thread can have only one
Servant to process its conversion requests.
Any standard conversion errors generated by the Servant are sent to the
application.
NOTE Currently, the main Export process and Servant
processes must run on the same host.
The following are requirements for running Export out of process:

Internet Protocol (TCP/IP) must be installed

Multi-threaded processing must be supported on the operating system
platform

The user application must be built with a multi-threaded runtime library
The following functions run in-process or out of process:
NOTE When converting out of process, these functions must be called
after the call to start an out-of-process session and before the call to end an
out-of-process session.
Other Export API functions and the File Extraction functions always run
in-process.
Configure Out-of-Process Conversions
Although most components of the out-of-process conversion are transparent, the
following parameters are configurable:

File-size threshold/temporary file location

Conversion time-out

Listener port numbers and time-out

Connection time-out and retry

Servant process name
These parameters are defined internally, but you can override the default by
defining the parameter in the formats_e.ini file. The formats_e.ini file is in
the directory install\OS\bin, where install is the pathname of the Export
installation directory and OS is the name of the operating system.
To set the parameters, add the following section to the formats_e.ini file:
[KVExportOOPOptions]
TempFileSizeMark=
TempFilePath=
WaitForConvert=
WaitForConnectionTime=
ListenerPortList=
ListenerTimeout=
ConnectRetryInterval=
ConnectRetry=
ServantName=
Each parameter is described in Table 5. The default values for these parameters
are set to ensure reasonable performance on most systems. If you are processing
a large number of files, or running Export on a slow machine, you may need to
increase some of the time-out and retry values.
Table 5 Parameters for Out-of-Process Conversion
Parameter
Description
TempFileSizeMark
The file-size threshold. If the input file received by the Servant is
larger than this value, temporary files are created to store the
data. The directory in which the temporary files are stored is
defined by the TempFilePath parameter. If the file received is
smaller than this value, the data is stored in memory in the
Servant. This only applies when the input is a stream.
unit = megabytes
default = 10
TempFilePath
The directory in which temporary files are stored. Temporary files
are created when the input file surpasses the file-size threshold
(TempFileSizeMark). If the Servant cannot access the file
path, an error is generated.
This only applies when converting in stream mode.
type = file path
default = current working directory
WaitForConvert
The length of time to wait for a Servant to convert a file. If the
conversion is not completed within the specified time, the error
code “Wait for child process failed” is generated.
unit = seconds
default = 1800
range = 30~3600
WaitForConnectionTime
The length of time to wait for the Servant to connect to the
application thread after the application has sent a conversion
request to the Broker. If the Servant does not connect within the
specified time, the error code “Wait for child process
failed” is generated. If there are many Servant processes
running simultaneously, this value may need to be increased.
unit = seconds
default = 180
range = 15~600
ListenerPortList
The TCP/IP port number(s) used for communication between the
calling application and the Servant. You can specify a single port
number or a series of numbers (enter the numbers separated by
commas).
type = integer
default = 9985, 9986, 9987, 9988, 9989
ListenerTimeout
The length of time to wait for the Servant listener thread to get a
process ID from the Servant after the connection is established.
If the ID is not obtained within the specified time, the error code
“Wait for child process failed” is generated. During
this time, no other Servant can connect with the application.
unit = seconds
default = 10
range = 5~30
ConnectRetryInterval
The length of time to wait after a Servant has failed to connect to
the application before it retries the connection. A Servant may be
unable to connect because the application is waiting for another
Servant to send a process ID.
To calculate the total retry interval, the value set here is added to
the platform-specific TCP retry value (on Windows, this is 1
second).
unit = microseconds
default = 0.1
range = 50000~500000
ConnectRetry
The number of attempts the Servant makes to connect to the
calling application. This value and the total retry interval
determine the total delay time. The total delay is calculated as
follows:
(ConnectRetryInterval +
platform-specific_TCP_retry_value) * ConnectRetry
For example, if the ConnectRetryInterval is set to 2
seconds, and the Export process is running on Windows (the
default TCP retry value on Windows is 1 second), the total delay
would be:
(2 + 1) * 120 = 360
The Servant would attempt to connect to the application every 3
seconds for 120 attempts for a total of 360 seconds.
type = integer
default = 120
range = 30~600
ServantName
The name of the Servant process. To move the Servant to
another location, enter a fully qualified path.
type = string
default = servant
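When tuning the retry settings, it can be useful to compute the total connection delay and to check candidate values against the documented ranges before editing formats_e.ini. The sketch below is illustrative: both functions are hypothetical helpers, and the delay calculation works in seconds, as the worked example for ConnectRetry does.

```c
/* Hypothetical helper (not a KeyView function).
 * Total delay = (retry interval + platform TCP retry value) * attempts.
 * All time values are in seconds. */
double total_connect_delay_seconds(double retry_interval_s,
                                   double tcp_retry_s,
                                   int connect_retry)
{
    return (retry_interval_s + tcp_retry_s) * (double)connect_retry;
}

/* Hypothetical helper: returns 1 if value lies within [lo, hi], else 0.
 * Use it with the ranges listed in Table 5, for example 30 to 3600
 * for WaitForConvert. */
int value_in_range(long value, long lo, long hi)
{
    return value >= lo && value <= hi;
}
```

With an interval of 2 seconds, a TCP retry value of 1 second, and 120 attempts, total_connect_delay_seconds(2.0, 1.0, 120) evaluates to 360 seconds, which matches the worked example for ConnectRetry.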
Run Export Out of Process—Overview
To convert files out of process
1. If required, set parameters for the out-of-process conversion in the
formats_e.ini file.
2. Initialize an Export session.
3. If you are using streams, create an input stream.
4. Define the conversion options.
5. Initialize an out-of-process session.
6. Convert the input and/or call other functions that can run out of process.
7. Shut down the out-of-process session.
8. Repeat Step 3 through Step 7 for additional files.
9. Terminate the out-of-process session and the Servant process.
10. Shut down the Export session.
Recommendations

To ensure multi-threaded conversions are thread-safe, you must create a
unique context pointer for every thread by calling fpInit(). In addition,
threads must not share context pointers, and the same context pointer must
be used for all API calls in the same thread. Creating a context pointer for
every thread does not affect performance because the context pointer uses
minimal resources.

All functions that can run out of process must be called within the
out-of-process session, that is, after the call to initialize the out-of-process
session and before the call to end the out-of-process session.

When terminating an out-of-process session, persist the Servant process by
setting the boolean flag bKeepServantAlive in the KVXMLEndOOPSession()
function or endOOPSession method. If the Servant process remains active,
subsequent conversion requests are processed more quickly because the
Servant process is already prepared to receive data. Only terminate the
Servant when there are no more out-of-process requests.

To recover from a failure in the Servant process, start a new out-of-process
session. This creates a new Servant process for the next conversion.
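The call order and the keep-alive recommendation can be sketched with stand-in functions. Nothing in this example calls the KeyView API: record_step and run_oop_outline are hypothetical, and each step only records its name so that the order can be inspected. The sketch also assumes that the Servant is terminated by the last end-session call rather than by a separate call.

```c
#include <stddef.h>
#include <string.h>

/* Trace of the steps taken, for illustration only. */
static char g_trace[256];

/* Hypothetical stand-in for an API call: appends "name;" to the trace
 * if there is room for it. */
static void record_step(const char *name)
{
    size_t used = strlen(g_trace);

    if (used + strlen(name) + 2 <= sizeof g_trace) {
        strcat(g_trace, name);
        strcat(g_trace, ";");
    }
}

/* Walks the documented call order for file_count input files and
 * returns the recorded trace. */
const char *run_oop_outline(int file_count)
{
    int i;

    g_trace[0] = '\0';
    record_step("init");                 /* initialize the Export session */
    for (i = 0; i < file_count; i++) {
        record_step("start");            /* start the out-of-process session */
        record_step("convert");          /* one conversion per session */
        /* Keep the Servant alive while more files remain (assumption:
         * the final end-session call also terminates the Servant). */
        record_step(i + 1 < file_count ? "end_keep" : "end_stop");
    }
    record_step("shutdown");             /* shut down the Export session */
    return g_trace;
}
```

For two files, the trace reads init, start, convert, end_keep, start, convert, end_stop, shutdown. The Servant is reused for the second file and stopped only after the last one.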
Run Export Out of Process in the C API
The cnv2xmloop sample program demonstrates how to run Export out of process.
To convert files out of process in the C API
1. If required, set parameters for the out-of-process conversion in the
formats_e.ini file. See “Configure Out-of-Process Conversions” on
page 44.
2. Declare instances of the following types and assign values to the members as
required:
KVXMLTemplateEx
KVXMLOptionsEx
KVXMLHeadingInfo
KVXMLTOCOptions
See “XML Export API Structures” on page 233 for more information.
3. Load the KVXML library and obtain the KVXMLInterface entry point by calling
KVXMLGetInterface(). See “KVXMLGetInterface()” on page 185.
4. Initialize an Export session by calling fpInit(). See “fpInit()” on page 199.
5. If you are using streams for the input and output source, follow these steps;
otherwise proceed to Step 6:
a. Create an input stream (KVInputStream) by calling
fpFileToInputStreamCreate(). See “fpFileToInputStreamCreate()” on
page 189.
b. Create an output stream (KVOutputStream) by calling
fpFileToOutputStreamCreate(). See “fpFileToOutputStreamCreate()”
on page 191.
c. Proceed to Step 6.
6. Set up an out-of-process session by calling KVXMLStartOOPSession(). See
“KVXMLStartOOPSession()” on page 221. This function performs the
following:
• Initializes the out-of-process session.
• Specifies the input stream or file. If you are using an input file, set
pFileName to the filename, and set pInputStream to NULL. If you are
using an input stream, set pInputStream to point to KVInputStream, and
set pFileName to NULL.
• Sets conversion options in the KVXMLTemplate, KVXMLOptions, and
KVXMLTOCOptions data structures.
• Creates a Servant process.
• Establishes a communication channel between the application thread and
the Servant.
• Sends the data to the Servant.
See the sample code in “Example—KVXMLStartOOPSession” on page 49,
and “KVXMLStartOOPSession()” on page 221.
7. Convert the input and generate the output files by calling
KVXMLConvertFile() or fpConvertStream(). The structures
KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions are defined in the
call to KVXMLStartOOPSession(), and should be NULL in the conversion call.
A conversion function can only be called once in a single out-of-process
session. See “KVXMLConvertFile()” on page 214, and “fpConvertStream()” on
page 186.
8. Terminate the out-of-process session by calling KVXMLEndOOPSession(). The
Servant ends the current conversion session, and releases the source data
and session resources. See sample code in “Example—KVXMLEndOOPSession”
on page 50, and “KVXMLEndOOPSession()” on page 217.
9. If you used streams, free the memory allocated for the input stream and output
stream by calling the functions fpFileToInputStreamFree() and
fpFileToOutputStreamFree(). See “fpFileToInputStreamFree()” on
page 190 and “fpFileToOutputStreamFree()” on page 192.
10. Repeat Step 5 through Step 9 for additional files.
11. After all files are converted, terminate the out-of-process session and the
Servant process by calling KVXMLEndOOPSession() and setting the boolean
to FALSE.
12. After the out-of-process session and Servant are terminated, shut down the
Export session by calling fpShutDown(). See “fpShutDown()” on page 203.
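Step 6 requires the caller to supply either a file name or an input stream, with the other argument set to NULL. A small guard of the following kind can catch a mismatch before the session is started. The sketch is hypothetical: DemoInputKind and demo_input_kind are not SDK names, the stream is represented by a plain pointer, and treating the case where both arguments are set as invalid is an assumption rather than documented behavior.

```c
#include <stddef.h>

/* Hypothetical classification of the input arguments. */
typedef enum {
    DEMO_INPUT_INVALID,   /* neither or both arguments supplied */
    DEMO_INPUT_FILE,      /* pFileName set, pInputStream NULL   */
    DEMO_INPUT_STREAM     /* pInputStream set, pFileName NULL   */
} DemoInputKind;

/* Hypothetical helper (not a KeyView function).
 * Reports which input mode the two arguments describe. */
DemoInputKind demo_input_kind(const char *pFileName, const void *pInputStream)
{
    if (pFileName != NULL && pInputStream == NULL) {
        return DEMO_INPUT_FILE;
    }
    if (pFileName == NULL && pInputStream != NULL) {
        return DEMO_INPUT_STREAM;
    }
    return DEMO_INPUT_INVALID;
}
```

An application could call this check immediately before KVXMLStartOOPSession() and report an error for DEMO_INPUT_INVALID instead of starting a Servant.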
Example—KVXMLStartOOPSession
The following sample code is from the cnv2xmloop sample program:
/* Declare the out-of-process start-session function pointer. */
KVXML_START_OOP_SESSION fpKVXMLStartOOPSession;

/* Look up KVXMLStartOOPSession in the loaded KVXML library. */
fpKVXMLStartOOPSession = (KVXML_START_OOP_SESSION)mpGetProcAddress
                             (hKVXML, "KVXMLStartOOPSession");
if(!fpKVXMLStartOOPSession)
{
    /* The entry point is missing: release the streams and the library. */
    printf("Error assigning KVXMLStartOOPSession pointer\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    mpFreeLibrary(hKVXML);
    return 7;
}

/********START OOP SESSION *****************/
if(!(*fpKVXMLStartOOPSession)(pKVXML,
        &Input,           /* Input stream */
        NULL,             /* No file name, because a stream is used */
        &XMLTemplates,    /* Mark-up and related variables */
        &XMLOptions,      /* Options */
        NULL,             /* TOC options */
        &oopServantPID,
        &error,
        0,
        NULL,
        NULL))
{
    /* The session could not be started: shut down Export and unload. */
    printf("Error calling fpKVXMLStartOOPSession \n");
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 9;
}
Example—KVXMLEndOOPSession
The following sample code is from the cnv2xmloop sample program:
/* Declare the out-of-process end-session function pointer. */
KVXML_END_OOP_SESSION fpKVXMLEndOOPSession;

/* Look up KVXMLEndOOPSession in the loaded KVXML library. */
fpKVXMLEndOOPSession = (KVXML_END_OOP_SESSION)mpGetProcAddress
                           (hKVXML, "KVXMLEndOOPSession");
if(!fpKVXMLEndOOPSession)
{
    /* The entry point is missing: release the streams and the library. */
    printf("Error assigning KVXMLEndOOPSession pointer\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    mpFreeLibrary(hKVXML);
    return 8;
}

/********END OOP SESSION, DO NOT KEEP SERVANT ALIVE *********/
if(!(*fpKVXMLEndOOPSession)(pKVXML,
        FALSE,            /* bKeepServantAlive: terminate the Servant */
        &error,
        0,
        NULL,
        NULL))
{
    /* The session could not be ended: shut down Export and unload. */
    printf("Error calling fpKVXMLEndOOPSession \n");
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 10;
}
Convert Files
KeyView Export SDK enables you to convert many different types of documents to
XML. Converting is the process of extracting the text from a document without the
application-specific markup, and applying XML markup. However, the conversion
process can also include the following:

Extracting sub files—exposes all sub files for conversion. See “Sub File
Extraction” on page 51.

Setting conversion options—determines the content, structure, and
appearance of the XML output. See “Set Conversion Options” on page 52.

Extracting the file’s format—detects a file’s format, and reports the information
to the API, which in turn reports the information to the developer’s application.
See “Extract File Format Information” on page 99.

Extracting metadata—extracts selected metadata (document properties) from
a file. See “Extract Metadata” on page 96.

Converting character set—controls the character set of both the input and the
output text. See “Convert Character Sets” on page 99.

Implementing callbacks—controls the conversion while it is in progress. See
“XML Export API Callback Functions” on page 225.
You can use one of the following methods to convert documents:

Use the Export Demo sample program. This Visual Basic program
demonstrates most of the Export SDK’s capabilities and is the easiest way to get
started. See “Use the Export Demo Program” on page 56.

Use the C-language implementation of the API from your C or C++
application. See “Use the C-Language Implementation of the API” on page 59.

Use the C sample programs. See “Introduction” on page 133.
NOTE It is strongly recommended that you convert documents out of process.
During out-of-process conversion, Export runs independently from the
calling application. Out-of-process conversion protects the stability of the
calling application in the rare case when a malformed document causes
Export to fail. See “Convert Files Out of Process” on page 43.
Sub File Extraction
To convert a file, you must first determine whether the source file contains any sub
files (attachments, embedded objects, and so on). A file that contains sub files is
called a container file. Compressed files (such as Zip), mail messages with
attachments (such as Microsoft Outlook Express), mail stores (such as Microsoft
Outlook Personal Folders), and compound documents with embedded OLE
objects (such as a Microsoft Word document with an embedded Excel chart) are
examples of container files.
If the file is a container file, the container must be opened and its sub files
extracted using the File Extraction API. The extraction process is done repeatedly
until all sub files are extracted and exposed for conversion. Once a sub file is
extracted, you can call the XML Export APIs to convert the file.
If a file is not a container, you should pass it directly to the XML Export API for
conversion without extraction.
See “Use the File Extraction API” on page 67 for more information.
Convert Outlook Email without Using the Extraction API
It is strongly recommended you convert all container files, including Microsoft
Outlook files, using the File Extraction API. However, you can convert Outlook
email messages (MSG) directly using the Export API and the MSG reader
(msgsr).
NOTE The MSG reader only extracts the message body of
an MSG file. Attachments are not extracted.
To convert MSG files using the MSG reader, add the following to the
formats_e.ini file (TRUE is case-sensitive):
[ContainerOptions]
bConvertMSG=TRUE
Set Conversion Options
Conversion options are parameters that determine the content, structure, and
appearance of the XML output. For example, you can specify the markup inserted
at the beginning and end of specific XML blocks, whether a heading is included in
the table of contents, the output character set, or the resolution at which graphics
are converted. The conversion options can be set either in the API or in the
template files. Regardless of the method used to set the options, the values are
ultimately passed to the API and used to populate the following data structures:

KVXMLTemplate

KVXMLOptions

KVXMLHeadingInfo

KVXMLTOCOptions
The conversion options are described in “XML Export API Structures” on
page 233.
Set Conversion Options Using the API
The conversion options are set using any of the following functions:

fpConvertStream()

KVXMLConvertFile()

KVXMLStartOOPSession()
Set Conversion Options Using the Template Files
XML Export includes templates in the form of initialization files (.ini). The
templates provide a quick and easy way to modify the conversion options without
programming at the API level. However, the template files do not give you
complete control of the conversion process. To control some features, you must
use the API directly.
The template files can be fully customized using a text editor. For example, to
change the output character set from the default KVCS_UTF8 to KVCS_SJIS in the
xml1file.ini template, you would make the following change in bold:
[KVXMLOptions]
eOutputCharSet=KVCS_SJIS
bForceOutputCharSet=TRUE
To create valid XML, a template file must contain two structures:
KVXMLTemplateEx and KVXMLOptionsEx.
NOTE If you enter markup in the template files that is not compliant with
XML standards, XML Export inserts the markup into the output file
unchanged. This may result in a malformed XML file.
An application must then read the template file and write the data to the
appropriate Export structures. In the C sample program xmlini, a template file is
supplied as a command-line argument (see “xmlini” on page 139).
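Reading a template file and copying its values into application data can be sketched as follows. This is a minimal illustration under stated assumptions, not the xmlini implementation: the DemoOptions structure and the demo_parse_options() and demo_rtrim() functions are hypothetical names invented for this sketch. Only the [KVXMLOptions] section name and the eOutputCharSet and bForceOutputCharSet keys come from the template excerpt above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical holder for two values read from a template file.
   It is NOT the KVXMLOptions structure. */
typedef struct {
    char outputCharSet[32];   /* for example "KVCS_SJIS" */
    int  forceOutputCharSet;  /* 1 when bForceOutputCharSet=TRUE */
} DemoOptions;

/* Remove trailing CR, LF, spaces, and tabs in place. */
static void demo_rtrim(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r' ||
                     s[n - 1] == ' '  || s[n - 1] == '\t'))
        s[--n] = '\0';
}

/* Parse INI text held in memory. Returns the number of recognized
   keys found in the [KVXMLOptions] section. */
static int demo_parse_options(const char *text, DemoOptions *opts)
{
    char line[256];
    int inSection = 0;
    int found = 0;

    assert(text != NULL && opts != NULL);
    opts->outputCharSet[0] = '\0';
    opts->forceOutputCharSet = 0;

    while (*text != '\0') {
        size_t len = strcspn(text, "\n");
        size_t copy = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
        char *eq;

        memcpy(line, text, copy);
        line[copy] = '\0';
        text += len;
        if (*text == '\n')
            text++;
        demo_rtrim(line);

        if (line[0] == '[') {            /* section header */
            inSection = (strcmp(line, "[KVXMLOptions]") == 0);
            continue;
        }
        if (!inSection)
            continue;
        eq = strchr(line, '=');
        if (eq == NULL)
            continue;
        *eq = '\0';                      /* split into key and value */
        if (strcmp(line, "eOutputCharSet") == 0) {
            strncpy(opts->outputCharSet, eq + 1,
                    sizeof(opts->outputCharSet) - 1);
            opts->outputCharSet[sizeof(opts->outputCharSet) - 1] = '\0';
            found++;
        } else if (strcmp(line, "bForceOutputCharSet") == 0) {
            opts->forceOutputCharSet = (strcmp(eq + 1, "TRUE") == 0);
            found++;
        }
    }
    return found;
}
```

The sketch stores the character set as a string. A real application would translate that string into the corresponding character set value and assign it, together with the Boolean, to the Export options structure before starting the conversion.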
Templates
The template files for the C API implementation are in the directory install\
xmlexport\programs\ini, where install is the pathname of the Export
installation directory. The following templates are provided:
Template
    Description

Cascading style sheet (xml_css.ini)
    This template writes style sheet information to an external CSS file. This
    makes the XML output significantly smaller because the information is
    not stored within the output file.
    See “Use Style Sheets” on page 108 and “Use Style Sheets with xmlini”
    on page 140 for more information on using an external CSS file.

Index (xml_index.ini)
    • Converts a source document into a single, largely unformatted XML
      file that is appropriate for use with an indexing engine.

Single file (xml1file.ini)
    • Creates a single XML file.
    • Does not define an XSL style sheet. A default XSL style sheet that is
      appropriate to the source document type is used. The defaults
      supplied are wp.xsl (for word processing documents), ss.xsl (for
      spreadsheets), pg.xsl (for presentations).
    • Forces the output character set to UTF-8.
    • Maintains the source document’s fonts and styles.
    • Does not create a table of contents.

Single file for presentations (xml1file_pg.ini)
    This template is designed specifically for presentation formats.
    • Creates a single XML file.
    • Defines an XSL style sheet for presentations (pg.xsl).
    • Forces the output character set to UTF-8.
    • Since XML Export only extracts textual components from
      presentations, the bRasterizeFiles member of KVXMLOptions
      is set to FALSE. See “KVXMLOptions” on page 253.
    • Only the szMainTop, szMainBottom, and szUserSummary
      parameters of the KVXMLTemplate structure are relevant to
      presentations and are set in the presentations template.
    • A template file for presentations must not include any other
      parameters in the KVXMLTemplate structure. See “KVXMLTemplate”
      on page 262.

Single file with table of contents (xml1filetoc.ini)
    • Creates a single XML file.
    • Creates a table of contents at the top of the XML document.
    • Uses the Verity.dtd.
    • Uses an XSL style sheet (wp.xsl).
    • Forces the output character set to UTF-8.
    • Lists all metadata (Title, Subject, Author, Comments, Created,
      Modified, Last Saved By, and Revision Number).
    • Uses the name of the worksheets for spreadsheets.
    • Uses the slide titles for presentations. If no titles are available in the
      source document, it uses “slide 1,” “slide 2,” “slide 3,” and so on.
Use the Export Demo Program
The easiest way to get started with XML Export is to become familiar with its
capabilities through the Visual Basic sample program, Export Demo. The source
code for the program is in the directory install\xmlexport\programs\
ExportDemo, where install is the pathname of the Export installation directory.
Export Demo is for Windows only, and requires Internet Explorer 4.01 with Service
Pack 1 or higher.
The output options for output files are pre-defined in Export Demo and cannot be
changed in the user interface. Export Demo uses a small sample of the options
available in the XML Export API.
You can use the sample documents in install\testdocs to experiment with
converting different file formats.
To launch the sample program, select Export Demo from Start | Programs |
Autonomy | XML Export. The following dialog appears:
Figure 2 Export Demo: Launching
NOTE HTML conversion using HTML Export is available in Export Demo
if you have HTML Export installed. If you do not have HTML Export
installed, the HTML button is disabled.
Change Input/Output Directories
If XML Export is installed in the default directory, the output and input directories
are automatically set. The default location for source files is the directory
install\testdocs. The default location for output files is the directory install\
xmlexport\programs\tempout.
If XML Export is installed in a directory other than the default, you are prompted to
select an output and input directory when you first start up Export Demo.
To change the default directories for the source and output files
1. Select Options | Set Directories. The following dialog appears:
Figure 3 Export Demo: Setting Directories
2. From the tree view, select the drive letter and directory for the source or output
files.
3. In Change Location, select which files are in the directory, either Source or
XML.
4. Click Change. The Current Locations fields are updated with the new
selection.
5. Follow the same procedure for the other file types. When you are finished,
click OK.
Set Configuration Options
With XML Export, you can configure options prior to the document conversion
using the KVXMLConfig() function. Export Demo demonstrates this function, and
allows you to control the following options:

Generating output with verbose markup and without images.

Including position information in the markup generated for a PDF document.
Suppress Images
Export Demo provides an option to generate output with verbose markup and
without images. For more information, see “KVXMLConfig()” on page 205.
To specify that images are suppressed in the XML output, select Options | XML
Config | Suppress Images.
Using PDF Position Information
Export Demo provides an option to include position information in the markup
generated for a PDF document. For more information, see “KVXMLConfig()” on
page 205.
To specify that PDF position information be included in the XML output, select
Options | XML Config | Enable Position Token.
Convert Files
To convert a single file:
1. Select Options | Convert | Single file.
2. Select the document from the file list, and click XML in the Convert file to
pane.
To convert files in a directory:
1. Select Options | Convert | Entire directory.
2. Click XML in the Convert directory to pane.
To view a converted file, double-click the output file in the Output Files pane
or select the output file and click View.
The converted file is displayed in the view pane.
Figure 4 Export Demo: Converting Files
To view the original document, select the document from the file list, and click
Open. If you have an application on your system associated with the file, the file is
displayed in that application.
To delete output files, select the file in the Output Files pane and click Delete.
Use the C-Language Implementation of the API
The C-language implementation of the XML Export API is divided into the
following function suites:

File Extraction API Functions—Open and extract sub files in a container file.
They also extract metadata and file format information, and control character
set conversion on extraction.

XML Export API Functions— Extract format information (metadata, character
set, and format), create an input/output stream from a file, and open, convert,
and close the stream.

XML Export API Callback Functions—Controls the conversion while it is in
progress.
Input/Output Operations
In the XML Export API, the source input and target output can be either a physical
file accessed through a file path, or a stream created from a data source. A stream
is a C structure containing pointers to functions similar in nature to their standard
ANSI C counterparts. This structure is passed to Export functions in place of the
standard input source. The input stream is defined by the structure
KVInputStream in kvtypes.h. The output stream is defined by the structure
KVOutputStream in kvtypes.h. See “KVInputStream” on page 235 and
“KVOutputStream” on page 237.
You can create an input stream using the function
fpFileToInputStreamCreate(), and an output stream using the function
fpFileToOutputStreamCreate(). These functions assign C equivalent I/O
functions to fpOpen(), fpRead(), fpSeek(), fpTell(), and fpClose(). See
“fpFileToInputStreamCreate()” on page 189 and “fpFileToOutputStreamCreate()”
on page 191.
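The idea of a stream as a structure of function pointers can be illustrated with a self-contained sketch. The DemoInputStream and DemoMemSource types and the demo_mem_* functions below are hypothetical stand-ins written for this example. They do not reproduce the layout or the function signatures of KVInputStream in kvtypes.h, and the return convention (1 for success, 0 for failure) is an assumption of the sketch. The stream here is backed by a memory buffer rather than a file.

```c
#include <assert.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>

/* Hypothetical stream: a user-data pointer plus I/O callbacks. */
typedef struct DemoInputStream {
    void  *pData;
    int    (*fpOpen)(struct DemoInputStream *s);
    size_t (*fpRead)(struct DemoInputStream *s, void *buf, size_t count);
    int    (*fpSeek)(struct DemoInputStream *s, long offset, int whence);
    long   (*fpTell)(struct DemoInputStream *s);
    int    (*fpClose)(struct DemoInputStream *s);
} DemoInputStream;

/* Data source for the sketch: a read-only memory buffer. */
typedef struct {
    const unsigned char *bytes;
    size_t size;
    size_t pos;
} DemoMemSource;

static int demo_mem_open(DemoInputStream *s)
{
    ((DemoMemSource *)s->pData)->pos = 0;
    return 1;
}

/* Copy up to count bytes; returns the number of bytes copied. */
static size_t demo_mem_read(DemoInputStream *s, void *buf, size_t count)
{
    DemoMemSource *m = (DemoMemSource *)s->pData;
    size_t left = m->size - m->pos;
    if (count > left)
        count = left;
    memcpy(buf, m->bytes + m->pos, count);
    m->pos += count;
    return count;
}

/* Move the read position; rejects targets outside the buffer. */
static int demo_mem_seek(DemoInputStream *s, long offset, int whence)
{
    DemoMemSource *m = (DemoMemSource *)s->pData;
    long base = (whence == SEEK_SET) ? 0
              : (whence == SEEK_CUR) ? (long)m->pos
              : (long)m->size;
    long target = base + offset;
    if (target < 0 || (size_t)target > m->size)
        return 0;
    m->pos = (size_t)target;
    return 1;
}

static long demo_mem_tell(DemoInputStream *s)
{
    return (long)((DemoMemSource *)s->pData)->pos;
}

static int demo_mem_close(DemoInputStream *s)
{
    (void)s;   /* nothing to release for a borrowed buffer */
    return 1;
}

/* Wire the callbacks to the memory source. */
static void demo_mem_stream_init(DemoInputStream *s, DemoMemSource *m,
                                 const void *bytes, size_t size)
{
    assert(s != NULL && m != NULL && bytes != NULL);
    m->bytes = bytes;
    m->size = size;
    m->pos = 0;
    s->pData = m;
    s->fpOpen = demo_mem_open;
    s->fpRead = demo_mem_read;
    s->fpSeek = demo_mem_seek;
    s->fpTell = demo_mem_tell;
    s->fpClose = demo_mem_close;
}
```

Code that consumes such a structure calls only the function pointers, so the same consumer works whether the data comes from a file, a memory buffer, or another source.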
Convert Files
To use the C-language implementation of the API
1. Develop the XML markup and tokens to be assigned to the required members
of a declared instance of KVXMLTemplate.
If you use markup in the structure that is not compliant with XML standards,
XML Export inserts the markup into the output file unchanged. This may result
in a malformed XML file.
2. Declare instances of the following types and assign values to the members as
required:
KVXMLTemplateEx
KVXMLOptionsEx
KVXMLHeadingInfo
KVXMLTOCOptions
See “XML Export API Structures” on page 233 for more information.
3. Load the KVXML library and obtain the KVXMLInterface entry point by calling
KVXMLGetInterface(). See “KVXMLGetInterface()” on page 185.
4. Initialize an Export session by calling fpInit(). The function’s return value,
pContext, is passed as the first parameter to all other Export functions. See
“fpInit()” on page 199.
5. Pass the context pointer from fpInit() and the address of a structure
containing pointers to the File Extraction API functions in the call to
KVGetExtractInterface(). See “KVGetExtractInterface()” on page 146.
6. If you are using streams for the input and output source, follow these steps;
otherwise, proceed to Step 7:
a. Create an input stream (KVInputStream) by calling
fpFileToInputStreamCreate(), or using code similar to the example
code in the sample programs. See “fpFileToInputStreamCreate()” on
page 189.
b. Create an output stream (KVOutputStream) by calling
fpFileToOutputStreamCreate(), or using code similar to the example
code in the sample programs. See “fpFileToOutputStreamCreate()” on
page 191.
c. Proceed to Step 7.
7. Declare the input stream or filename in the KVOpenFileArg structure. See
“KVOpenFileArg” on page 173.
8. Open the source file by calling fpOpenFile() and passing the
KVOpenFileArg structure. This call defines the parameters necessary to open
a file for extraction. See “fpOpenFile()” on page 157.
9. Determine whether the source file is a container file (contains sub files) by
calling fpGetMainFileInfo(). See “fpGetMainFileInfo()” on page 151.
10. If the call to fpGetMainFileInfo() determined the source file is a container
file, proceed to Step 11; otherwise, proceed to Step 14.
11. Determine whether the sub file is itself a container (contains sub files) by
calling fpGetSubFileInfo(). See “fpGetSubFileInfo()” on page 153.
12. Extract the sub file by calling fpExtractSubFile(). See “fpExtractSubFile()”
on page 148.
13. If the call to fpGetSubFileInfo() determined the sub file is a container file,
repeat Step 6 through Step 12 until all sub files are extracted; otherwise,
proceed to Step 14.
14. Set up an out-of-process session by calling KVXMLStartOOPSession(). See
“KVXMLStartOOPSession()” on page 221.
15. Convert the input and generate the output files by calling
KVXMLConvertFile() or fpConvertStream(). The structures
KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions are defined in the
call to KVXMLStartOOPSession(), and should be NULL in the conversion call.
A conversion function can only be called once in a single out-of-process
session. See “fpConvertStream()” on page 186 or “KVXMLConvertFile()” on
page 214.
If you are using callbacks, they are called while the conversion process is
underway. If required, alternate paths and filenames can be specified for
output files, including using the table of content entries for the filenames. See
“XML Export API Callback Functions” on page 225.
16. If you are converting additional files, terminate the out-of-process session by
calling KVXMLEndOOPSession() and setting the boolean to TRUE. The Servant
ends the current conversion session, and releases the source data and
session resources.
If you are not converting additional files, terminate the out-of-process session
and the Servant process by calling KVXMLEndOOPSession() and setting the
boolean to FALSE. See “KVXMLEndOOPSession()” on page 217.
17. Close the file by calling fpCloseFile(). See “fpCloseFile()” on page 147.
18. If you used streams, free the memory allocated for the input stream and output
stream by calling the functions fpFileToInputStreamFree() and
fpFileToOutputStreamFree(). See “fpFileToInputStreamFree()” on
page 190 and “fpFileToOutputStreamFree()” on page 192.
19. Repeat Step 6 through Step 18 for additional source files.
20. Shut down the Export session by calling fpShutDown(). See “fpShutDown()”
on page 203.
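The keep-alive decision in Step 16 can be sketched as a batch loop. This is an illustration only: DemoBatchLog, demo_start_session(), demo_convert_file(), demo_end_session(), and demo_convert_batch() are hypothetical stand-ins that record what a driver would request. They do not call or mirror the signatures of KVXMLStartOOPSession(), KVXMLConvertFile(), or KVXMLEndOOPSession().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of what the batch driver requested. */
typedef struct {
    int sessionsStarted;
    int sessionsEnded;
    int lastKeepServantAlive;  /* flag passed to the most recent end call */
} DemoBatchLog;

/* Stand-in for starting an out-of-process session. */
static int demo_start_session(DemoBatchLog *log)
{
    log->sessionsStarted++;
    return 1;
}

/* Stand-in for the conversion call; an empty path counts as a failure. */
static int demo_convert_file(const char *path)
{
    return path != NULL && path[0] != '\0';
}

/* Stand-in for ending the session with the keep-alive Boolean. */
static int demo_end_session(DemoBatchLog *log, int keepServantAlive)
{
    log->sessionsEnded++;
    log->lastKeepServantAlive = keepServantAlive;
    return 1;
}

/* One session per file: keep the Servant alive while more files remain,
   and let it terminate after the last one. Returns the number of files
   converted. */
static int demo_convert_batch(const char *const *paths, size_t count,
                              DemoBatchLog *log)
{
    size_t i;
    int converted = 0;

    assert(paths != NULL && log != NULL);
    for (i = 0; i < count; i++) {
        int keepServantAlive = (i + 1 < count);

        if (!demo_start_session(log))
            break;
        if (demo_convert_file(paths[i]))
            converted++;
        /* The session is ended even when the conversion failed. */
        if (!demo_end_session(log, keepServantAlive))
            break;
    }
    return converted;
}
```

Because a conversion function can be called only once per out-of-process session, the sketch starts and ends a session for every file and changes only the Boolean on the final iteration.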
Multi-threaded Conversions
To ensure multi-threaded conversions are thread-safe, you must create a unique
context pointer for every thread by initializing the Export session using fpInit().
In addition, threads must not share context pointers, and the same context pointer
must be used for all API calls in the same thread. Creating a context pointer for
every thread does not affect performance because the context pointer uses
minimal resources.
For example, your code should have the following logic for one thread:
fpInit()
KVGetExtractInterface()
fpFileToInputStreamCreate()
fpFileToOutputStreamCreate()
fpOpenFile()
fpGetMainFileInfo()
/* container file */
fpGetSubFileInfo()
fpExtractSubFile()
fpGetSubFileMetaData()
KVXMLStartOOPSession()
fpConvertStream()
KVXMLEndOOPSession(bKeepServantAlive TRUE)
fpCloseFile()
fpFileToInputStreamFree()
fpFileToOutputStreamFree()
set input/output file
fpOpenFile()
fpGetMainFileInfo()
/* not a container file */
KVXMLStartOOPSession()
KVXMLConvertFile()
KVXMLEndOOPSession(bKeepServantAlive TRUE)
fpCloseFile()
...
fpShutDown()
Use the Verity Document Type Definition (DTD)
XML Export produces well-formed, valid XML documents. Document validity is
based on a Document Type Definition (DTD) called the Verity.dtd. The
Verity.dtd is in the default output directory tempout. If the DTD is in a different
directory, the full path must be specified in pszVerityDTDPath.
The elements in the Verity.dtd are based on those defined in the W3C XHTML
1.0 specification and the attributes are based on those defined in the W3C CSS 2
specification.
The root element of each document is “VerityXMLExport.” Character entities are
imported by using the three XHTML DTDs defined at the beginning of the
Verity.dtd.
<!-- Character entities -->
<!ENTITY % HTMLlat1x SYSTEM "HTMLlat1x.ent">
%HTMLlat1x;
<!ENTITY % HTMLspecialx SYSTEM "HTMLspecialx.ent">
%HTMLspecialx;
<!ENTITY % HTMLsymbolx SYSTEM "HTMLsymbolx.ent">
%HTMLsymbolx;
Use XML Style Language Transformation (XSLT)
XML Export is designed to generate XML documents based on the Verity DTD.
You can convert the XML produced by XML Export to other XML vocabularies,
such as Wireless Markup Language (WML), using XSLT.
Add Elements and Attributes to the DTD
XML Export can only generate XML that conforms to the Verity DTD. You can
create your own DTD based on the Verity DTD. You cannot rename the Verity
DTD, so make sure you back up the original Verity DTD to another name before
making changes.
If you create your own DTD and add elements or attributes that are not defined in
the original Verity DTD, you must ensure the new markup is defined in the XML
Export API classes. You can define the markup by entering the markup directly in
the styles, or populating the styles using the template files. See “Map Styles” on
page 104 for more information on mapping styles to user-defined markup.
Move the DTD
The default output directory for the Verity DTD is programs\tempout. If you move
the Verity DTD to another output directory, you must set the string value of
pszVerityDTDPath to the new location. This path is added to the document type
declaration in the XML file. See “pszVerityDTDPath” on page 254.
PART 2
Use the Export API
This section explains how to perform some basic tasks using
the File Extraction and Export APIs, and describes the sample
programs. It contains the following chapters:

Use the File Extraction API

Use the XML Export API

Sample Programs
CHAPTER 3
Use the File Extraction API
This section describes how to extract sub-files from a container file using the File
Extraction API. It contains the following topics.

Introduction

Extract Sub Files

Recreate a File’s Hierarchy

Extract Mail Metadata

Extract Sub Files from Outlook Files

Extract Sub Files from Outlook Express Files

Extract Sub Files from Mailbox Files

Extract Sub Files from Outlook Personal Folders Files

Extract Sub Files from Lotus Domino XML Language Files

Extract Sub Files from Lotus Notes Database Files

Extract Sub Files from PDF Files

Extract Embedded OLE Objects

Extract Sub Files from ZIP Files

Default Filenames for Extracted Sub Files
Introduction
To convert a file, you must first determine whether the file contains any sub files
(attachments, embedded OLE objects, and so on). A file that contains sub files is
called a container file. A container file has a main file (parent) and sub files
(children) embedded in the main file.
The following are examples of container files:
Archive files such as ZIP, TAR, and RAR.

Mail messages such as Outlook (MSG) and Outlook Express (EML).

Mail stores such as Microsoft Outlook Personal Folders (PST), Mailbox
(MBX), and Lotus Notes database (NSF).

PDF files containing file attachments.

Compound documents with embedded OLE objects such as a Microsoft Word
document with an embedded Excel chart.
NOTE “Supported Formats” on page 293 indicates which
formats are treated as container files and are supported by the
File Extraction API.
The sub files may also be container files, creating a file hierarchy of multiple
levels. For example, let us say an MSG file (the root parent) contains three
attachments:

a Microsoft Word document containing an embedded Microsoft Excel
spreadsheet.

an AutoCAD drawing file (DWG).

an EML file with an attached Zip file, which in turn contains four archived files.
Figure 5 shows the file’s hierarchy.
Figure 5 Example Container File Tree Structure
NOTE The parent MSG file contains four first-level children.
The body text of a message file, although not a standalone file
in the container, is considered a child of the parent file.
Extract Sub Files
To convert all files in a container file, the container must be opened and its sub
files extracted using the File Extraction API. The extraction process is done
repeatedly until all sub files are extracted and exposed for conversion. Once a sub
file is extracted, you can call Export API functions to convert the file.
If you require a container file, including sub files, to be converted to a single file,
you must extract all files from the container, convert the files, and then append
each converted output to its parent.
To extract sub files, follow this general procedure
1. Pass the context pointer from fpInit() and the address of a structure
containing pointers to the File Extraction API functions in the call to
KVGetExtractInterface(). See “KVGetExtractInterface()” on page 146.
2. Declare the input stream or filename in the KVOpenFileArg structure. See
“KVOpenFileArg” on page 173.
3. Open the source file by calling fpOpenFile() and passing the
KVOpenFileArg structure. This call defines the parameters necessary to
open a file for extraction. See “fpOpenFile()” on page 157.
4. Determine whether the source file is a container file (contains sub files) by
calling fpGetMainFileInfo(). See “fpGetMainFileInfo()” on page 151.
5. If the call to fpGetMainFileInfo() determined the source file is a
container file, proceed to Step 6; otherwise, convert the file.
6. Determine whether the sub file is itself a container (contains sub files) by
calling fpGetSubFileInfo(). See “fpGetSubFileInfo()” on page 153.
7. Extract the sub file by calling fpExtractSubFile(). See
“fpExtractSubFile()” on page 148.
8. If the call to fpGetSubFileInfo() determined the sub file is a container file,
repeat Step 2 through Step 7 until all sub files are extracted and the lowest
level of sub files is reached; otherwise, convert the file.
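The repeated extraction described in the procedure above is naturally recursive. The following sketch shows only that control flow on an in-memory model. DemoFile, DemoConvertFn, demo_extract_all(), and demo_count_file() are hypothetical names invented for the example, and none of them is part of the File Extraction API. The model treats a container purely as a holder of sub files, so only files without sub files reach the conversion callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory model of a file. A container has sub files;
   any other file has none. This is not a KeyView structure. */
typedef struct DemoFile {
    const char *name;
    const struct DemoFile *subFiles;
    size_t numSubFiles;
} DemoFile;

/* Callback invoked for every file that is ready for conversion. */
typedef void (*DemoConvertFn)(const DemoFile *file, int depth, void *user);

/* Descend through the containers until the lowest level of sub files is
   reached, handing each non-container file to the callback. */
static void demo_extract_all(const DemoFile *file, int depth,
                             DemoConvertFn convert, void *user)
{
    size_t i;

    assert(file != NULL && convert != NULL);
    if (file->numSubFiles == 0) {      /* not a container: convert it */
        convert(file, depth, user);
        return;
    }
    for (i = 0; i < file->numSubFiles; i++)   /* container: go one level down */
        demo_extract_all(&file->subFiles[i], depth + 1, convert, user);
}

/* Example callback: count the files handed over for conversion. */
static void demo_count_file(const DemoFile *file, int depth, void *user)
{
    (void)file;
    (void)depth;
    ++*(int *)user;
}
```

In a real application, the test for sub files would be replaced by the information returned from fpGetMainFileInfo() and fpGetSubFileInfo(), and the descent would open each extracted container before examining its sub files.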
Recreate a File’s Hierarchy
When a container file is extracted, any relationships between the sub files in the
container are not maintained. However, the File Extraction interface provides
information that enables you to recreate the hierarchy. The hierarchy can be used
to create a directory structure in a file system, or to categorize documents
according to their relationship to each other. For example, if you use KeyView to
generate text for a search engine, the hierarchical information enables your users
to search for a document based on the document’s parent or sibling. In addition,
when the document is returned to the user, the parent and sibling documents can
be returned as recommendations.
The information needed to recreate a file’s hierarchy is provided in the call to
fpGetSubFileInfo(). See “fpGetSubFileInfo()” on page 153. The members
KVSubFileInfo->parentIndex and KVSubFileInfo->childArray
provide information about a sub file’s parent and children. Since you can only
retrieve the first-level children in the sub file, you must call
fpGetSubFileInfo() repeatedly until information for the leaf-node children is
extracted.
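As an illustration of how parent information can be turned into a directory-style path, consider the following sketch. DemoSubFile, demo_append(), demo_walk(), and demo_build_path() are hypothetical names, and the use of -1 to mark a sub file without a parent is an assumption of this example rather than documented behavior of KVSubFileInfo. The sketch assumes the application has already collected a name and a parent index for every sub file.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the hierarchy data collected
   for each sub file. */
typedef struct {
    const char *name;
    int parentIndex;   /* -1 when the sub file has no parent (assumption) */
} DemoSubFile;

/* Append text to out if it fits; returns 0 on success, -1 otherwise. */
static int demo_append(char *out, size_t outSize, const char *text)
{
    size_t used = strlen(out);
    size_t need = strlen(text);
    if (used + need + 1 > outSize)
        return -1;
    memcpy(out + used, text, need + 1);
    return 0;
}

/* Emit the parent's path first, then this sub file's name. The depth
   limit stops the walk if the parent links form a cycle. */
static int demo_walk(const DemoSubFile *files, int count, int index,
                     int depth, char *out, size_t outSize)
{
    if (index < 0 || index >= count || depth >= count)
        return -1;
    if (files[index].parentIndex >= 0) {
        if (demo_walk(files, count, files[index].parentIndex,
                      depth + 1, out, outSize) != 0)
            return -1;
        if (demo_append(out, outSize, "/") != 0)
            return -1;
    }
    return demo_append(out, outSize, files[index].name);
}

/* Build a path such as "root/folder/file" for the sub file at index.
   Returns 0 on success, -1 on a bad index or a buffer that is too small. */
static int demo_build_path(const DemoSubFile *files, int count, int index,
                           char *out, size_t outSize)
{
    assert(files != NULL && out != NULL);
    if (outSize == 0)
        return -1;
    out[0] = '\0';
    return demo_walk(files, count, index, 0, out, outSize);
}
```

Walking the parent links upward needs only parentIndex. An application that wants to visit the tree from the top down would use the child array instead.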
Create a Root Node
Because of their structure, some container files do not contain a sub file or folder
which acts as a root directory on which the hierarchy can be based. For example,
sub files in a Zip archive can be extracted, but none of the sub files represent the
root of the hierarchy. In this case, an artificial root node must be created at the top
of the file hierarchy as a point of reference for each child, and ultimately to
recreate the relationships. This artificial root node is an internal object, and is
extracted to disk as a directory called root. Its index number is 0.
To create the root node, set openFlag to KVOpenFileFlag_CreateRootNode
in the call to fpOpenFile(). See “fpOpenFile()” on page 157. When a root node
is created, the value of numSubFiles in KVMainFileInfo includes the root
node (see “KVMainFileInfo” on page 169). For example, when you call
fpGetMainFileInfo() on a Microsoft Word document with three embedded
OLE objects and the root node is disabled, numSubFiles is 3. If you create a
root node, numSubFiles is 4.
Recreate a File’s Hierarchy—Example
For example, let us say we extract a PST file containing seven sub files with a root
node enabled. The call to fpGetMainFileInfo() returns the number of sub
files as 8 (seven sub files and one root node). Figure 6 shows the structure and
the available hierarchy information after the sub files are extracted:
Figure 6 Extracted PST File
The parentIndex specifies the index number of a sub file’s parent. The
childArray specifies an array of a sub file’s children. With this information, you
can recreate the hierarchy shown in Figure 7.
Figure 7 Recreated File Hierarchy
Extract Mail Metadata
You can extract metadata, such as subject, sender, and recipient, from MSG,
EML, MBX, PST, and NSF files, by calling the fpGetSubFileMetaData() function.
You can extract a pre-defined set of metadata fields and/or individual fields that
are unique to a file format.
Default Metadata Set
KeyView internally defines a set of common mail metadata fields that can be
extracted as a group from mail formats. This default metadata set is listed in
Table 6. When you retrieve all metadata for a file—that is, pass NULL for the array
of metadata—the complete set of default metadata, not all available metadata in
the file, is returned.
Table 6 Default Mail Metadata List
Field Name (string to specify)  Description
From                            The display name and e-mail address of the sender.
Sent                            The time the message was sent.
To                              The display names and email addresses of the recipients.
Cc                              The display names and email addresses of recipients
                                who receive copies of the email.
Bcc                             The display names and email addresses of recipients
                                who received blind copies of the email.
Subject                         The text in the subject line of the message.
Priority                        The priority applied to the message.
Because mail formats use different terms for the same fields, the format’s reader
maps the default field name to the appropriate format-specific name. For example,
when retrieving the default metadata set, the NSF field Importance is mapped to
the name Priority and is returned.
You can also extract the default field names individually by passing the field name
(such as From, To, and Subject); however, in this case, the string is not mapped to
the format-specific name. For example, if you pass Priority in the call, you will
retrieve the contents of the Priority field from an MBX file, but will not retrieve the
contents of the Importance field from an NSF file.
NOTE You cannot pass the field names listed in Table 6 on
page 73 individually for PST files. However, you can pass either
the MAPI tag number or the MAPI tag name as integers. See
“Microsoft Personal Folders File (PST) Metadata” on page 78.
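Because individually requested field names are not mapped, an application that requests fields one at a time has to supply the format-specific name itself. The following sketch shows one way to keep such a mapping in application code. DemoFieldMap, demoFieldMap, and demo_map_field() are hypothetical names, and the table contains only the single mapping named in the text (Priority to Importance for NSF); any further entries would have to come from the format-specific metadata lists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical application-side table: default field name to the name a
   particular mail format uses for the same field. */
typedef struct {
    const char *format;
    const char *defaultName;
    const char *formatName;
} DemoFieldMap;

static const DemoFieldMap demoFieldMap[] = {
    { "NSF", "Priority", "Importance" }   /* the one mapping named in the text */
};

/* Return the name to request for the given format. Falls back to the
   default name when the table has no entry. */
static const char *demo_map_field(const char *format, const char *defaultName)
{
    size_t i;

    assert(format != NULL && defaultName != NULL);
    for (i = 0; i < sizeof(demoFieldMap) / sizeof(demoFieldMap[0]); i++) {
        if (strcmp(demoFieldMap[i].format, format) == 0 &&
            strcmp(demoFieldMap[i].defaultName, defaultName) == 0)
            return demoFieldMap[i].formatName;
    }
    return defaultName;
}
```

The returned string would then be placed in the array of field names passed to the metadata call. For PST files this approach does not apply, because the fields are identified by MAPI tags rather than by these names.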
Extract the Default Metadata Set
To extract the default metadata set, call the fpGetSubFileMetaData()
function, and pass 0 for metaNameCount and NULL for metaNameArray. See
“fpGetSubFileMetaData()” on page 155.
KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVStructInit(&metaArg);
metaArg.index = subFileIndex;
metaArg.metaNameCount = 0;
metaArg.metaNameArray = NULL;
error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg,
&pMetaData);
...
extractInterface->fpFreeStruct(pFile,pMetaData);
pMetaData = NULL;
Microsoft Outlook (MSG) Metadata
In addition to the default metadata set, the metadata fields listed in Table 7 can be
extracted for MSG files. The field name must be passed to metaNameArray in
the call to the fpGetSubFileMetaData() function.
Table 7 MSG-specific Metadata List
Field Name (string to specify)
Description
AttachFileName
An attachment's long filename and extension, excluding
path.
ConversationTopic
The topic of the first message in a conversation thread. A
conversation thread is a series of messages and replies.
This is the first message’s subject with any prefix removed.
CreationTime
The time the message or attachment was created. This
value is displayed in the Sent field in the message’s
Properties dialog in Outlook.
InternetMessageID
The identifier for messages that come in over the Internet.
This is the MAPI property PR_INTERNET_MESSAGE_ID.
This property is not in the MAPI headers or MAPI
documentation.
LastModificationTime
The time the message or attachment was last modified.
This value is displayed in the Modified field in the
message’s Properties dialog in Outlook.
Location
The physical location of the event specified in the Outlook
calendar entry.
MessageID
The message transfer system (MTS) identifier for the
message transfer agent (MTA). This value is displayed on
the Message ID tab in the message’s Properties dialog in
Outlook.
Received
The date and time a message was delivered. This value is
displayed in the Received field in the message’s
Properties dialog in Outlook.
Sender
The name and e-mail address of the message sender. This
value is a concatenation of two MAPI properties in the
following format:
"PR_SENDER_NAME" <PR_SENDER_EMAIL_ADDRESS>
The Sender value may be the same as or different than
the default metadata From value (see “Default Metadata
Set” on page 72), depending on which MAPI properties
exist in the MSG file.
Sensitivity
The value indicating the message sender's opinion of the
sensitivity of a message. For example, Personal, Private,
or Confidential. This value is displayed in the Sensitivity
field in the message’s Properties dialog in Outlook.
TransportMsgHeaders
Contains transport-specific message envelope
information. This value corresponds to the MAPI property
PR_TRANSPORT_MESSAGE_HEADERS.
StartDate
Contains an appointment start date. This value
corresponds to the PR_START_DATE MAPI property.
EndDate
Contains an appointment end date. This value
corresponds to the PR_END_DATE MAPI property.
Extract MSG-Specific Metadata
To extract specific metadata fields from an MSG file, call the
fpGetSubFileMetaData() function, and pass the field name defined in
Table 7 to metaNameArray (the string is not case sensitive). See
“fpGetSubFileMetaData()” on page 155.
For example, the following code extracts the contents of the
ConversationTopic and MessageID fields:
KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pname[2];
KVStructInit(&metaArg);
/* Field names are passed as strings and are not case sensitive. */
names[0].type = KVMetaNameType_String;
names[0].name.sname = "conversationtopic";
names[1].type = KVMetaNameType_String;
names[1].name.sname = "MessageID";
pname[0] = &names[0];
pname[1] = &names[1];
metaArg.metaNameCount = 2;
metaArg.metaNameArray = pname;
metaArg.index = subFileIndex;
error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg,
&pMetaData);
...
extractInterface->fpFreeStruct(pFile,pMetaData);
pMetaData = NULL;
Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata
In addition to the default metadata set, you can extract any metadata field that
exists in the header of an EML or MBX file by passing the field’s name. If the
name is a valid field in the file, the contents of the field are returned. For example, to
retrieve the name of the last mail server that received the message before it was
delivered, you can pass the string “Received”.
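To make the idea of "a field that exists in the header" concrete, the following is a minimal sketch of a case-insensitive lookup in raw RFC 822 header text. It is not how KeyView implements the lookup, which the guide does not describe; the helper names find_header and equal_nocase are hypothetical, and the sketch ignores folded (multi-line) header values.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare the first n characters of a and b, ignoring case. */
static int equal_nocase(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Find the first "name: value" line in headers and copy the value to out.
   Returns 1 if the field exists, 0 otherwise. out_size must be at least 1. */
static int find_header(const char *headers, const char *name,
                       char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char *line = headers;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

        if (line_len > name_len && line[name_len] == ':' &&
            equal_nocase(line, name, name_len)) {
            const char *value = line + name_len + 1;
            size_t value_len;

            while (*value == ' ' || *value == '\t')
                value++;                      /* skip leading whitespace */
            value_len = line_len - (size_t)(value - line);
            if (value_len > 0 && value[value_len - 1] == '\r')
                value_len--;                  /* drop CR of a CRLF ending */
            if (value_len >= out_size)
                value_len = out_size - 1;     /* truncate to fit */
            memcpy(out, value, value_len);
            out[value_len] = '\0';
            return 1;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return 0;
}
```

Because the first match is returned, a lookup for Received yields the topmost Received line, which mail servers normally add last.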
Extract EML- or MBX-Specific Metadata
To extract specific metadata fields from an EML or MBX file, call the
fpGetSubFileMetaData() function, and pass the metadata name to
metaNameArray (the string is not case sensitive). See “fpGetSubFileMetaData()”
on page 155.
For example, the following code extracts the contents of the Received and
Mime-version fields:
KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pname[2];
KVStructInit(&metaArg);
/* Header field names are passed as strings and are not case sensitive. */
names[0].type = KVMetaNameType_String;
names[0].name.sname = "Received";
names[1].type = KVMetaNameType_String;
names[1].name.sname = "Mime-version";
pname[0] = &names[0];
pname[1] = &names[1];
metaArg.metaNameCount = 2;
metaArg.metaNameArray = pname;
metaArg.index = subFileIndex;
error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg,
&pMetaData);
...
extractInterface->fpFreeStruct(pFile,pMetaData);
pMetaData = NULL;
Lotus Notes Database (NSF) Metadata
In addition to the default metadata set, you can extract any Lotus field name that
exists in an NSF file by passing the field’s name. (You can extract fields from mail
NSF files and non-mail NSF files.) If the name is a valid field in the file, the field is
returned. For example, to retrieve the date a document in an NSF file was last
accessed, you would pass the string “$LastAccessedDB”.
NOTE A complete list of NSF fields is provided in the Lotus
Notes file stdnames.h. This header file is available in the Lotus
API Toolkit.
Extract NSF-Specific Metadata
To extract specific metadata fields from an NSF file, call the
fpGetSubFileMetaData() function, and pass the metadata name to
metaNameArray (the string is not case sensitive). See “fpGetSubFileMetaData()”
on page 155.
For example, the following code extracts the contents of the Description and
Categories fields:
KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pname[2];
KVStructInit(&metaArg);
/* Lotus field names are passed as strings and are not case sensitive. */
names[0].type = KVMetaNameType_String;
names[0].name.sname = "description";
names[1].type = KVMetaNameType_String;
names[1].name.sname = "Categories";
pname[0] = &names[0];
pname[1] = &names[1];
metaArg.metaNameCount = 2;
metaArg.metaNameArray = pname;
metaArg.index = subFileIndex;
error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg,
&pMetaData);
...
extractInterface->fpFreeStruct(pFile,pMetaData);
pMetaData = NULL;
Microsoft Personal Folders File (PST) Metadata
In addition to the default metadata set, you can extract Messaging Application
Programming Interface (MAPI) properties from a PST file. These properties
describe all elements of an Outlook item in a PST file (such as subject, sender,
recipient, and message text). Since the properties are stored in the PST file itself,
they can be retrieved before the contents of the PST are extracted. This enables
you to determine whether an Outlook item should be extracted based on its
attributes. Some MAPI properties are also stored for Outlook attachments that are
not mail messages (such as an attached Microsoft Word document or Lotus 1-2-3
file).
NOTE Since all elements of a message (except non-mail
attachments) are represented by MAPI properties, you can
extract all components of a sub file, including the header and
message text, by calling the fpGetSubFileMetaData() function.
MAPI Properties
Each MAPI property is identified by a property tag, which is a constant containing
the property type and a unique identifier. For example, the property that indicates
whether a message has attachments has the following components:
Property        PR_HASATTACH
Identifier      0x0E1B
Property type   PT_BOOLEAN (000B)
Property tag    0x0E1B000B
The Microsoft MAPI documentation on the Microsoft Developer Network Web site
lists all available MAPI properties, their tags, and types.
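The layout of a property tag can be sketched in C as follows. This is a minimal illustration, assuming the identifier occupies the high 16 bits and the property type the low 16 bits, as the PR_HASATTACH example suggests. The helper names are hypothetical and are not part of KeyView or the MAPI headers, which provide their own macros for this purpose.

```c
#include <stdint.h>

/* Type code taken from the PR_HASATTACH example: PT_BOOLEAN is 000B. */
#define SKETCH_PT_BOOLEAN 0x000Bu

/* Combine a 16-bit identifier and a 16-bit type into a 32-bit tag. */
static uint32_t make_prop_tag(uint16_t prop_id, uint16_t prop_type)
{
    return ((uint32_t)prop_id << 16) | (uint32_t)prop_type;
}

/* Split a tag back into its identifier (high word) and type (low word). */
static uint16_t prop_tag_id(uint32_t tag)
{
    return (uint16_t)(tag >> 16);
}

static uint16_t prop_tag_type(uint32_t tag)
{
    return (uint16_t)(tag & 0xFFFFu);
}
```

With these helpers, make_prop_tag(0x0E1B, SKETCH_PT_BOOLEAN) yields 0x0E1B000B, the tag shown for PR_HASATTACH.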
You can retrieve any MAPI property that is of one of the MAPI property types
listed below:
PT_I2
PT_DOUBLE
PT_STRING8
PT_I4
PT_FLOAT
PT_TSTRING
PT_BINARY
PT_LONG
PT_SYSTIME
PT_BOOLEAN
PT_SHORT
PT_UNICODE
NOTE Properties with a PT_TSTRING type have the property type
recompiled to either a Unicode string (PT_UNICODE) or to an ANSI string
(PT_STRING8) depending on the operating system’s character set. To
retrieve the Unicode property, pass in the Unicode version of the tag. For
example, the property tag for PR_SUBJECT is either 0x0037001E for an
ANSI string, or 0x0037001F for a Unicode string.
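As a hedged sketch of the note above, the following helper rewrites an ANSI string tag into its Unicode counterpart by swapping the type word. The type codes 0x001E and 0x001F are read off the two PR_SUBJECT tags quoted in the note. The function name to_unicode_tag is hypothetical and is not a KeyView or MAPI call.

```c
#include <stdint.h>

/* Type words taken from the PR_SUBJECT tags 0x0037001E and 0x0037001F. */
#define SKETCH_PT_STRING8 0x001Eu
#define SKETCH_PT_UNICODE 0x001Fu

/* Return the Unicode version of an ANSI string tag.
   Tags of any other type are returned unchanged. */
static uint32_t to_unicode_tag(uint32_t tag)
{
    if ((tag & 0xFFFFu) == SKETCH_PT_STRING8)
        return (tag & 0xFFFF0000u) | SKETCH_PT_UNICODE;
    return tag;
}
```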
Extract PST-Specific Metadata
In the call to extract sub file metadata, you can pass either the MAPI tag number
(such as 0x0070001e) or the MAPI tag name (such as
PR_CONVERSATION_TOPIC). If you specify the MAPI tag name, you must include
the Windows header files mapitags.h and mapidefs.h in which the MAPI tag
name is defined as a tag number.
To extract specific MAPI properties from a PST file, call the
fpGetSubFileMetaData() function, and pass the property tag to
metaNameArray. See “fpGetSubFileMetaData()” on page 155. The tag is passed
as an integer.
For example, the following code extracts the MAPI properties PR_SUBJECT and
PR_ALTERNATE_RECIPIENT:
KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pName[2];
/* MAPI property tags are passed as integers. */
names[0].type = KVMetaNameType_Integer;
names[0].name.iname = PR_SUBJECT;
names[1].type = KVMetaNameType_Integer;
names[1].name.iname = 0x3A010102; /* PR_ALTERNATE_RECIPIENT */
pName[0] = &names[0];
pName[1] = &names[1];
KVStructInit(&metaArg);
metaArg.metaNameCount = 2;
metaArg.metaNameArray = pName;
metaArg.index = subFileIndex;
error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg,
&pMetaData);
...
extractInterface->fpFreeStruct(pFile,pMetaData);
pMetaData = NULL;
NOTE You must include the Windows header files
mapitags.h and mapidefs.h in which PR_SUBJECT is
defined as 0x0037001E.
Exclude Metadata from the Extracted Text File
When a mail message is extracted, its message text and header information (To,
From, Sent, and so on) are also extracted. You can prevent the header
information from appearing in the text file.
To exclude the header information, set the flag extractFlag to
KVExtractionFlag_ExcludeMailHeader in the call to
fpExtractSubFile(). See “fpExtractSubFile()” on page 148.
Extract Sub Files from Outlook Files
When an Outlook file (MSG) is extracted to disk, its message text and header
information (To, From, Sent, and so on) are extracted to a text file. (If you do not
want the header information to appear in the text file, see “Exclude Metadata from
the Extracted Text File” on page 80.) If the Outlook file contains a non-mail
attachment, the attachment is extracted in its native format to a sub directory. If
the Outlook file contains a mail attachment, the attachment’s message text is
extracted to a sub directory.
Extract Sub Files from Outlook Express Files
When an Outlook Express (EML) file is extracted to disk, its message text and
header information (To, From, Sent, and so on) are extracted to a text file. (If you
do not want the header information to appear in the text file, see “Exclude
Metadata from the Extracted Text File” on page 80.) If the Outlook Express file
contains a non-mail attachment, the attachment is extracted in its native format to
the same directory as the message text file. If the Outlook Express file contains a
mail attachment, the complete attachment (including message text and
attachments), message text file, and non-mail attachment(s) are extracted to the
same directory as the main message.
NOTE When the MBX reader (mbxsr) is enabled, it is used to
filter MBX and EML files. If the MBX reader is not enabled, the
EML reader (emlsr) is used.
Extract Sub Files from Mailbox Files
A Mailbox (MBX) file is a collection of individual emails that comply with RFC 822
and RFC 2045 - 2049 (MIME), and are divided by message separators. There are
many mail applications that export to an MBX format, such as Eudora Email and
Mozilla Thunderbird.
When an MBX file is extracted to disk, the message text and header information
(To, From, Sent, and so on) from each mail file are extracted to text files. (If you
do not want the header information to appear in the text file, see “Exclude
Metadata from the Extracted Text File” on page 80.)
In Eudora MBX files, attachments are inserted as a link and are stored externally
from the message. These attachments are not extracted, but the path to the
attachment is returned in the call to the fpGetSubFileInfo() function
(“fpGetSubFileInfo()” on page 153). You can write code to retrieve the attachment
based on the returned path.
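Retrieving an externally stored attachment can be as simple as copying the file from the returned path to your output directory. The following is a minimal sketch using only the C standard library. The function name copy_attachment is hypothetical, and obtaining the path from fpGetSubFileInfo() is left out because the relevant structure is described elsewhere in this guide.

```c
#include <stdio.h>

/* Copy the file at src_path to dst_path.
   Returns 0 on success and -1 on any open, read, or write failure. */
static int copy_attachment(const char *src_path, const char *dst_path)
{
    FILE *in = fopen(src_path, "rb");
    FILE *out;
    char buf[4096];
    size_t n;
    int status = 0;

    if (in == NULL)
        return -1;
    out = fopen(dst_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    /* Copy in fixed-size chunks until end of file or an error. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            status = -1;
            break;
        }
    }
    if (ferror(in))
        status = -1;
    fclose(in);
    if (fclose(out) != 0)
        status = -1;
    return status;
}
```

A caller would typically check that the returned path still exists on the current machine, since Eudora records the location at the time the message was stored.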
For MBX files from other clients, KeyView extracts attachments when they are
embedded in the message.
NOTE The Mailbox (MBX) reader is an advanced feature and is sold and
licensed separately. To enable this reader in a KeyView SDK, you must
obtain the appropriate license key from Autonomy. See “Update License
Information” on page 33 for information on adding a new license key to an
existing installation.
Extract Sub Files from Outlook Personal Folders Files
KeyView can extract Outlook items such as messages, appointments, contacts,
tasks, notes, and journal entries from a PST file. When a PST file is extracted to
disk, the text and header information (To, From, Sent, and so on) from each
Outlook item are extracted to a text file. (If you do not want the header information
to appear in the text file, see “Exclude Metadata from the Extracted Text File” on
page 80.)
You can also extract messages from PST files as MSG files, including all their
attachments, by setting the KVExtractionFlag_SaveAsMSG flag in the
KVExtractSubFileArg structure when calling fpExtractSubFile(). See
“KVExtractSubFileArg” on page 163.
If an Outlook item contains a non-mail attachment, the attachment is extracted in
its native format to a sub directory. If an Outlook item contains an Outlook
attachment, the attached item’s text and attachment(s) are extracted to a sub
directory.
NOTE The Microsoft Outlook Personal Folders (PST) reader is an
advanced feature and is sold and licensed separately. To enable this reader
in a KeyView SDK, you must obtain the appropriate license key from
Autonomy. See “Update License Information” on page 33 for information on
adding a new license key to an existing installation.
Use the Native or MAPI-based Reader
KeyView accesses PST files in one of two ways:
• indirectly, using Microsoft’s Messaging Application Programming Interface
(MAPI) reader named pstsr.
• directly, using the native PST reader named pstnsr.
On UNIX and Windows x64 and IA-64, the native reader is always used to
process PST files because the MAPI-based reader only runs on Windows x86. On
Windows x86, you can specify either reader; however, the MAPI-based reader is
used by default. The differences between the two readers are summarized in the
following table:
Feature/Requirement       Native Reader (pstnsr)        MAPI-based Reader (pstsr)
All platforms supported   Yes                           Windows only
Outlook client required   No                            Yes
MAPI properties           Yes. All properties defined   Yes. All properties defined
supported                 in mapitags.h. Object         in mapitags.h. Object
                          properties are not            properties are not
                          supported.                    supported.
Password-protection       Yes                           Yes (using KVCredential
supported                                               structure)
Compressible encryption   Yes                           Yes
supported
High encryption           No                            Yes
supported
To use the MAPI-based reader for PST files, change the PST entry in the
formats_e.ini file as follows:
297=pst
To use the native reader for PST files, change the PST entry in the
formats_e.ini file as follows:
297=pstn
NOTE You must ensure the PST you are extracting is not open
in the Outlook client and the Outlook process is not running.
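If your application needs to report which reader a given formats_e.ini entry selects, the mapping described above can be sketched as a small lookup. This is an illustration only: the function name pst_reader_for_entry is hypothetical, and the sketch assumes the entry has already been isolated as an exact string with no surrounding whitespace or comments.

```c
#include <string.h>

/* Map a formats_e.ini PST entry to the reader it selects.
   Returns NULL if the line is not one of the two entries shown above. */
static const char *pst_reader_for_entry(const char *line)
{
    if (strcmp(line, "297=pst") == 0)
        return "pstsr";   /* MAPI-based reader */
    if (strcmp(line, "297=pstn") == 0)
        return "pstnsr";  /* native reader */
    return NULL;
}
```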
Use the Native PST Reader (pstnsr)
The native PST reader accesses PST files directly without relying on the Microsoft
interface to the PST format. It runs on both Windows and UNIX and does not
require an Outlook client to be installed on the system processing the PST files.
However, the native reader does not support password-protected PST files that
use high encryption.
Use the MAPI Reader (pstsr)
The pstsr reader accesses PST files indirectly using Microsoft’s Messaging
Application Programming Interface (MAPI). MAPI is a standard Windows
message interface that enables different mail programs and other mail-aware
applications (such as word processors and spreadsheets) to exchange messages
and attachments with each other. MAPI allows KeyView to open a PST file,
traverse the folders and Outlook items, and extract the items inside the PST file.
NOTE When extracting sub files from PST files, information on the
distribution list used in an e-mail is extracted to a file called
emailname.dist. This applies to the MAPI reader (pstsr) only.
System Requirements
Since MAPI is only supported on Windows platforms, you can only convert PST
files on Windows. And since MAPI relies on functionality in Microsoft Outlook, a
Microsoft Outlook client must be installed on the same machine as the application
converting PST files, and must be the default e-mail application. KeyView
supports the following PST formats and Outlook clients:
• Outlook 97 or higher PST files
• Outlook 2002 or Outlook 2003 clients
NOTE The Outlook client must be the same version as or
newer than the version of Outlook that generated the PST file.
MAPI Attachment Methods
The way in which the contents of a PST message attachment can be accessed is
determined by the MAPI attachment method applied to the attachment. For
example, if the attachment is an embedded OLE object, then it uses the
ATTACH_OLE attachment method. KeyView can access message attachments
that use the following attachment methods:
ATTACH_BY_VALUE
ATTACH_EMBEDDED_MSG
ATTACH_OLE
ATTACH_BY_REFERENCE
ATTACH_BY_REF_ONLY
ATTACH_BY_REF_RESOLVE
Attachments using the ATTACH_BY_VALUE, ATTACH_EMBEDDED_MSG, or
ATTACH_OLE attachment methods are extracted automatically when the PST file
is extracted. An “attach by reference” method means the attachment is not in
Outlook, but Outlook contains an absolute path to the attachment. Before you can
extract these types of attachments, you must retrieve the path to access the
attachment.
To extract “attach by reference” attachments
1. Determine whether the attachment uses an ATTACH_BY_REFERENCE,
ATTACH_BY_REF_ONLY, or ATTACH_BY_REF_RESOLVE method by retrieving
the MAPI property PR_ATTACH_METHOD.
2. If the attachment uses one of the “attach by reference” methods, get the fully
qualified path to the attachment by retrieving the MAPI properties
PR_ATTACH_LONG_PATHNAME or PR_ATTACH_PATHNAME.
3. You can then either copy the files from their original location to the path where
the PST file is extracted, or use the Export API functions to convert the
attachment.
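Step 1 of the procedure above can be sketched as a simple classification of the PR_ATTACH_METHOD value. The numeric values below are those commonly defined in the Windows header mapidefs.h; they are repeated here only so the sketch compiles without the MAPI headers, and should be verified against your own copy of the header. The function name is_attach_by_reference is hypothetical.

```c
/* Attachment method values, believed to match mapidefs.h.
   Skipped if the real MAPI headers have already defined them. */
#ifndef ATTACH_BY_VALUE
#define ATTACH_BY_VALUE        1
#define ATTACH_BY_REFERENCE    2
#define ATTACH_BY_REF_RESOLVE  3
#define ATTACH_BY_REF_ONLY     4
#define ATTACH_EMBEDDED_MSG    5
#define ATTACH_OLE             6
#endif

/* Return 1 if the attachment is stored outside the PST and must be
   fetched from the path in PR_ATTACH_LONG_PATHNAME or PR_ATTACH_PATHNAME;
   return 0 if it is extracted automatically with the PST file. */
static int is_attach_by_reference(long method)
{
    switch (method) {
    case ATTACH_BY_REFERENCE:
    case ATTACH_BY_REF_RESOLVE:
    case ATTACH_BY_REF_ONLY:
        return 1;
    default:
        return 0;
    }
}
```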
Open Secured PST Files
KeyView enables you to specify credentials (user name and password), which are
used to open a secured PST file for extraction. See “Password Protected Files” on
page 409 for more information.
Detect PST Files While the Outlook Client is Running
If you are running an Outlook client while running the File Extraction API, the
KeyView format detection module (kwad) may not be able to open the PST file to
determine the file’s format because Outlook has the file locked. In this case, you
may do one of the following:
• Close Outlook when using the File Extraction API.
• Detect PST files by extension only and bypass the format detection module.
To enable this option, add the following lines to the formats_e.ini file.
[container_flags]
detectPSTbyExtension=1
NOTE The detectPSTbyExtension option only applies
when you are using the MAPI reader (pstsr).
NOTE If you use this option, you must ensure in your code that valid PST
files are passed to KeyView because the format detection module will not be
available to verify the file type and pass the file to the appropriate reader.
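One inexpensive way to guard against passing non-PST data is to check the file signature yourself when you can read the first bytes of the file. The sketch below relies on Microsoft's published PST file format, in which the file begins with the four bytes "!BDN"; this detail comes from that specification, not from this guide. It is only a rough pre-check (OST files share the same signature), and the function name has_pst_signature is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the buffer starts with the PST signature bytes "!BDN". */
static int has_pst_signature(const unsigned char *data, size_t len)
{
    static const unsigned char magic[4] = { 0x21, 0x42, 0x44, 0x4E };

    return len >= sizeof magic && memcmp(data, magic, sizeof magic) == 0;
}
```

If Outlook holds the file locked, even this read may fail, in which case the extension is the only information available.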
Extract Sub Files from Lotus Domino XML Language Files
When a Lotus Domino XML Language (.DXL) file is extracted, its message text
and header information (To, From, Sent, and so on) are extracted to a text file.
NOTE To prevent header information from being extracted,
see “Exclude Metadata from the Extracted Text File” on
page 80.
You can ensure that dates and times extracted from Lotus Domino .DXL files are
displayed in a uniform format.
To extract custom date/time formats
• In the formats_e.ini file, set the DateTimeFormat option in the [dxlsr]
section. For example:
[dxlsr]
DateTimeFormat=%m/%d/%Y %I:%M:%S %p
In this example, dates and times are extracted in the following format:
02/11/2003 11:36:09 AM
The format arguments are the same as those for the strftime() function.
Refer to the following Web page for more information.
http://msdn.microsoft.com/en-us/library/fe06s4ak%28VS.71%29.aspx
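To preview what a DateTimeFormat value produces before placing it in formats_e.ini, you can pass the same format string to the standard strftime() function. The following minimal sketch uses the format from the example above; the function name format_dxl_time is hypothetical, and the AM/PM text assumes the default "C" locale.

```c
#include <stddef.h>
#include <time.h>

/* Format a broken-down time with the DateTimeFormat string from the
   example. Returns the number of characters written, or 0 if the
   buffer is too small. */
static size_t format_dxl_time(const struct tm *t, char *out, size_t out_size)
{
    return strftime(out, out_size, "%m/%d/%Y %I:%M:%S %p", t);
}
```

For a struct tm describing 11 February 2003 at 11:36:09 (tm_year 103, tm_mon 1, tm_mday 11), the result matches the sample output shown above.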
Extract Sub Files from Lotus Notes Database Files
A Lotus Notes database is a single file that contains multiple documents called
notes. Notes include design notes (such as forms, views, folders, navigators,
outlines, pages, framesets, agents, and resources), data document notes, profile
document notes, access control list notes, and collection (index) notes. KeyView
can extract text items, attachments, and OLE objects from data document notes
only. Data document notes include emails, journal entries, discussion threads,
documents (Microsoft Office and Lotus SmartSuite), and so on.
All components of a note are prefixed by field names such as “SendTo:”,
“Subject:”, and “Body:”. When a note is extracted, the field names are not
included in the extracted output; only the field values are extracted.
When a mail message in an NSF file is extracted to disk, the body text and header
information, such as the values from the SendTo, From, and DeliveredDate
fields, in each message is extracted to a text file. (If you do not want the header
information to appear in the message text file, see “Exclude Metadata from the
Extracted Text File” on page 80.)
NOTE The Lotus Notes Database (NSF) reader is an advanced feature
and is sold and licensed separately. To enable this reader in a KeyView
SDK, you must obtain the appropriate license key from Autonomy. See
“Update License Information” on page 33 for information on adding a new
license key to an existing installation.
System Requirements
The NSF format is proprietary. Therefore, KeyView accesses NSF files indirectly
using the Lotus Notes API. Since the NSF reader relies on functionality in Lotus
Notes, a Lotus Notes client or Lotus Domino server must be installed and
configured on the same machine as the application converting NSF files. On UNIX
and Linux, the Lotus Domino server is required. On Windows, the Lotus Notes
client or Lotus Domino server is required.
KeyView supports the following Lotus Notes clients and Domino servers:
• Lotus Notes 6.5.1
• Lotus Domino 6.5.1
KeyView supports NSF files on the same platforms supported by Lotus Notes and
Lotus Domino:
• Windows XP x86 (Service Pack 1 and 2)
• Windows 2000 x86 (Service Pack 2)
• Solaris 8.0 and 9.0 (built on Solaris 8.0)
• Red Hat Enterprise Linux AS 3.0 (x86)
• SuSE Linux Enterprise Server 8 and 9 (x86)
• IBM AIX 5.1, 5L version 5.2
Installation and Configuration
Before KeyView can convert NSF files, you must set up the Lotus Notes client or
Lotus Domino server. Full configuration is not required. The following steps outline
the minimal setup for NSF conversion:
Windows
1. Install the Lotus Notes client or Lotus Domino server. You do not need to
configure the client or server.
2. Ensure the file notes.ini is in the proper location.
• If Lotus Notes is installed, the file should appear in the install\lotus\notes
directory, where install is the installation directory.
• If only Lotus Domino is installed, the file should appear in the
install\lotus\domino directory, where install is the installation directory.
If the file does not exist, create an ASCII file named notes.ini, and add the
following text:
[Notes]
3. Add the KeyView bin directory and the install\lotus\notes or
install\lotus\domino directory to the PATH environment variable (the
KeyView bin directory must be first in the path). It is recommended you add
the KeyView bin directory because the Lotus Notes or Domino server
installation may contain older KeyView OEM libraries.
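Because the order of PATH entries matters here, an application can check at startup that the KeyView bin directory really is the first entry. The following is a minimal sketch; the function name is_first_path_entry is hypothetical, the comparison is exact (Windows paths are case-insensitive, so you may want to normalize case first), and the directory names in any call are placeholders for your own installation.

```c
#include <string.h>

/* Return 1 if the first entry of path_value equals dir, otherwise 0.
   sep is the list separator: ';' on Windows, ':' on UNIX. */
static int is_first_path_entry(const char *path_value, const char *dir,
                               char sep)
{
    const char *end = strchr(path_value, sep);
    size_t first_len = end ? (size_t)(end - path_value) : strlen(path_value);

    return first_len == strlen(dir) &&
           strncmp(path_value, dir, first_len) == 0;
}
```

On Windows the value to test would typically come from getenv("PATH").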
Solaris
1. Install Lotus Domino server. You do not need to configure the server.
2. Ensure the file notes.ini is in the install/lotus/notes/latest/sunspa
directory, where install is the directory where Lotus Notes is
installed. If the file does not exist, create an ASCII file named notes.ini,
and add the following text:
[Notes]
3. Add the install/lotus/notes/latest/sunspa directory to the PATH
environment variable:
setenv PATH install/lotus/notes/latest/sunspa:$PATH
4. Add the install/lotus/notes/latest/sunspa and the KeyView bin
directory to the LD_LIBRARY_PATH environment variable:
setenv LD_LIBRARY_PATH keyview_bin:install/lotus/notes/latest/sunspa:$LD_LIBRARY_PATH
where keyview_bin is the location of the KeyView bin directory. It is
recommended you add the KeyView bin directory because the Lotus Notes
installation may contain older KeyView OEM libraries.
AIX 5.x
1. Install the bos.iocp.rte file set if it is not already installed, and reboot the
machine. See the Lotus Domino server documentation for more information.
2. Install Lotus Domino server. You do not need to configure the server.
3. Ensure the file notes.ini is in the install/lotus/notes/latest/ibmpow
directory, where install is the directory where Lotus Notes is
installed. If the file does not exist, create an ASCII file named notes.ini,
and add the following text:
[Notes]
4. Add the install/lotus/notes/latest/ibmpow directory to the PATH
environment variable:
setenv PATH install/lotus/notes/latest/ibmpow:$PATH
5. Add the install/lotus/notes/latest/ibmpow and the KeyView bin
directory to the LIBPATH environment variable:
setenv LIBPATH keyview_bin:install/lotus/notes/latest/ibmpow:$LIBPATH
where keyview_bin is the location of the KeyView bin directory. It is
recommended you add the KeyView bin directory because the Lotus Notes
installation may contain older KeyView OEM libraries.
Linux
1. Install Lotus Domino server. You do not need to configure the server.
2. Ensure the file notes.ini is in the install/lotus/notes/latest/linux
directory, where install is the directory where Lotus Notes is
installed. If the file does not exist, create an ASCII file named notes.ini,
and add the following text:
[Notes]
3. Add the install/lotus/notes/latest/linux directory to the PATH
environment variable:
setenv PATH install/lotus/notes/latest/linux:$PATH
4. Add the install/lotus/notes/latest/linux and the KeyView bin
directory to the LD_LIBRARY_PATH environment variable:
setenv LD_LIBRARY_PATH keyview_bin:install/lotus/notes/latest/linux:$LD_LIBRARY_PATH
where keyview_bin is the location of the KeyView bin directory. It is
recommended you add the KeyView bin directory because the Lotus Notes
installation may contain older KeyView OEM libraries.
Open Secured NSF Files
KeyView enables you to specify credentials (user ID file and password) which are
used to open a secured NSF file for extraction. See “Password Protected Files” on
page 409 for more information.
Format Note Sub Files
The KeyView NSF reader uses XML templates to format note sub-files. You can
customize the templates as required to approximate the look and feel of the
original notes as closely as possible. For more information, see “Extract and
Format Lotus Notes Sub Files” on page 393.
Extract Sub Files from PDF Files
KeyView can extract document-level and page-level attachments from a PDF
document. Document-level attachments are added by using the Attach A File tool
and may include links to or from the parent document or to other file attachments.
Page-level attachments are added as comments by using various tools.
Page-level or comment attachments display the File Attachment icon or the
Speaker icon on the page where they are located.
When a PDF’s attachments are extracted to disk, the attachments are saved in
their native format.
Extract Embedded OLE Objects
Embedded OLE objects can be converted in two ways:
• Using the File Extraction API, the OLE object is first extracted from the main
file and saved to disk (see “File Extraction API Functions” on page 145). It can
then be converted by making a separate conversion call.
• Using the XML Export API, the main file is converted to XML and the OLE
object is converted to a graphics file that is referenced in the XML file (see
“XML Export API Functions” on page 183).
The File Extraction API can extract embedded OLE objects from the following
types of documents:
• Lotus Notes (DXL)
• Microsoft Excel
• Microsoft Word
• Microsoft PowerPoint
• Microsoft Outlook
• Microsoft Visio
• Microsoft Project
• OASIS Open Document
• Rich Text Format (RTF)
When an embedded OLE object is extracted from its parent file, the location
where the embedded file appears in the original document is not available. The
parent and child are extracted as separate files.
Extract Sub Files from ZIP Files
ZIP files that are not password-protected can be extracted using the general
method (see “Extract Sub Files” on page 69). However, some ZIP files use
password protection, in which case you must use a different method to enter the
required credentials. See “Password Protected Files” on page 409 for more
information.
Default Filenames for Extracted Sub Files
When a filename is not specified in the call to fpExtractSubFile() (see
“fpExtractSubFile()” on page 148), a default filename is applied to the extracted
sub file in some cases.
Default Filename for Mail Formats
To avoid naming conflicts and problems with long filenames, KeyView applies its
own names to the extracted mail items when a name is not supplied in the call to
fpExtractSubFile(). A non-mail attachment retains its original filename and
extension.
When the contents of a mail store or the message body of a mail message are
extracted, the extracted filenames may include the following:
• The first valid eight characters of the original folder name or “Subject” line of
the mail message. If the “Subject” line is empty, the characters kvext are
used, where ext is the format’s extension. For example, the characters would
be “kvmsg” for MSG and “kvnsf” for NSF.
The following special characters are considered invalid and are ignored:
any non-printing character with a value less than 0x1F
angle brackets (< >)
double quote (“)
asterisk (*)
forward slash (/)
back slash (\)
pipe (|)
colon (:)
question mark (?)
For notes, the filename is derived from the first 24 characters of the note text.
For contact entries, the filename is derived from the full name of the contact.
• The characters _kvn, where n is an integer incremented from 0 for each
extracted item.
• One of the following extensions:
Type                   File Extension
email message          .mail
calendar appointment   .cal
contact entry          .cont
task entry             .task
note                   .note
journal entry          .jrnl
distribution list      .dist
posting note           .post
• If the type cannot be determined for an MSG or PST file, the file is given
a .mail extension.
• If the type cannot be determined for an NSF file, the file is given a .tmp
extension.
• The format of a MAIL file is plain text by default, but can be set to RTF with
the KVExtractionFlag_GetFormattedBody flag.
For example, an MSG mail message with the subject line RE: Product roadmap
containing the Microsoft Excel attachment release_schedule.xls would be
extracted as
RE produ_kv0.mail
release_schedule.xls
If an extracted message contains an embedded OLE object or any attachment
that does not have a name, the object or attachment is extracted as _kv#.tmp.
Default Filename for Embedded OLE Objects
KeyView can apply a default name to an extracted embedded OLE object when a
name is not supplied in the call to fpExtractSubFile(). When an embedded
OLE object is extracted, the extracted filename may include the following:
• The first valid eight characters of the main file. The following special
  characters are considered invalid and are ignored:
    any non-printing character with a value less than 0x1F
    angle brackets (< >)
    double quote (")
    asterisk (*)
    forward slash (/)
    back slash (\)
    pipe (|)
    colon (:)
    question mark (?)
• The characters _kvn, where n is an integer incremented from 0 for each
  extracted object.
• If KeyView can determine the embedded OLE is a Microsoft Office document,
  the original extension is used. If the file type cannot be determined, the file is
  given a .tmp extension.
For example, let us say a Microsoft Word document (sales_quarterly.doc)
contains two embedded OLE objects: a Microsoft Excel file called
west_region.xls, and a Bitmap created in the Word document. The
embedded objects would be extracted as
sales_qu_kv0.xls
sales_qu_kv1.tmp
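The naming rule described above can be sketched in C as follows. This is only an illustration of the documented pattern (first eight valid characters, then _kvn, then an extension), not KeyView's own implementation. The function names is_invalid_name_char and build_default_name are hypothetical, and the sketch assumes the source name is passed in exactly as stored.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if c is one of the characters the guide lists as invalid. */
int is_invalid_name_char(unsigned char c)
{
    if (c < 0x1F) {
        return 1;   /* non-printing character, per the documented list */
    }
    return strchr("<>\"*/\\|:?", (int)c) != NULL;
}

/*
 * Hypothetical helper: writes "<first 8 valid chars>_kv<index><ext>" to out.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int build_default_name(const char *source, unsigned index,
                       const char *ext, char *out, size_t out_size)
{
    char prefix[9];
    size_t n = 0;
    int written;

    /* Skip invalid characters; keep at most eight valid ones. */
    for (; *source != '\0' && n < 8; ++source) {
        if (!is_invalid_name_char((unsigned char)*source)) {
            prefix[n++] = *source;
        }
    }
    prefix[n] = '\0';

    written = snprintf(out, out_size, "%s_kv%u%s", prefix, index, ext);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Called with the source name sales_quarterly.doc, index 0, and extension .xls, the sketch produces sales_qu_kv0.xls, which matches the first extracted object in the example above.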
CHAPTER 4
Use the XML Export API
This section describes how to perform some basic tasks using the XML Export
API. It contains the following topics:
• Extract Metadata
• Extract File Format Information
• Convert Character Sets
• Map Styles
• Use Style Sheets
• Display Vector Graphics on UNIX and Linux
• Convert Revision Tracking Information
• Convert PDF Files
• Convert Spreadsheet Files
• Convert XML Files
Extract Metadata
When a file format supports metadata, KeyView can extract and process that
information. Metadata includes document information fields such as title, author,
creation date, and file size. Depending on the file’s format, metadata is referred to
in a number of ways: for example, “summary information,” “OLE summary
information,” “file information,” and “document properties.”
The metadata in mail formats (MSG and EML) and mail stores (PST, NSF, and
MBX) is extracted differently than other formats. For information on extracting
metadata from these formats, see “Extract Mail Metadata” on page 72.
NOTE KeyView can only extract metadata from a document if metadata is
defined in the document, and the document reader can extract metadata for
the file format. The section “Supported Formats” on page 294 lists the file
formats for which metadata can be extracted. KeyView does not generate
metadata automatically from the document contents.
Extract Metadata Using the API
You can extract the metadata at the API level. The API extracts all valid metadata
fields that exist in the file.
To extract metadata using the C API
1. Declare a pointer to the KVSummaryInfoEx structure. See
“KVSummaryInfoEx” on page 244.
2. Call the fpGetSummaryInfo() function. See “fpGetSummaryInfo()” on
page 197.
Extract Metadata Using a Template File
When using a template file, KeyView recognizes two types of metadata: standard
and non-standard. Standard metadata includes fields, such as Title, Author, and
Subject. The standard fields are enumerated from 1 to 41 in KVSumType in the
header file kvtypes.h. Non-standard metadata includes any field not listed from 1
to 41 in KVSumType, such as user-defined fields (for example, custom property
fields in Microsoft Word documents), or fields that are unique to a particular file
type (for example, “Artist” or “Genre” fields in MP3 files). Enumerated types 42
and greater are reserved for non-standard metadata.
To extract metadata using a template file
1. Insert metadata tokens in a member of the KVXMLTemplate structure in the
template files. This defines the point at which the metadata appears in the
XML output.
2. If you are using the $USERSUMMARY or $SUMMARY token, define the
szUserSummary member of the KVXMLTemplate structure in the template file.
This determines the markup and tokens generated when these metadata
tokens are processed.
3. In your application, read the template file and write the data to the
KVXMLTemplate structure. See “xmlini” on page 139.
The following tokens can be used in the template files:
$SUMMARYNN
Inserts the data from a specified metadata field. NN is a number
from 00 through 33 that is enumerated in KVSumType in
kvtypes.h.
$SUMMARY
Inserts the data from valid metadata fields in the range of 0 to 33
using the markup provided in pszUserSummary.
$USERSUMMARY
Inserts the data from every valid non-standard metadata field using
the markup provided in pszUserSummary.
$CONTENT
Inserts the content of the metadata field specified by the $NAME
token.
$NAME
Inserts the name of the metadata field, such as “Title,” “Author,” or
“Subject.”
Examples
$SUMMARYNN
The following markup displays the contents of the “Title” field at the top of the
main XML file:
szMainTop=$SUMMARY01
In KVSumType, 01 is the enumerated value for the “Title” metadata field.
$SUMMARY
The following markup extracts all standard fields, and includes them in the first H1
XML block:
szFirstH1Start=$SUMMARY
szUserSummary=<MetaData name="$NAME" content="$CONTENT" />
This example extracts the field name ($NAME) and field content ($CONTENT) for
standard metadata and includes it at the beginning of the first heading level 1 XML
block.
The generated XML may look like this:
<MetaData name="CodePage" content="1252" />
<MetaData name="Title" content="My design document" />
<MetaData name="Subject" content="design specifications" />
<MetaData name="Author" content="John Doe" />
<MetaData name="Keywords" content="" />
<MetaData name="Comments" content="" />
<MetaData name="Template" content="Normal.dot" />
<MetaData name="LastAuthor" content="lchapman" />
<MetaData name="RevNumber" content="6" />
<MetaData name="EditTime" content="01/01/1601, 0:08" />
<MetaData name="LastPrinted" content="14/01/2002, 14:06" />
<MetaData name="Create_DTM" content="27/08/2003, 10:31" />
<MetaData name="LastSave_DTM" content="29/08/2003, 14:07" />
<MetaData name="PageCount" content="1" />
<MetaData name="WordCount" content="4062" />
<MetaData name="CharCount" content="23159" />
<MetaData name="AppName" content="Microsoft Word 9.0" />
<MetaData name="Security" content="0" />
<MetaData name="Category" content="software" />
<MetaData name="LineCount" content="192" />
<MetaData name="ParCount" content="46" />
<MetaData name="ScaleCrop" content="FALSE" />
<MetaData name="Manager" content="" />
<MetaData name="Company" content="Autonomy" />
<MetaData name="LinksDirty" content="FALSE" />
$USERSUMMARY
The following markup extracts non-standard fields, and includes them at the
bottom of the main XML file:
szMainBottom=$USERSUMMARY
szUserSummary=<MetaData name="$NAME" content="$CONTENT" />
This example extracts the field name ($NAME) and field content ($CONTENT) for
non-standard metadata from a document, and includes it at the bottom of the main
XML file.
The generated XML may look like this:
<MetaData name="Telephone number" content="444-111-2222" />
<MetaData name="Recorded date" content="07/03/2003, 23:00" />
<MetaData name="Source" content="TRUE" />
<MetaData name="my property" content="reserved" />
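To make the relationship between the szUserSummary markup and the generated XML concrete, the following C sketch expands the $NAME and $CONTENT tokens for one metadata field. It is an illustration only and does not show how Export performs the substitution internally. The function name expand_summary_markup is hypothetical, and the sketch does no XML escaping of the field values.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copies tmpl to out, replacing each $NAME token with
 * name and each $CONTENT token with content.
 * Returns 0 on success, -1 if out is too small.
 */
int expand_summary_markup(const char *tmpl, const char *name,
                          const char *content, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0) {
        return -1;
    }
    while (*tmpl != '\0') {
        const char *piece = NULL;
        size_t len;

        if (strncmp(tmpl, "$NAME", 5) == 0) {
            piece = name;
            tmpl += 5;
        } else if (strncmp(tmpl, "$CONTENT", 8) == 0) {
            piece = content;
            tmpl += 8;
        }

        if (piece != NULL) {
            len = strlen(piece);
        } else {
            piece = tmpl;   /* no token here: copy one literal character */
            len = 1;
            tmpl += 1;
        }
        if (used + len >= out_size) {
            return -1;
        }
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

With the template <MetaData name="$NAME" content="$CONTENT" /> and the field values Title and My design document, the sketch yields the same kind of line shown in the generated XML above.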
Extract File Format Information
Export can detect a file’s format, and report the information to the API, which in
turn reports the information to the developer’s application. This feature enables
you to apply customized conversion settings based on a file’s format. See “File
Format Detection” on page 347 for more information on format detection.
To extract file format information using the C API
1. Declare a pointer to the KVStreamInfo data structure. See “KVStreamInfo” on
page 239.
2. Call the fpGetStreamInfo() function. See “fpGetStreamInfo()” on page 196.
Convert Character Sets
Export enables you to control the character set of both the input and the output
text. This is accomplished by either
• setting the source and/or target character set in the API, or
• basing the input/output on the character set of the document (if the document
  character set is stored in the document and can be determined by the
  document reader).
The character sets are enumerated in KVCharSet of kvtypes.h. Not all character
sets can be used to specify the target character set. See Table 33 on page 341 for
a list of character sets that can be used as a target character set.
Determine the Character Set of the Output Text
To determine the output character set of a converted document, Export considers
the following:
• Whether the reader can extract the character set from the document. This
  depends on whether the file format can provide character set information and
  whether the document actually contains character set information.
  The section “Supported Formats” on page 294 indicates the file formats for
  which character set information can be extracted. If character set information
  cannot be determined for your document type, you must set the source and/or
  target character set in the API.
• Whether a source character set is set in the API.
  NOTE To set the source character set, you must specify a
  character set and set the bForceSrcCharSet member of
  the KVXMLOptions structure to TRUE.
• Whether a target character set is set in the API.
  NOTE To set the target character set, you must specify a
  character set and set the bForceOutputCharSet member of
  the KVXMLOptions structure to TRUE.
Guidelines for Character Set Conversion
Figure 8 shows how the output character set is determined when the document
character set can be determined:
Figure 8 Document Character Set Can Be Determined
Figure 9 shows how the output character set is determined when the document
character set cannot be determined:
Figure 9 Document Character Set Cannot Be Determined
Examples of Character Set Conversion
The examples below demonstrate possible configurations for mapping character
sets and the expected output for each scenario.
Document Character Set Can be Determined
For the example in Table 8, the document is an RTF file. The section “Word
Processing Formats” on page 310 indicates the document character set can be
obtained from this file type. The document character set is Traditional Chinese
(BIG5).
Table 8 Document Character Set Can be Determined
Source charset set    Target charset set    Output charset
KVCS_GB               KVCS_UTF8             KVCS_UTF8
    Converts GB (Simplified Chinese) to UTF-8. Output character set is the
    target character set specified in the API.
KVCS_GB               --                    KVCS_GB
    Converts BIG5 to GB (Simplified Chinese). Output character set is the
    source character set specified in the API.
--                    KVCS_UTF8             KVCS_UTF8
    Converts BIG5 to UTF-8. Output character set is the target character set
    specified in the API.
--                    --                    KVCS_BIG5
    Output character set is the document character set. No conversion.
Document Character Set Cannot be Determined
For the example in Table 9, the document is an ASCII file. The section “Word
Processing Formats” on page 310 indicates the document character set cannot
be obtained from this file type. The document character set is KVCS_1251.
Table 9 Document Character Set Cannot be Determined
Source charset set    Target charset set    Output charset
KVCS_1252             KVCS_UTF8             KVCS_UTF8
    Converts KVCS_1252 to KVCS_UTF8. Output character set is the target
    character set specified in the API.
KVCS_1252             KVCS_UNKNOWN          KVCS_1252
    Output character set is the source character set specified in the API
    because KVCS_UNKNOWN cannot be used. No conversion.
KVCS_1252             --                    KVCS_1252
    Output character set is the source character set specified in the API. No
    conversion.
--                    KVCS_1252             KVCS_1252
    Converts OS code page to KVCS_1252. Output character set is the target
    character set specified in the API.
--                    --                    (OS code page)
    Output character set is OS code page. No conversion.
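The rows of Table 8 and Table 9 are consistent with a simple precedence rule: a forced target character set wins, then a forced source character set, then the document character set, and finally the OS code page. The C sketch below encodes that rule as inferred from the tables alone. It is not taken from the Export source, and the figures may describe additional cases. DemoCharSet and DemoCharSetOptions are hypothetical stand-ins for KVCharSet and for the four KVXMLOptions members named in this section; the real definitions are in kvtypes.h and the SDK headers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for KVCharSet; real values are in kvtypes.h. */
typedef enum {
    DEMO_KVCS_UNKNOWN,
    DEMO_KVCS_1252,
    DEMO_KVCS_BIG5,
    DEMO_KVCS_GB,
    DEMO_KVCS_UTF8
} DemoCharSet;

/* Hypothetical stand-in holding only the members discussed in this section. */
typedef struct {
    DemoCharSet eSrcCharSet;
    bool        bForceSrcCharSet;
    DemoCharSet eOutputCharSet;
    bool        bForceOutputCharSet;
} DemoCharSetOptions;

/*
 * Returns the output character set implied by Tables 8 and 9.
 * Pass DEMO_KVCS_UNKNOWN as document_charset when the reader cannot
 * determine the character set of the document.
 */
DemoCharSet resolve_output_charset(const DemoCharSetOptions *opts,
                                   DemoCharSet document_charset,
                                   DemoCharSet os_code_page)
{
    /* A usable forced target always determines the output. */
    if (opts->bForceOutputCharSet &&
        opts->eOutputCharSet != DEMO_KVCS_UNKNOWN) {
        return opts->eOutputCharSet;
    }
    /* Otherwise a forced source character set is used. */
    if (opts->bForceSrcCharSet && opts->eSrcCharSet != DEMO_KVCS_UNKNOWN) {
        return opts->eSrcCharSet;
    }
    /* Otherwise fall back to the document, then to the OS code page. */
    if (document_charset != DEMO_KVCS_UNKNOWN) {
        return document_charset;
    }
    return os_code_page;
}
```

For example, forcing only the source to DEMO_KVCS_GB for a BIG5 document returns DEMO_KVCS_GB, which corresponds to the second row of Table 8.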
Set the Character Set During Conversion
You can convert the character set of a file at the time the file is converted.
To specify the source character set for documents from which the
document character set cannot be obtained by the reader
1. Set the eSrcCharSet member of the structure KVXMLOptions to one of the
character sets enumerated in KVCharSet in kvtypes.h. See “KVXMLOptions”
on page 253.
2. Set the bForceSrcCharSet member of the structure KVXMLOptions to TRUE.
See “KVXMLOptions” on page 253.
To specify the target character set:
1. Set the eOutputCharSet member of the KVXMLOptions structure to one of
the character sets enumerated in KVCharSet in kvtypes.h. See
“KVXMLOptions” on page 253.
2. Set the bForceOutputCharSet member of the structure KVXMLOptions to
TRUE. See “KVXMLOptions” on page 253.
Set the Character Set During File Extraction from a Container
You can convert the character set of a container sub file at the time the sub file is
extracted from the container and before it is converted to XML. This is most often
used to set the output character set of a mail message’s body text. See “Use the
File Extraction API” on page 67.
To specify the source character set of a sub file, call the fpExtractSubFile()
function, and set the KVExtractSubFileArg->srcCharset argument to any
value in the enumerated list in KVCharSet of kvtypes.h. See “fpExtractSubFile()”
on page 148.
To specify the target character set of a sub file, call fpExtractSubFile(), and
set the KVExtractSubFileArg->trgCharSet argument to any value in the
enumerated list in KVCharSet of kvtypes.h. See “fpExtractSubFile()” on
page 148.
Map Styles
Export can map paragraph and character styles in any word processing format
that contains styles (such as Microsoft Word, RTF, or Folio Flat File) to
user-defined markup. With this feature, you can redact (hide) text in the source
document, delete content, or change the overall structure of the output. You can
also embed style sheet styles in the output defined in the XML.
To enable style mapping, you must indicate which paragraph and/or character
styles are to be mapped, and define the starting and ending markup to be
included in the XML output. For example, if the source Microsoft Word document
contains the character style “Recipe,” and the content of the style in Microsoft
Word is “Brownies,” you can specify that the starting markup be <recipe> and the
ending markup </recipe>. This would result in the output XML containing:
<recipe>Brownies</recipe>.
You can also use style mapping to control the look of the XML output by either
using a Cascading Style Sheet (CSS) or defining the style directly in the starting
markup. For example, if a Word document contains the paragraph style “Colorful”,
you can have markup of the form <div class="rainbow"> inserted at the front
of the paragraph and markup of the form </div> inserted at the end of the
paragraph. “Rainbow” is a CSS style defined in an externally provided CSS file
referenced at the top of the XML output.
If you map styles to elements or attributes that are not defined in the DTD, you
must add the new elements or attributes to the DTD. You must also ensure the
new markup is defined in the API, either by entering the markup directly in the
classes, or populating the classes using the template files.
Use the C API
To map styles using the C API
1. Define the KVStyle structure. See “KVStyle” on page 241. The information in
this structure includes:
    • the markup to be added to the beginning and end of a paragraph or
      character style.
    • the name of the word processing style (for example, “Heading 1”) to which
      style mapping applies. Style names are case sensitive.
    • the flag which defines instructions on how to process the content
      associated with a paragraph or character style. The flags are defined in
      kvtypes.h and described in Table 10 on page 107.
2. Call the fpSetStyleMapping() function. See “fpSetStyleMapping()” on
page 201.
Use a Template file
To map styles using a template file
1. Use the KVStyle parameter to specify how many styles are being mapped.
For example, if there are nine mapped heading levels, add the following:
[KVStyle]
NumStyles=9
2. For each style, there must be a [StyleX] entry that contains the markup that
appears at the start and end of the defined style. For example, the first
heading level is defined as follows:
[Style1]
StyleName=Colorful
MarkUpStart=<div class="colorful">
MarkUpEnd=<!-- end of colorful --></div>
These values are used in StyleName, MarkUpStart, and MarkUpEnd in the
KVStyle structure. See “KVStyle” on page 241.
3. For each style, define the flag that applies. Flags define instructions on how to
process the content associated with a paragraph or character style. They are
defined in kvtypes.h and described in Table 10 on page 107. This value is
used in dwflags of the KVStyle structure. See “KVStyle” on page 241. The
value associated with each flag is a hexadecimal number. You can set an
option by either entering the converted decimal value or entering the flag’s
text.
Flags=0
A finished entry in a template file could look like this:
[KVStyle]
NumStyles=3
[Style1]
StyleName=Colorful
MarkUpStart=<div class="Colorful">
MarkUpEnd=<!-- End of Colorful --></div>
Flags=0
[Style2]
StyleName=RedactPara
MarkUpStart=<div class="RedactPara">
MarkUpEnd=<!-- End of RedactPara --></div>
Flags=2048
[Style3]
StyleName=Code
MarkUpStart=<pre>
MarkUpEnd=<!-- End of Code --></pre>
Flags=KVSTYLE_PRE
Table 10 Flags for Defining Styles
Flag    Description
KVSTYLE_PRE
    The KVSTYLE_PRE flag specifies that white space should be
    preserved (treated as characters, not word separators), and that
    mode changes, such as changes in font size within a paragraph,
    should be ignored. This allows the tags <pre> and </pre> to be
    used.
KVSTYLE_HEADING[1-6]
    The flags KVSTYLE_HEADING[1-6] specify that a given style is to
    be detected and processed as a heading. Heading flags are
    exclusive. This means a style cannot be processed as both H1 and
    H2.
    By default, Export maps the heading style “Heading 1” to
    <h1></h1>, and so on, for heading levels 1 through 6. If you use style
    mappings, the default mapping is overridden. Therefore, you must
    supply markup for all heading levels. Export uses heading levels to
    define the overall structure of the XML output.
KVSTYLE_ORDERLIST
    The KVSTYLE_ORDERLIST flag specifies that the style should be
    tagged as an ordered list. Currently not implemented.
KVSTYLE_UNORDEREDLIST
    The KVSTYLE_UNORDERLIST flag specifies that the style should
    be tagged as an unordered list. Currently not implemented.
KVSTYLE_DELETECONTENT
    The KVSTYLE_DELETECONTENT flag specifies that the content
    associated with the style tag should be deleted from the output.
KVSTYLE_ONCONSECUTIVEPARAGRAPHS
    The KVSTYLE_ONCONSECUTIVEPARAGRAPHS flag specifies that
    the style should be applied to consecutive paragraphs of the
    document. If this flag is used, and two or more paragraphs require
    the same style, the opening and closing tags that normally appear
    between each paragraph are not generated.
KVSTYLE_REDACT
    The KVSTYLE_REDACT flag is used to hide sensitive or confidential
    information in the source document. It specifies that the text
    associated with the style tag should be replaced in the XML output
    with a selected character. The default replacement character is “X,”
    but you can specify a different replacement character by setting
    cRedact. See “cRedact” on page 259.
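As a rough illustration of what a style mapping does to the output, the C sketch below wraps a run of styled text in its starting and ending markup and models the effect of the delete and redact flags. It is a simplified model, not the Export implementation. DemoStyle, the DEMO_STYLE_* values, and apply_style are hypothetical names with made-up flag values (the real flags are defined in kvtypes.h). The sketch also assumes that redaction replaces every character of the text and that the markup is still emitted when the content is deleted; the guide does not state either detail.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values for illustration; the real ones are in kvtypes.h. */
enum {
    DEMO_STYLE_DELETECONTENT = 1 << 0,
    DEMO_STYLE_REDACT        = 1 << 1
};

/* Hypothetical stand-in for the mapping information held in KVStyle. */
typedef struct {
    const char *markup_start;
    const char *markup_end;
    unsigned    flags;
} DemoStyle;

/*
 * Writes markup_start + (possibly deleted or redacted) text + markup_end.
 * Returns 0 on success, -1 if out is too small.
 */
int apply_style(const DemoStyle *style, const char *text, char redact_char,
                char *out, size_t out_size)
{
    size_t start_len = strlen(style->markup_start);
    size_t end_len = strlen(style->markup_end);
    size_t text_len =
        (style->flags & DEMO_STYLE_DELETECONTENT) ? 0 : strlen(text);
    char *p = out;

    if (start_len + text_len + end_len + 1 > out_size) {
        return -1;
    }
    memcpy(p, style->markup_start, start_len);
    p += start_len;
    if (style->flags & DEMO_STYLE_REDACT) {
        memset(p, redact_char, text_len);   /* assumption: one-for-one */
    } else {
        memcpy(p, text, text_len);
    }
    p += text_len;
    memcpy(p, style->markup_end, end_len + 1);   /* copies the terminator */
    return 0;
}
```

With markup <recipe> and </recipe>, no flags, and the text Brownies, the sketch produces <recipe>Brownies</recipe>, as in the example earlier in this section.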
Use Style Sheets
XML is a content-based metalanguage designed to structure data. XML does not
include information about how a document should be displayed in a browser. To
view an XML document in a browser, information about how it is displayed must be
provided by style sheets. These are coded using either Cascading Style Sheets
(CSS) or Extensible Stylesheet Language (XSL).
The style sheet options are enumerated in KVXMLStyleSheetType.
Use Extensible Style Sheet Language (XSL)
You can use XSL style sheets to specify how XML data is displayed in a browser.
Existing XSL style sheets can be used, but unlike CSS, style sheet information
cannot be written to an external XSL file during the conversion.
Both CSS and XSL style sheets can be used to format XML documents. However,
XSL can also transform XML documents. For example, list items can be
transformed to display in alphabetical order, words can be replaced by other
words, or empty elements can be replaced by text.
To use an existing XSL style sheet
1. Set eStyleSheetType to XML_XSL to enable XSL style sheet mapping.
2. Set bUseExistingStyleSheet to TRUE to apply a pre-existing style sheet to
an XML document. Pre-existing style sheets are not validated.
3. Specify the path and filename of the style sheet file in pszStyleSheet.
If bUseExistingStyleSheet is set to TRUE and pszStyleSheet is not
specified, a default XSL style sheet that is appropriate for the source
document type is used.
The following are default XSL style sheets:
• wp.xsl (for word processing documents)
• ss.xsl (for spreadsheets)
• pg.xsl (for presentation graphics)
Use Cascading Style Sheets (CSS)
In addition to XSL style sheets, Export can write style sheet information to an
external CSS file. The C sample program xmlini provides an example of how to
use an existing style sheet, and output formatting data to an external file. See
“xmlini” on page 139.
To enable CSS mapping and output the resulting formatting data in an
external file
1. Set eStyleSheetType to XML_CSS.
2. Use the KVXMLSetStyleSheet() function to set the path and filename of the
external style sheet. See “KVXMLSetStyleSheet()” on page 219.
To enable CSS mapping and use an existing CSS file:
1. Set eStyleSheetType to XML_CSS.
2. Set bUseExistingStyleSheet to TRUE to specify a pre-existing style sheet
for an XML document.
3. Specify the path and filename of the style sheet file in pszStyleSheet.
If bUseExistingStyleSheet is set to TRUE and pszStyleSheet or
SetExternalStyleFile is not specified, a CSS style sheet is created.
NOTE Cascading style sheets can only be used with word
processing documents.
Display Vector Graphics on UNIX and Linux
Export offers the option of rasterizing vector graphic content from source
documents into a variety of graphics formats including JPEG, PNG, WMF, and
CGM. This solution is implemented with Windows Graphical Device Interface
(GDI) code, and therefore is not portable to other platforms.
The output format of vector graphics is defined by the member
eOutputVectorGraphicType of the structure KVXMLOptions, and the options are
enumerated in KVXMLGraphicType in kvxml.h. See “KVXMLOptions” on
page 253 and “KVXMLGraphicType” on page 279.
To display vector graphics in presentation, word processing, and spreadsheet files
on UNIX and Linux, Export can convert the files directly to JPEG using a Java
program named kvraster.class. This program uses the Java Abstract
Windowing Toolkit (AWT). The AWT requires access to an X Server.
NOTE If you are using KeyView 10.5.0.0 or Java 1.6, you do not have to
set up an X Server; however, if you are using a version of KeyView lower
than 10.4 with a version of Java lower than 1.6, you must set up an X
Server.
To set up an X Server, do one of the following:
• Run a virtual X Server, such as the Xvfb utility. This utility is included in the
  X11R6 distribution or can be downloaded from the following site:
  http://www.x.org/Downloads.html
  For example, to run the Xvfb utility on a 512 MB, Solaris 2.8 platform, follow
  these steps:
  a. Start Xvfb at root:
     /usr/X11R6/bin/Xvfb :1 -screen 0 1152x900x8 &
  b. Set the display environment variable:
     setenv DISPLAY :1.0
• Make an X display available to the Java runtime using the DISPLAY
  environment variable. No windows appear on the display. For example, set the
  DISPLAY environment variable as follows:
  setenv DISPLAY computername:0.0
  or
  setenv DISPLAY ipaddress:0.0
After the X Server is set up, the file can be converted.
To convert the file
1. Add the location of the JRE to the PATH environment variable.
2. Set eOutputVectorGraphicType to JPEG in the template file or directly in the
API.
3. Convert the document to XML. The graphics in the document are converted to
JPEG and stored in the output directory.
Convert Revision Tracking Information
The revision tracking feature in applications—such as Microsoft Word’s Track
Changes—marks changes to a document (typically, strikethrough for deleted text
and underline for inserted text) and tracks each change by reviewer name and
date.
If revision tracking was enabled when changes were made to a document, Export
can be configured to convert the deleted text and graphics and include revision
tracking information in the XML output. (The deleted content and revision tracking
information are excluded from the XML output by default.)
Content that was added to the document is identified by <ins> tags. Content that
was deleted from the document is identified by <del> tags. The <ins> and <del>
tags include cite and datetime attributes which define the name of the reviewer
who made the change and the date the change was made respectively. (The date
is in ISO-8601 format: YYYY-MM-DDThh:mm:ss.) The tags also include a title
attribute which allows you to display the author and date information in a browser.
These elements are included in the verity.dtd.
The following markup is generated for inserted text:
<ins title="Inserted: JohnD, 2006-04-24T14:47:00"
cite="mailto:JohnD" datetime="2006-04-24T14:47:00">This text was
added</ins> in a previous version.
The following markup is generated for deleted text:
<del title="Deleted: JohnD, 2006-04-24T14:56:00"
cite="mailto:JohnD" datetime="2006-04-24T14:56:00">This text was
deleted</del> in a previous version.
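An application that consumes this markup may want to read the datetime attribute back into a C time structure. The sketch below parses a value in the YYYY-MM-DDThh:mm:ss form described above using only the standard C library. The function name parse_revision_datetime is hypothetical and is not part of the Export API. The sketch checks the layout of the string but does not validate field ranges.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: parses "YYYY-MM-DDThh:mm:ss" into *out.
 * Returns 0 on success, -1 if the string does not match that layout.
 */
int parse_revision_datetime(const char *value, struct tm *out)
{
    int year, month, day, hour, minute, second;
    char trailing;

    /* Exactly six fields must convert; a seventh means trailing text. */
    if (sscanf(value, "%4d-%2d-%2dT%2d:%2d:%2d%c",
               &year, &month, &day, &hour, &minute, &second,
               &trailing) != 6) {
        return -1;
    }

    memset(out, 0, sizeof *out);
    out->tm_year = year - 1900;   /* struct tm counts years from 1900 */
    out->tm_mon = month - 1;      /* struct tm months are 0 to 11 */
    out->tm_mday = day;
    out->tm_hour = hour;
    out->tm_min = minute;
    out->tm_sec = second;
    out->tm_isdst = -1;           /* daylight saving status unknown */
    return 0;
}
```

The resulting struct tm can then be passed to standard functions such as strftime() to display the revision date in another format.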
To convert deleted text and graphics and include revision tracking
information
1. Call the fpInit() function. See “fpInit()” on page 199.
2. Call the fpXMLConfig() function with the following arguments (see
“KVXMLConfig()” on page 205):
Argument    Parameter
nType       KVCFG_INCLREVISIONMARK
nValue      TRUE (non-zero)
pData       NULL
For example:
(*fpXMLConfig)(pKVXML, KVCFG_INCLREVISIONMARK, TRUE, NULL);
The xmlini sample program demonstrates this function. See “xmlini” on
page 139.
3. Call the fpConvertStream() or KVXMLConvertFile() function. See
“fpConvertStream()” on page 186 or “KVXMLConvertFile()” on page 214.
Convert PDF Files
Export has special configuration options that allow greater control over the
conversion of PDF files. These options can improve the accuracy of the XML
output.
Convert PDF Files to a Logical Reading Order
The PDF format is primarily designed for presentation and printing of brochures,
magazines, forms, reports, and other materials with complex visual designs. Most
PDF files do not contain the logical structure of the original document—the correct
reading order, for example, and the presence and meaning of significant elements
such as headers, footers, columns, tables, and so on.
KeyView can convert a PDF file by either using the file’s internal unstructured
paragraph flow, or by applying a structure to the paragraphs to reproduce the
logical reading order of the visual page. Logical reading order enables KeyView to
output PDF files containing languages that read from right-to-left (such as Hebrew
and Arabic) in the correct reading direction.
NOTE The algorithm used to reproduce the reading order of a PDF page
is based on common page layouts. The paragraph flow generated for PDFs
with unique or complex page designs may not emulate the original reading
order exactly.
For example, page design elements such as drop caps, callouts that cross
column boundaries, and significant changes in font size, may disrupt the
logical flow of the output text.
Logical Reading Order and Paragraph Direction
By default, KeyView produces an unstructured text stream for PDF files. This
means PDF paragraphs are extracted in the order in which they are stored in the
file, not the order in which they appear on the visual page. For example, a
three-column article could be output with the headers and the title at the end of
the output file, and the second column extracted before the first column. Although
this output does not represent a logical reading order, it accurately reflects the
internal structure of the PDF.
You can configure KeyView to produce a structured text stream that flows in a
specified direction. This means PDF paragraphs are extracted in the order (logical
reading order) and direction (left-to-right or right-to-left) in which they appear on
the page.
The following paragraph direction options are available:
Paragraph Direction Option    Description
Left-to-right
    Paragraphs flow logically and read from left to right. This option
    should be specified when most of your documents are in a
    language using a left-to-right reading order, such as English or
    German.
Right-to-left
    Paragraphs flow logically and read from right to left. This option
    should be specified when most of your documents are in a
    language using a right-to-left reading order, such as Hebrew or
    Arabic.
Dynamic
    Paragraphs flow logically. The PDF reader determines the
    paragraph direction for each PDF page, and then sets the direction
    accordingly. When a paragraph direction is not specified, this option
    is used.
NOTE Conversions may be slower when logical reading
order is enabled. For optimal speed, use an unstructured
paragraph flow.
The paragraph direction options control the direction of paragraphs on a page;
they do not control the text direction in a paragraph. For example, let us say a
PDF file contains English paragraphs in three columns that read from left to right,
but 80% of the second paragraph contains Hebrew characters. If the left-to-right
logical reading order is enabled, the paragraphs are ordered logically in the
output—title paragraph, then paragraph 1, 2, 3, and so on—and flow from the top
left of the first column to the bottom right of the third column. However, the text
direction of the second paragraph is determined independently of the page by the
PDF reader, and is output from right to left.
NOTE Extraction of metadata is not affected by the paragraph direction
setting. The characters and words in metadata fields are extracted in the
correct reading direction regardless of whether logical reading order is
enabled.
Enable Logical Reading Order
You can enable logical reading order using either the API or the formats_e.ini
file. Setting the direction in the API overrides the setting in the formats_e.ini
file.
Use the C API
To enable PDF logical reading order in the C API
1. Call the fpInit() function. See “fpInit()” on page 199.
2. Call the fpXMLConfig() function with the following arguments (see
“KVXMLConfig()” on page 205):

Argument   Parameter
nType      KVCFG_LOGICALPDF
nValue     Set to one of the following flags, which are defined in kvtypes.h (see
           “LPDF_DIRECTION” on page 290):
           • LPDF_LTR: Logical reading order and left-to-right paragraph
             direction.
           • LPDF_RTL: Logical reading order and right-to-left paragraph
             direction.
           • LPDF_AUTO: Logical reading order. The PDF reader determines the
             paragraph direction for each PDF page, and then sets the direction
             accordingly. When a paragraph direction is not specified, this
             option is used.
           • LPDF_RAW: Unstructured paragraph flow. This is the default
             behavior. If logical reading order is enabled, and you want to
             return to an unstructured paragraph flow, set this flag.
pData      NULL
For example:
(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_RTL, NULL);
The cnv2xml sample program demonstrates this function. See “cnv2xml” on
page 136.
3. Call the fpConvertStream() or KVXMLConvertFile() function. See
“fpConvertStream()” on page 186 or “KVXMLConvertFile()” on page 214.
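The steps above can be sketched as a small helper that applies one or more settings before conversion starts. The DemoConfigFn type, the DemoSetting structure, and both function names are hypothetical stand-ins introduced for this example; the real function pointer, the KVCFG_ constants, and the LPDF_ flags are declared in the SDK headers. The sketch also assumes that the configuration call returns a non-zero value on success.

```c
#include <stddef.h>

/* Hypothetical stand-in for the configuration function pointer.
   The real type is declared in the SDK headers. */
typedef int (*DemoConfigFn)(void *context, int type, int value, void *data);

typedef struct {
    int type;   /* configuration type, for example KVCFG_LOGICALPDF */
    int value;  /* flag value, for example LPDF_RTL */
} DemoSetting;

/* Applies each setting in order and stops at the first call that
   reports failure (assumed here to be a zero return value).
   Returns the number of settings that were applied. */
size_t demo_apply_settings(DemoConfigFn config, void *context,
                           const DemoSetting *settings, size_t count)
{
    size_t applied = 0;

    if (config == NULL || settings == NULL)
        return 0;
    while (applied < count &&
           config(context, settings[applied].type,
                  settings[applied].value, NULL) != 0)
        applied++;
    return applied;
}

/* Test double used in place of the SDK function in this sketch:
   it accepts every setting except one with a negative type. */
int demo_stub_config(void *context, int type, int value, void *data)
{
    (void)context;
    (void)value;
    (void)data;
    return type >= 0;
}
```

Comparing the return value with the number of settings requested shows whether every setting was accepted before the conversion function is called.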
Use the formats_e.ini File
The formats_e.ini file is in the directory install\OS\bin, where install is
the pathname of the Export installation directory and OS is the name of the
operating system.
To enable logical reading order using the formats_e.ini file
1. Change the PDF reader entry in the [Formats] section of the formats_e.ini
file as follows:
[Formats]
200=lpdf
2. Optionally, add the following section to the end of the formats_e.ini file:
[pdf_flags]
pdf_direction=paragraph_direction
where paragraph_direction is one of the following:
Flag        Description
LPDF_LTR    Left-to-right paragraph direction.
LPDF_RTL    Right-to-left paragraph direction.
LPDF_AUTO   The PDF reader determines the paragraph direction for each PDF
            page, and then sets the direction accordingly. When a paragraph
            direction is not specified, this option is used.
LPDF_RAW    Unstructured paragraph flow. This is the default behavior. If logical
            reading order is enabled, and you want to return to an unstructured
            paragraph flow, set this flag.
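The two entries described above can be generated programmatically. The following is a minimal sketch, assuming the entries are written as plain text; the function name write_lpdf_ini is hypothetical. It only produces the text of the entries and does not edit an existing formats_e.ini file, where the [Formats] entry is changed in place and the [pdf_flags] section is added at the end.

```c
#include <stdio.h>
#include <string.h>

/* Writes the formats_e.ini entries for logical reading order into out.
   direction must be one of the four flag names listed in the table;
   pass NULL to omit the optional [pdf_flags] section.
   Returns the number of characters written, or -1 if the direction is
   not recognized or the buffer is too small. */
int write_lpdf_ini(char *out, size_t cap, const char *direction)
{
    static const char *const valid[] = {
        "LPDF_LTR", "LPDF_RTL", "LPDF_AUTO", "LPDF_RAW"
    };
    const size_t count = sizeof valid / sizeof valid[0];
    size_t i;
    int n;

    if (direction == NULL) {
        n = snprintf(out, cap, "[Formats]\n200=lpdf\n");
    } else {
        for (i = 0; i < count; i++)
            if (strcmp(direction, valid[i]) == 0)
                break;
        if (i == count)
            return -1;  /* unknown flag name */
        n = snprintf(out, cap,
                     "[Formats]\n200=lpdf\n\n[pdf_flags]\npdf_direction=%s\n",
                     direction);
    }
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Rejecting unknown flag names before anything is written keeps a typing error from reaching the configuration file.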
Control Hyphenation
There are two types of hyphens in a PDF document:
• A soft hyphen is added to a word by a word processor to divide the word
  across two lines. This is a discretionary hyphen and is used to ensure proper
  text flow in justified text.
• A hard hyphen is intentionally added to a word regardless of the word’s
  position in the text flow. It is required by the rules of grammar and/or word
  usage. For example, compound words, such as “three-week vacation” and
  “self-confident,” contain hard hyphens.
By default, KeyView maintains the source document’s soft hyphens in the output
XML to more accurately represent the source document’s layout. However, if you
are using Export to generate text output for an indexing engine or are not
concerned with maintaining the document’s layout, it is recommended you remove
soft hyphens from the XML output. To remove soft hyphens, you must enable the
soft hyphen flag.
NOTE If the soft hyphen flag is enabled, every hyphen at the end of a line
is considered a soft hyphen and removed from the XML output. If a hard
hyphen appears at the end of a line, it will also be removed. This may result
in an intentionally hyphenated word being extracted without a hyphen.
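The effect described in this note can be illustrated with a short sketch. This is not KeyView's implementation; it is an assumed, simplified model in which every hyphen directly followed by a line feed is dropped together with the line feed. The function name join_hyphenated_lines is hypothetical.

```c
#include <stddef.h>

/* Copies in to out, dropping every hyphen that is immediately followed
   by a line feed, together with the line feed itself. Output is
   truncated to fit cap and is always terminated when cap is non-zero.
   Returns the length of the text written. */
size_t join_hyphenated_lines(const char *in, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return 0;
    while (*in != '\0' && n + 1 < cap) {
        if (in[0] == '-' && in[1] == '\n') {
            in += 2;  /* skip the hyphen and the line feed */
            continue;
        }
        out[n++] = *in++;
    }
    out[n] = '\0';
    return n;
}
```

With this model, a soft hyphen in "conver-" followed by "sion" on the next line is removed as intended, but a hard hyphen in "self-" followed by "confident" on the next line is removed too, which is the side effect the note warns about. Hyphens inside a line are not touched.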
To remove soft hyphens from the XML output
1. Call the fpInit() function. See “fpInit()” on page 199.
2. Call the fpXMLConfig() function with the following arguments (see
“KVXMLConfig()” on page 205):

Argument   Parameter
nType      KVCFG_DELSOFTHYPHEN
nValue     TRUE (non-zero)
pData      NULL
For example:
(*fpXMLConfig)(pKVXML, KVCFG_DELSOFTHYPHEN, TRUE, NULL);
3. Call the fpConvertStream() or KVXMLConvertFile() function. See
“fpConvertStream()” on page 186 or “KVXMLConvertFile()” on page 214.
Improve Performance for PDFs with Many Small Images
To improve performance when converting PDF files that contain many small
images, you can specify in the formats_e.ini file the minimum pixel height and
width for images that are converted to JPEG. If an image is smaller than the
minimum height and width, KeyView does not generate a JPEG file for the image.
For example, to specify that images of 16 pixels or less in height and width are
not converted, add the following to the [pdf_flags] section of the
formats_e.ini file:
[pdf_flags]
process_images_with_min_height=17
process_images_with_min_width=17
Extract Custom Metadata from PDF Files
To extract custom metadata from your PDF files, add the custom metadata names
to the pdfsr.ini file provided, and copy the modified file to the \bin directory.
You can then extract metadata as you normally would.
The pdfsr.ini file is in the directory samples\pdfini, and has the following
structure:
<META>
<TOTAL>total_item_number</TOTAL>,
/metadata_tag_name datatype,
</META>
Parameter            Description
total_item_number    The total number of metadata tags that are listed.
metadata_tag_name    The metadata tag name used in the PDF files.
datatype             The data type of the metadata field. Data types are
                     defined in KVSumInfoType. See “KVSumInfoType” on
                     page 285.
For example:
<META>
<TOTAL> 4 </TOTAL>
/part_number INT4
/volume INT4
/purchase_date DATETIME
/customer STRING
</META>
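Because the TOTAL value must match the number of tags listed, it can help to generate the block rather than type it. The following is a minimal sketch, assuming each tag is written as a slash, the tag name, a space, and the data type on one line, with no trailing commas, as in the example. The MetaTag type and the function name write_pdf_meta_block are hypothetical.

```c
#include <stdio.h>

typedef struct {
    const char *name;  /* metadata tag name used in the PDF files */
    const char *type;  /* data type name, for example "INT4" or "STRING" */
} MetaTag;

/* Writes a <META> block for count tags into out and fills in the
   <TOTAL> value from count. Returns the length written, or -1 if the
   buffer is too small. */
int write_pdf_meta_block(char *out, size_t cap,
                         const MetaTag *tags, size_t count)
{
    size_t used, i;
    int n;

    n = snprintf(out, cap, "<META>\n<TOTAL> %lu </TOTAL>\n",
                 (unsigned long)count);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;
    for (i = 0; i < count; i++) {
        n = snprintf(out + used, cap - used, "/%s %s\n",
                     tags[i].name, tags[i].type);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    n = snprintf(out + used, cap - used, "</META>\n");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

The data type names are passed through unchanged, so they still need to be valid KVSumInfoType names.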
Convert Spreadsheet Files
Export has special configuration options that allow greater control over the
conversion of spreadsheet files.
Convert Hidden Text in Microsoft Excel Files
Normally, Export does not convert hidden text from a Microsoft Excel spreadsheet
because it is assumed the text should not be exposed. You can change this
default behavior, and convert text in hidden rows, columns, and sheets by adding
the following lines to the formats_e.ini file:
[Options]
gethiddeninfo=1
Convert Headers and Footers in Microsoft Excel 2003 Files
Normally, Export does not convert headers and footers from Microsoft Excel 2003
spreadsheets. You can change this default behavior and convert headers and
footers by adding the following lines to the formats_e.ini file:
[Options]
ShowHeaderFooter=1
Specify Date and Time Format on UNIX Systems
System date and time format information is not stored in Microsoft Excel files. On
Windows systems, you can specify a locale setting to determine the date and time
format. However, on UNIX systems, the date and time format is set to the U.S.
short date format by default (mm/dd/yyyy). To change the format, you must use a
formats_e.ini option.
To specify the system date and time format on UNIX systems
• In the formats_e.ini file, set the SysDateTime option in the
  [LocaleSetting] section. For example:
SysDateTime=%d/%m/%Y
In this example, dates and times are extracted in the following format:
28/02/2008
The format arguments are the same as those for the strftime() function.
Refer to the following Web page for more information.
http://linux.die.net/man/3/strftime
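To preview what a SysDateTime value produces, the same format string can be passed to the standard C strftime() function. The following sketch shows this; the helper name format_date is hypothetical, and it relies only on the standard library, not on the SDK.

```c
#include <stddef.h>
#include <time.h>

/* Formats a calendar date with a strftime() format string.
   Returns the number of characters written, or 0 if the result does
   not fit in out. */
size_t format_date(char *out, size_t cap, const char *format,
                   int year, int month, int day)
{
    struct tm t = {0};

    t.tm_year = year - 1900;  /* years are counted from 1900 */
    t.tm_mon = month - 1;     /* months are counted from 0 */
    t.tm_mday = day;
    return strftime(out, cap, format, &t);
}
```

For example, calling format_date() with the format "%d/%m/%Y" and the date 28 February 2008 writes 28/02/2008, and the format "%m/%d/%Y" writes 02/28/2008, which matches the U.S. short date default.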
Extract Microsoft Excel Formulas
Normally, the actual value of a formula is extracted from an Excel spreadsheet;
the formula from which the value is derived is not included in the output. However,
KeyView enables you to include the value as well as the formula in the output. For
example, if Export is configured to extract the formula and the formula value, the
output may look like this:
245 = SUM(B21:B26)
The calculated value from the cell is 245 and the formula from which the value is
derived is SUM(B21:B26).
NOTE Depending on the complexity of the formulas,
enabling formula extraction may result in slightly slower
performance.
To set the extraction option for formulas, add the following lines to the
formats_e.ini file:
[Options]
getformulastring=option
where option is one of the following:

Option   Description
0        Extract the formula value only. This is the default.
         If formula extraction is enabled, and you want to return to the default,
         set this option.
1        Extract the formula only.
2        Extract the formula and the formula value.

NOTE If a function in a formula is not supported or is invalid, and option 1
or 2 is specified, only the calculated value is extracted. See Table 11 for a
list of supported functions.
When formula extraction is enabled, Export can extract Microsoft Excel formulas
containing the functions listed in Table 11:
Table 11 Supported Microsoft Excel Functions
=ABS()
=ACOS()
=AND()
=AREAS()
=ASIN()
=ATAN2()
=AVERAGE()
=CELL()
=CHAR()
=CHOOSE()
=CLEAN()
=CODE()
=COLUMN()
=COLUMNS()
=CONCATENATE()
=COS()
=COUNT()
=COUNTA()
=DATE()
=DATEVALUE()
=DAVERAGE()
=DAY()
=DCOUNT()
=DDB()
=DMAX()
=DMIN()
=DOLLAR()
=DSTDEV()
=DSUM()
=DVAR()
=EXACT()
=EXP()
=FACT()
=FALSE()
=FIND()
=FIXED()
=FV()
=GROWTH()
=HLOOKUP()
=HOUR()
=ISBLANK()
=IF()
=INDEX()
=INDIRECT()
=INT()
=IPMT()
=IRR()
=ISERR()
=ISERROR()
=ISNA()
=ISNUMBER()
=ISREF()
=ISTEXT()
=LEFT()
=LEN()
=LINEST()
=LN()
=LOG()
=LOG10()
=LOGEST()
=LOOKUP()
=LOWER()
=MATCH()
=MAX()
=MDETERM()
=MID()
=MIN()
=MINUTE()
=MINVERSE()
=MIRR()
=MMULT()
=MOD()
=MONTH()
=N()
=NA()
=NOT()
=NOW()
=NPER()
=NPV()
=OFFSET()
=OR()
=PI()
=PMT()
=PPMT()
=PRODUCT()
=PROPER()
=PV()
=RATE()
=REPLACE()
=REPT()
=RIGHT()
=ROUND()
=ROW()
=ROWS()
=SEARCH()
=SECOND()
=SIGN()
=SIN()
=SLN()
=SQRT()
=STDEV()
=SUBSTITUTE()
=SUM()
=SYD()
=T()
=TAN()
=TEXT()
=TIME()
=TIMEVALUE()
=TODAY()
=TRANSPOSE()
=TREND()
=TRIM()
=TRUE()
=TYPE()
=UPPER()
=VALUE()
=VAR()
=VLOOKUP()
=WEEKDAY()
=YEAR()
Convert XML Files
Export enables you to extract all or selected content from source XML files (see
“Configure Element Extraction for XML Documents” on page 121). It detects the
following XML formats:
• generic XML
• Microsoft Office 2003 XML (Word, Excel, and Visio)
• StarOffice/OpenOffice XML (text document, presentation, and spreadsheet)
See Appendix E for more information on format detection.
Configure Element Extraction for XML Documents
When converting XML files, you can specify which elements and attributes are
extracted according to the file’s format ID or root element. This is useful when you
want to extract only relevant text elements, such as abstracts from reports, or a list
of authors from an anthology.
A root element is an element in which all other elements are contained. In the
XML sample below, book is the root element:
<book>
<title>XML Introduction</title>
<product id="33-657" status="draft">XML Tutorial</product>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>
For example, you could specify that when converting files with the root element
book, the element title is extracted as metadata, and only product elements
with a status attribute value of draft are extracted. When you extract an
element, the child elements within the element are also extracted. For example, if
you extract the element chapter from the sample above, the child element para
is also extracted.
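To make the idea of a root element concrete, the following sketch finds the name of the first element in an XML string. It is a simplified illustration, not the detection logic used by Export: it skips leading white space, processing instructions such as the XML declaration, comments, and a DOCTYPE declaration without an internal subset, and does not validate the document. The function name find_root_element is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Copies the name of the first element in xml into out.
   Returns 1 on success, or 0 if no element is found or the name does
   not fit in out. */
int find_root_element(const char *xml, char *out, size_t cap)
{
    const char *p = xml;
    size_t n = 0;

    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p != '<')
            return 0;
        if (strncmp(p, "<?", 2) == 0) {           /* processing instruction */
            p = strstr(p, "?>");
            if (p == NULL)
                return 0;
            p += 2;
        } else if (strncmp(p, "<!--", 4) == 0) {  /* comment */
            p = strstr(p, "-->");
            if (p == NULL)
                return 0;
            p += 3;
        } else if (strncmp(p, "<!", 2) == 0) {    /* DOCTYPE declaration */
            p = strchr(p, '>');
            if (p == NULL)
                return 0;
            p += 1;
        } else {
            break;                                /* start of the root element */
        }
    }
    p++;  /* step over '<' */
    while (*p != '\0' && *p != '>' && *p != '/' &&
           !isspace((unsigned char)*p)) {
        if (n + 1 >= cap)
            return 0;
        out[n++] = *p++;
    }
    if (n == 0)
        return 0;
    out[n] = '\0';
    return 1;
}
```

For the sample above, the function reports book. A namespace prefix, if present, is returned as part of the name.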
Export defines default element extraction settings for the following XML formats:
• generic XML
• Microsoft Office 2003 XML (Word, Excel, and Visio)
• StarOffice/OpenOffice XML (text document, presentation, and spreadsheet)
These settings are defined internally and are used when converting these file
formats; however, you can modify their values.
In addition to the default extraction settings, you can also add custom settings for
your own XML document types. If you do not define custom settings for your own
XML document types, the settings for the generic XML are used.
Modify Element Extraction Settings
You can modify configuration settings for XML documents through either the API
or the kvxconfig.ini file.
NOTE You can only use customized element extraction
settings when converting files in process. When converting
out of process, the default extraction settings are used.
Use the C API
You can use the C API to modify the settings for the standard XML document
types or add configuration settings for your own XML document types.
To modify settings
1. Call the fpInit() function. See “fpInit()” on page 199.
2. Define the KVXConfigInfo data structure. See “KVXConfigInfo” on page 245.
3. Call the fpXMLConfig() function with the following arguments (see
“KVXMLConfig()” on page 205):

Argument   Parameter
nType      KVCFG_SETXMLCONFIGINFO
nValue     0
pData      Address of the KVXConfigInfo structure
For example:
KVXConfigInfo xinfo; /* populate xinfo */
(*fpXMLConfig)(pKVXML, KVCFG_SETXMLCONFIGINFO, 0, &xinfo);
4. Repeat steps 2 and 3 until the settings for all the XML document types you
want to customize are defined.
5. Call the function fpConvertStream() or KVXMLConvertFile(). See
“fpConvertStream()” on page 186 or “KVXMLConvertFile()” on page 214.
Use an Initialization File
You can use the initialization file to modify the settings for the standard XML
document types or add configuration settings for your own XML document types.
To modify settings
1. Modify the kvxconfig.ini file.
2. Use the template file when processing the XML file. See “Modify Element
Extraction Settings in the kvxconfig.ini File” on page 123.
The sample program (xmlini) demonstrates how to use a template file during
the conversion process. See “xmlini” on page 139.
Modify Element Extraction Settings in the kvxconfig.ini File
The kvxconfig.ini file contains default element extraction settings for
supported XML formats. The file is in the directory install\OS\bin, where
install is the pathname of the Export installation directory and OS is the name
of the operating system. For example, the following entry defines extraction
settings for the Microsoft Visio 2003 XML format:
[config3]
eKVFormat=MS_Visio_XML_Fmt
szRoot=
szInMetaElement=DocumentProperties
szExMetaElement=PreviewPicture
szInContentElement=Text
szExContentElement=
szInAttribute=
The following options are available:

Configuration Option   Description
eKVFormat              The format ID as detected by the KeyView detection module.
                       This determines the file type to which these extraction
                       settings apply. See Appendix E for more information on
                       format ID values.
                       If you are adding configuration settings for a custom XML
                       document type, this is not defined.
szRoot                 The file’s root element. When the format ID is not defined,
                       the root element is used to determine the file type to which
                       these settings apply.
                       To further qualify the element, specify its namespace. See
                       “Specify an Element’s Namespace and Attribute” on page 125.
szInMetaElement        The elements extracted from the file as metadata. All other
                       elements are extracted as text.
                       Multiple entries must be separated by commas. To further
                       qualify the element, specify its namespace and/or attributes.
                       See “Specify an Element’s Namespace and Attribute” on
                       page 125.
szExMetaElement        The child elements in the included metadata elements that
                       are not extracted from the file as metadata. For example, the
                       default extraction settings for the Visio XML format extract
                       the DocumentProperties element as metadata. This element
                       includes child elements such as Title, Subject, Author,
                       Description, and so on. However, the child element
                       PreviewPicture is defined in szExMetaElement because it is
                       binary data and should not be extracted.
                       You cannot exclude any metadata elements from the output
                       for StarOffice files. All metadata is extracted regardless of
                       this setting.
                       Multiple entries must be separated by commas. To further
                       qualify the element, specify its namespace and/or attributes.
                       See “Specify an Element’s Namespace and Attribute” on
                       page 125.
szInContentElement     The elements extracted from the file as content text. Enter
                       an asterisk (*) to extract all elements including child
                       elements.
                       Multiple entries must be separated by commas. To further
                       qualify the element, specify its namespace and/or attributes.
                       See “Specify an Element’s Namespace and Attribute” on
                       page 125.
szExContentElement     The child elements in the included content elements that
                       are not extracted from the file as content text.
                       Multiple entries must be separated by commas. To further
                       qualify the element, specify its namespace and/or attributes.
                       See “Specify an Element’s Namespace and Attribute” on
                       page 125.
szInAttribute          The attribute values extracted from the file. If attributes are
                       not defined here, attribute values are not extracted.
                       Enter the namespace (if used), element name, and attribute
                       name in the following format:
                       namespace:elementname@attributename
                       For example:
                       Autonomy:[email protected]
                       Multiple entries must be separated by commas.
Specify an Element’s Namespace and Attribute
To further qualify an element, you can specify that the element exists in a certain
namespace and/or contains a specific attribute. To define the namespace and
attribute of an element, enter the following:
ns_prefix:elemname@attribname=attribvalue
Attribute values containing spaces must be enclosed in quotation marks.
For example, the following entry:
bg:language@id=xml
extracts a language element in the namespace bg that contains the attribute
id with the value of “xml”. This entry extracts the following element from
an XML file:
<bg:language id="xml">XML is a simple, flexible text format
derived from SGML</bg:language>
but does not extract:
<bg:language id="sgml">SGML is a system for defining markup
languages.</bg:language>
or
<adv:language id="xml">The namespace should be a Uniform Resource
Identifier (URI).</adv:language>
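The qualifier syntax can be broken into its parts with a short parser. The following sketch handles a single entry (one item of a comma-separated list) of the form prefix:element@attribute=value, where the prefix, the attribute, and the value are optional and a value may be enclosed in quotation marks. The ElementQualifier type, the field sizes, and the function names are assumptions made for this example; this is not the parser used by Export.

```c
#include <string.h>

typedef struct {
    char ns[32];         /* namespace prefix, empty if none */
    char element[32];    /* element name */
    char attribute[32];  /* attribute name, empty if none */
    char value[64];      /* attribute value without quotation marks */
} ElementQualifier;

/* Copies len characters of src into dst. Returns 0 if they do not fit. */
static int copy_part(char *dst, size_t cap, const char *src, size_t len)
{
    if (len >= cap)
        return 0;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 1;
}

/* Splits one qualifier entry into its parts. Returns 1 on success. */
int parse_qualifier(const char *entry, ElementQualifier *q)
{
    const char *at = strchr(entry, '@');
    const char *name_end = at ? at : entry + strlen(entry);
    const char *colon = memchr(entry, ':', (size_t)(name_end - entry));
    const char *elem = colon ? colon + 1 : entry;

    memset(q, 0, sizeof *q);
    if (colon && !copy_part(q->ns, sizeof q->ns, entry,
                            (size_t)(colon - entry)))
        return 0;
    if (name_end == elem)
        return 0;  /* an element name is required */
    if (!copy_part(q->element, sizeof q->element, elem,
                   (size_t)(name_end - elem)))
        return 0;
    if (at) {
        const char *attr = at + 1;
        const char *eq = strchr(attr, '=');
        const char *attr_end = eq ? eq : attr + strlen(attr);

        if (attr_end == attr)
            return 0;  /* '@' must be followed by an attribute name */
        if (!copy_part(q->attribute, sizeof q->attribute, attr,
                       (size_t)(attr_end - attr)))
            return 0;
        if (eq) {
            const char *val = eq + 1;
            size_t len = strlen(val);

            if (len >= 2 && val[0] == '"' && val[len - 1] == '"') {
                val++;      /* strip the surrounding quotation marks */
                len -= 2;
            }
            if (!copy_part(q->value, sizeof q->value, val, len))
                return 0;
        }
    }
    return 1;
}
```

Applied to the entry in the example above, the parser yields the prefix bg, the element language, the attribute id, and the value xml. An element in a different namespace or with a different id value would not match those parts.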
Add Configuration Settings for Custom XML Document Types
You can define element extraction settings for custom XML document types by
adding the settings to the kvxconfig.ini file. For example, for files containing
the root element autonomyxml, we could add the following section to the end of
the initialization file:
[config101]
eKVFormat=
szRoot=autonomyxml
szInMetaElement=dc:title,dc:[email protected],dc:[email protected]=title
szExMetaElement=
szInContentElement=autonomy:[email protected]=dev,autonomy:[email protected]
ame=export,[email protected]="Heading 1"
szExContentElement=
szInAttribute=autonomy:[email protected]
The custom extraction settings must be preceded by a section heading named
[configN], where N is an integer starting at 100 and increasing by 1 for each
additional file type, as in [config100], [config101], [config102], and so on.
The default extraction settings for the supported XML formats are numbered
config0 to config99. Currently only 0 to 6 are used.
Since a custom XML document type is not recognized by the KeyView detection
module, the format ID is not defined. The file type is identified by the file’s root
element only.
If a custom XML document type is not defined in the kvxconfig.ini file or by the
KVXMLConfig() function, then the default extraction settings for a generic XML
document are used.
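The section numbering rule for the kvxconfig.ini file (config0 to config99 for default settings, config100 and above for custom document types) can be checked with a short helper. This is a minimal sketch; the SectionKind type and the function name classify_config_section are hypothetical.

```c
#include <ctype.h>
#include <string.h>

typedef enum {
    SECTION_INVALID,  /* not a [configN] heading */
    SECTION_DEFAULT,  /* config0 to config99: supported XML formats */
    SECTION_CUSTOM    /* config100 and above: custom document types */
} SectionKind;

/* Classifies a section heading such as "[config101]". */
SectionKind classify_config_section(const char *heading)
{
    const char *p;
    long n = 0;

    if (heading == NULL || strncmp(heading, "[config", 7) != 0)
        return SECTION_INVALID;
    p = heading + 7;
    if (!isdigit((unsigned char)*p))
        return SECTION_INVALID;
    while (isdigit((unsigned char)*p)) {
        n = n * 10 + (*p - '0');
        if (n > 1000000)
            return SECTION_INVALID;  /* implausibly large section number */
        p++;
    }
    if (strcmp(p, "]") != 0)
        return SECTION_INVALID;
    return n >= 100 ? SECTION_CUSTOM : SECTION_DEFAULT;
}
```

A check like this can catch a custom section that was accidentally numbered below 100, where it would collide with the range reserved for the default settings.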
Show Hidden Data
Microsoft Word, Excel, and PowerPoint documents contain hidden information,
some of which is shown by default when exported and some of which is hidden by
default. There are several options that allow you to determine which types of
hidden data are shown.
Hidden Data in Microsoft Documents
You can show several types of hidden data from Microsoft Word, Excel, and
PowerPoint documents. Each type has a corresponding flag in the
KVXMLConfig() function, which you can toggle to determine whether the hidden
data is shown. Table 12 lists each data type, its default behavior, and its
corresponding configuration API flag.
Table 12 Hidden data settings

Hidden Data Type          Default Behavior      Configuration API Flag
Microsoft Word
  Comments (a)            Shown (b)             KVCFG_WP_NOCOMMENTS
  Hidden text             Hidden                KVCFG_WP_SHOWHIDDENTEXT
  Date field codes        Calculated date       KVCFG_WP_SHOWDATEFIELDCODE
  File name field codes   Document file name    KVCFG_WP_SHOWFILENAMEFIELDCODE
Microsoft Excel
  Hidden information      Hidden                KVCFG_SS_SHOWHIDDENINFOR
  Comments                Hidden                KVCFG_SS_SHOWCOMMENTS
  Formulas                Calculated value      KVCFG_SS_SHOWFORMULA
Microsoft PowerPoint
  Hidden slides           Shown                 KVCFG_PG_HIDEHIDDENSLIDE
  Comments                Shown (c)             KVCFG_PG_HIDECOMMENT
  Comments slide          Hidden                KVCFG_PG_SHOWCOMMENTSSLIDE (d)
  Slide notes (e)         Hidden                KVCFG_PG_SHOWSLIDENOTES

a. Word comment settings can also be toggled with a configuration parameter in the formats_e.ini file.
See “Toggle Word Comment Settings in the formats_e.ini File” on page 127.
b. Shown by default in Microsoft Word 97 to 2003 documents.
c. Shown by default in Microsoft PowerPoint 97 to 2000 documents.
d. This setting affects PowerPoint 2003 and 2007 only.
e. PowerPoint slide note settings can also be toggled with a configuration parameter in the
formats_e.ini file. See “Toggle PowerPoint Slide Note Settings in the formats_e.ini File” on
page 128.
To toggle the display of any type of hidden data
• Use the configuration API and set the third parameter to TRUE or FALSE:
(*fpXMLConfig)(pKVXML, KVCFG_WP_NOCOMMENTS, TRUE, NULL);
In this example, comments will not be exported from Word documents.
NOTE The third parameter determines whether the default behavior is
changed. To change the default behavior, set it to TRUE.
For more information, see “KVXMLConfig()” on page 205.
Toggle Word Comment Settings in the formats_e.ini File
Microsoft Word 97 to 2003 comment settings can also be controlled through a
parameter in the formats_e.ini file.
The formats_e.ini file is in the directory install\OS\bin, where install
is the pathname of the Export installation directory and OS is the name of the
operating system.
To toggle comment output in formats_e.ini
1. Open the formats_e.ini file in a text editor.
2. Under [Options], add the WP_NOCOMMENTS parameter and set it to 0 to
show comments or 1 to hide comments. For example:
[Options]
WP_NOCOMMENTS=1
NOTE The configuration API flag
KVCFG_WP_NOCOMMENTS overrides the setting in
formats_e.ini.
Toggle PowerPoint Slide Note Settings in the formats_e.ini File
Microsoft PowerPoint slide note settings can also be controlled through a
parameter in the formats_e.ini file.
The formats_e.ini file is in the directory install\OS\bin, where install
is the pathname of the Export installation directory and OS is the name of the
operating system.
To toggle slide note output in formats_e.ini
1. Open the formats_e.ini file in a text editor.
2. Under [Options], add the ShowSlideNotes parameter and set it to 1 to
show slide notes or 0 to hide slide notes. For example:
[Options]
ShowSlideNotes=1
NOTE The configuration API flag
KVCFG_PG_SHOWSLIDENOTES overrides the setting in
formats_e.ini.
CHAPTER 5
Sample Programs
This section describes the sample programs provided with XML Export. It contains
the following topics:
• Introduction
• tstxtract
• cnv2xml
• cnv2xmloop
• metadata
• xmlindex
• xmlini
• xmlcallback
• xmlonefile
• xmlmulti
• Export Demo
Introduction
The sample programs demonstrate how to use the C and Visual Basic
implementations of XML Export. The sample code is intended to provide a starting
point for your own applications or to be used for reference purposes.
The source code and makefile for each program are in the directory
install\xmlexport\programs\program_name
where install is the pathname of the Export installation directory, and
program_name is the name of the sample program.
C Sample Programs
The C sample programs demonstrate how to use the C implementation of XML
Export. The sample code is intended to provide a starting point for your own
applications or to be used for reference purposes.
The following C sample programs are provided:
• tstxtract
• cnv2xml
• cnv2xmloop
• metadata
• xmlindex
• xmlini
• xmlcallback
• xmlonefile
• xmlmulti
NOTE The sample programs do not parse white space in filenames. If your
filenames contain spaces, use quotation marks around the entire path
name. Inserting quotation marks around the filename only does not work.
To compile the C sample programs, use the makefiles provided in the sample
programs directories. Ensure the XML Export include directory is specified in the
include path of the project. Once the executables are compiled and built, they
must be placed in the same directory as the XML Export libraries.
Compile the Visual Basic Sample Program
To compile Export Demo, use the Visual Studio project file (demo_vb.vbp) in the
directory install\xmlexport\programs\ExportDemo, where install is the
pathname of the Export installation directory.
tstxtract
The tstxtract sample program demonstrates the File Extraction API. It opens
a file, extracts subfiles from the file, and repeats the extraction process until all
subfiles are extracted. It also demonstrates how to extract the default set of
metadata and how to pass integer or string names to extract specific metadata.
After the files are extracted, you can convert them using one of the conversion
sample programs.
The source code for the tstxtract sample program is the same for the Filter
and Export SDKs. A flag in the makefile specifies whether the program is
compiled for Filter, HTML Export, or XML Export.
To run tstxtract, type the following command line:
tstxtract [options] input_file output_directory bin_directory
where options is one or more of the following:

Option                         Description
-c charset                     Specify the target character set, for example KVCS_SJIS.
                               See “Coded Character Sets” on page 341 for a full list of
                               supported character sets.
-cf keyfile1,keyfile2,...      Specify one or more credential files (private keys) to use
                               to decrypt encrypted .EML, .MBX, .PST, or .MSG files.
-l logfile                     Specify the path and filename of the log file in which
                               metadata is written.
-lm                            Retrieve metadata and write the data to the log file.
-lms metaname1,metaname2,...   Retrieve metadata with string metanames and write the
                               data to the log file for .MSG, .EML, .MBX, and .NSF files.
-lmi metaint1,metaint2,...     Retrieve metadata with integer (hexadecimal) metanames
                               and write the data to the log file for .PST files.
-lma                           Retrieve all metadata from an .NSF file and write the data
                               to the log file.
-r                             Recursively extract second-level subfiles to the specified
                               output directory. For example, if a .ZIP file contains a
                               Microsoft Word file and the Word file contains an
                               embedded Microsoft Excel file, set the -r option to extract
                               both the Word and Excel files.
                               If this option is not set, only first-level subfiles are
                               extracted. For the example above, only the Word file would
                               be extracted.
-msg                           Extract mail messages in a .PST file as an .MSG file,
                               including all of its attachments. If this flag is not set, the
                               mail message is extracted as text. This applies to PST files
                               on Windows only.
-f                             Extract the formatted version of the message body (HTML
                               or RTF) from mail files when possible. If neither an HTML
                               nor RTF version of the message body exists in the mail
                               file, then it is extracted as plain text. If this flag is not set,
                               the message body is extracted as plain text when possible.
-t                             Preserve the timestamp of embedded files when possible.
-h                             Extract hidden text.
input_file is the full path and filename of the source document.
output_directory is the directory to which the files will be extracted.
bin_directory is the path to the Export bin directory. This is required if you do
not run the program from the install\Export SDK\bin directory.
cnv2xml
The cnv2xml sample program creates a single, formatted XML output file. It is
called by the Export Demo sample program, but can also be used on its own. This
program runs on both Windows and UNIX platforms.
To run cnv2xml, type the following command line:
cnv2xml [options] inputfile outputfile
where,
options is one or more of the options listed in Table 14.
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
The following options are available:
Table 14 Options for the cnv2xml Sample Program
Option
Description
-c KVCFG_SUPPRESSIMAGES
Specifies that XML output includes verbose markup, but no
images. If this option is not set, then embedded images in a
document are regenerated as separate files and stored in
the output directory. See “KVXMLConfig()” on page 205.
-c KVCFG_ENABLEPOSITIONINFO
Specifies that a position element is included in the markup
for PDF documents. The position element defines the
absolute position of the text relative to the bottom left corner
of the page, and includes additional information such as font
and color. See “KVXMLConfig()” on page 205.
-c KVCFG_DELSOFTHYPHEN
Specifies that soft hyphens in PDF files are deleted from the
converted output. See “Control Hyphenation” on page 115.
-pdfltr
Specifies that PDF files are output in a logical reading order,
and the paragraph direction is left to right. See “Convert PDF
Files to a Logical Reading Order” on page 112.
-pdfrtl
Specifies that PDF files are output in a logical reading order,
and the paragraph direction is right to left. See “Convert PDF
Files to a Logical Reading Order” on page 112.
-pdfauto
Specifies that PDF files are output in a logical reading order.
The PDF reader determines the paragraph direction
(left-to-right or right-to-left) for each PDF page, and then sets
the direction accordingly. See “Convert PDF Files to a
Logical Reading Order” on page 112.
-pdfraw
Specifies that PDF files are output in an unstructured
paragraph flow. This is the default. If logical reading order is
enabled, and you want to return to an unstructured
paragraph flow, set this flag. See “Convert PDF Files to a
Logical Reading Order” on page 112.
cnv2xmloop
The cnv2xmloop sample program creates a single, formatted XML output file, but
unlike cnv2xml, it converts the file out of process. See “Convert Files Out of
Process” on page 43 for more information on out of process conversions. This
program runs on both Windows and UNIX platforms.
To run cnv2xmloop, type the following command line:
cnv2xmloop [options] inputfile outputfile
where,
options is one or more of the options listed in Table 15.
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the XML output file.
The following options are available:
Table 15 Options for the cnv2xmloop Sample Program
Option
Description
-c KVCFG_SUPPRESSIMAGES
Specifies that XML output includes verbose
markup, but no images. If this option is not set, then
embedded images in a document are regenerated
as separate files and stored in the output directory.
See “KVXMLConfig()” on page 205.
-c KVCFG_ENABLEPOSITIONINFO
Specifies that a position element is included in the
markup for PDF documents. The position element
defines the absolute position of the text relative to
the bottom left corner of the page, and includes
additional information such as font and color. See
“KVXMLConfig()” on page 205.
metadata
The metadata sample program converts a source document into a single XML file
that only contains the document metadata (Author, Subject, Title, and so on). This
program runs on both Windows and UNIX platforms.
To run metadata, type the following command line:
metadata inputfile outputfile
where,
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
xmlindex
The xmlindex sample program produces stripped-down XML output suitable for
use with indexing engines. It converts a source document into a single, largely
unformatted XML file. This program runs on both Windows and UNIX platforms.
To run xmlindex, type the following command line:
xmlindex inputfile outputfile
where,
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
xmlini
The xmlini sample program is used in conjunction with template files to produce
well-formed XML documents. For more information, see “Set Conversion Options
Using the Template Files” on page 53. Sample template files are in the directory
programs\ini. This program runs on both Windows and UNIX platforms.
To run xmlini, type the following command line:
xmlini [options] inifile inputfile outputfile
where,
options is one or more of the options listed in Table 16.
inifile is the full path and filename of a template file.
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
The following options are available:
Table 16 Options for the xmlini Sample Program
Option
Description
-s stylesheetfile
Reads style sheet information from an existing style
sheet file, or writes the information to an external CSS
file. See “Use Style Sheets with xmlini” on page 140.
-x xmlconfig_filename
Converts an XML file using customized element
extraction settings defined in the kvxconfig.ini file. If
you do not enter the full path to the template file, the
program looks for the file in the current working directory
(install\OS\bin, where install is the pathname of
the Export installation directory and OS is the name of the
operating system). See “Convert XML Files” on
page 120.
-rm
If this is set, text and graphics that were deleted from a
document with a revision tracking feature enabled are
converted and revision tracking information is included in
the XML output. See “Convert Revision Tracking
Information” on page 110.
-oop
Runs the conversion out of process.
-fl
Prints a list of converted files in the console.
If the XML file is output to a directory other than the directory programs\tempout,
you must update the XML markup so that the browser can find the style sheet and
the images used by the template (such as backgrounds or corporate logos). The
markup contains relative references to the image files (..\images).
Use Style Sheets with xmlini
The xmlini sample program provides an option that allows XML Export to read
Cascading Style Sheet (CSS) or Extensible Stylesheet Language (XSL) style
sheet information from an existing style sheet file, or to write CSS information to
an external CSS file. If the CSS file does not exist, it is created. The style sheet
name is referenced in the output XML, for example:
<?xml-stylesheet href="c:\mystyle.css" type="text/css"?>
This type of conversion makes the XML output document significantly smaller and
allows you to use the same style sheet for many conversions.
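The style sheet reference shown above is an ordinary XML processing instruction. As a minimal sketch of how such a line is put together, the following helper formats one into a caller-supplied buffer. The function name format_stylesheet_pi() is hypothetical and is not part of the Export SDK; the "text/xsl" type used in the test is an assumption for XSL style sheets. The helper does not escape special characters in the path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an SDK function): write an xml-stylesheet
   processing instruction into out.
   Returns 0 on success, or -1 if the buffer is too small. */
static int format_stylesheet_pi(char *out, size_t outSize,
                                const char *href, const char *mimeType)
{
    int n;
    assert(out != NULL && href != NULL && mimeType != NULL);
    n = snprintf(out, outSize,
                 "<?xml-stylesheet href=\"%s\" type=\"%s\"?>",
                 href, mimeType);
    /* snprintf reports the length it needed; reject truncated output */
    return (n < 0 || (size_t)n >= outSize) ? -1 : 0;
}
```

Checking the return value matters because a truncated processing instruction would make the output document malformed.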
To apply an existing style sheet to a conversion using the xmlini sample
program
1. In the template file, set eStyleSheetType to either XML_CSS or XML_XSL. This
specifies that the formatting data is stored in either a CSS or an XSL style
sheet.
2. At the command prompt, type:
xmlini -s stylesheetname inifile inputfile outputfile
where stylesheetname is the path and filename of the CSS or XSL file.
xmlcallback
The xmlcallback sample program demonstrates how you can control the
conversion to generate specialized output while it is in progress. The program
employs developer-defined callbacks and memory management functions during
conversion. This program runs on Windows platforms only.
To run xmlcallback, type the following command line:
xmlcallback inputfile outputfile
where,
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
xmlonefile
The xmlonefile sample program converts a source document into a single,
formatted XML file. This program runs on Windows platforms only.
To run xmlonefile, type the following command line:
xmlonefile inputfile outputfile
where,
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
xmlmulti
The xmlmulti sample program creates multiple XML files from a source
document. The main file contains the table of contents. Each H1 heading is
contained within its own file. The main file contains hyperlinks to each H1 block;
each H1 file contains navigation to the table of contents, as well as to the previous
and next blocks. This program runs on Windows platforms only.
To run xmlmulti, type the following command line:
xmlmulti inputfile outputfile
where,
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
Export Demo
Export Demo is a Visual Basic program that provides an easy-to-use graphical
user interface to the KeyView Export technology. It allows you to select files,
convert them to XML, and view the result in a browser object. The output options
that control the look of the output files are pre-defined in Export Demo and cannot
be changed in the user interface.
Export Demo accesses the Export functionality by returning to the operating
system and running a C program named cnv2xml. To adapt the sample program
to your needs, modify the GUI using Visual Basic, and the cnv2xml program
using C. See “cnv2xml” on page 136.
To launch Export Demo, select Export Demo from Start | Programs | Autonomy
| Export SDK | XML Export.
The source code for the program is in the directory install\xmlexport\
programs\ExportDemo, where install is the pathname of the Export
installation directory. Export Demo is for Windows only.
See “Use the Export Demo Program” on page 56 for more information.
PART 3
C API Reference
This section provides detailed reference information for the
C-language implementation of the File Extraction and Export
APIs. It includes the following chapters:
• File Extraction API Functions
• File Extraction API Structures
• XML Export API Functions
• XML Export API Callback Functions
• XML Export API Structures
• Enumerated Types
CHAPTER 6
File Extraction API Functions
This section describes the functions in the File Extraction API. The File Extraction
functions open a container file, and extract the container’s sub files so that the sub
files are exposed and available for conversion. Sub files may be files within a Zip
archive, messages in a mail store, attachments in a mail message, or OLE objects
embedded in a compound document. See “Sub File Extraction” on page 51 for
more information.
Each function appears as a function prototype followed by a description of its
arguments, its return value, and a discussion of its use. This section contains the
following topics.
• KVGetExtractInterface()
• fpCloseFile()
• fpExtractSubFile()
• fpFreeStruct()
• fpGetMainFileInfo()
• fpGetSubFileInfo()
• fpGetSubFileMetaData()
• fpOpenFile()
KVGetExtractInterface()
This function is the entry point to obtain the file extraction functions. It supplies
pointers to the file extraction functions, and in the case of out-of-process mode
starts the kvoop.exe server and initializes out-of-process extraction services.
When KVGetExtractInterface() is called, it assigns the function pointers in
the structure KVExtractInterface to the functions described in this section.
Syntax
int pascal KVGetExtractInterface (
    void                *pContext,
    KVExtractInterface  pIextract);
Arguments
pContext
Pointer returned from fpInit().
pIextract
Pointer to the structure KVExtractInterface, which contains
function pointers that KVGetExtractInterface() assigns to all
other file extraction functions. See “KVExtractInterface” on page 162.
Before initializing the KVExtractInterface structure, use the
macro KVStructInit to initialize the KVStructHead structure. See
“KVStructHead” on page 240.
Returns
• If the call is successful, the return value is KVERR_Success.
• If the call is not successful, the return value is an error code.
Example
fpKVGetExtractInterface =
    (int (pascal *)(void *, KVExtractInterface))myGetProcAddress(hKVExport,
        (char*)"KVGetExtractInterface");
/* Initialize file extraction interface structure using KVStructInit */
KVStructInit(&extractInterface);
/* Retrieve file extraction interface */
error = (*fpKVGetExtractInterface)(pExport, &extractInterface);
fpCloseFile()
This function frees the memory allocated by fpOpenFile() and closes the file.
See “fpOpenFile()” on page 157.
Syntax
int (pascal *fpCloseFile) (void *pFile);
Arguments
pFile
Identifier of the file. This is a file handle returned from fpOpenFile().
See “fpOpenFile()” on page 157.
Returns
• If the file is closed, the return value is KVERR_Success.
• If the file is not closed, the return value is an error code.
Example
extractInterface->fpCloseFile(pFile);
pFile = NULL;
fpExtractSubFile()
This function extracts a sub file from a container file to a user-defined path or
output stream. This call returns file format information when the file is extracted
to a path.
Syntax
int (pascal *fpExtractSubFile) (
    void                  *pFile,
    KVExtractSubFileArg   extractArg,
    KVSubFileExtractInfo  *extractInfo);
Arguments
pFile
Identifier of the file. This is a file handle returned from
fpOpenFile(). See “fpOpenFile()” on page 157.
extractArg
Pointer to the structure KVExtractSubFileArg, which defines the
sub file to be extracted. See “KVExtractSubFileArg” on page 163.
Before initializing the KVExtractSubFileArg structure, use the
macro KVStructInit to initialize the KVStructHead structure. See
“KVStructHead” on page 240.
extractInfo
Pointer to the structure KVSubFileExtractInfo, which defines
information about the extracted sub file. See “KVSubFileExtractInfo”
on page 176.
Returns
• If the sub file is extracted from the container file, the return value is
KVERR_Success.
• If the sub file is not extracted from the container file, the return value is an
error code.
Discussion
• After the file is extracted, call fpFreeStruct() to free the memory allocated
by this function. See “fpFreeStruct()” on page 150.
• If the sub file is embedded in the main file as a link and is stored externally,
extractInfo->infoFlag is set to
KVSubFileExtractInfoFlag_External. For example, the sub file may
be an object that was embedded in a Word document using “Link to File,” or
an attachment that is referenced in an MBX message. This type of sub file
cannot be extracted. You must write code to access the sub file based on the
path in the member extractInfo->filePath or
extractInfo->fileName. See “KVSubFileExtractInfo” on page 176.
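The handling of externally linked sub files can be sketched as follows. This is an illustration only: ExtractInfoSketch, SKETCH_FLAG_EXTERNAL, and external_location() are hypothetical stand-ins for KVSubFileExtractInfo, KVSubFileExtractInfoFlag_External, and your own code, and the flag value is assumed. The real definitions are in kvxtract.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit value; use KVSubFileExtractInfoFlag_External in real code. */
#define SKETCH_FLAG_EXTERNAL 0x1u

/* Hypothetical stand-in for the members of KVSubFileExtractInfo used here. */
typedef struct {
    unsigned int infoFlag;   /* bitwise flags set by the extraction call */
    const char  *filePath;   /* path reported for the sub file, if any */
    const char  *fileName;   /* file name reported for the sub file, if any */
} ExtractInfoSketch;

/* Return the location the application must open itself when the sub file
   is an external link, or NULL when the sub file was extracted normally. */
static const char *external_location(const ExtractInfoSketch *info)
{
    assert(info != NULL);
    if (!(info->infoFlag & SKETCH_FLAG_EXTERNAL))
        return NULL;
    /* Prefer the path; fall back to the bare file name */
    return info->filePath != NULL ? info->filePath : info->fileName;
}
```

A NULL result means no extra work is needed; any other result is a location the application has to resolve and open on its own.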
Example
KVSubFileExtractInfo extractInfo = NULL;
KVStructInit(&extractArg);
extractArg.index = index;
extractArg.extractionFlag = KVExtractionFlag_CreateDir |
                            KVExtractionFlag_Overwrite;
extractArg.filePath = subFileInfo->subFileName;
/* Extract this sub file */
error = extractInterface->fpExtractSubFile(pFile, &extractArg, &extractInfo);
if ( error )
{
    /* Free the result object and clear the pointer */
    extractInterface->fpFreeStruct(pFile, extractInfo);
    extractInfo = NULL;
}
fpFreeStruct()
This function frees the memory allocated by fpGetMainFileInfo(),
fpGetSubFileInfo(), fpGetSubFileMetaData(), and
fpExtractSubFile().
Syntax
int (pascal *fpFreeStruct) (
    void  *pFile,
    void  *obj);
Arguments
pFile
Identifier of the file. This is a file handle returned from fpOpenFile().
See “fpOpenFile()” on page 157.
obj
Pointer to the result object returned by fpGetMainFileInfo(),
fpGetSubFileInfo(), fpGetSubFileMetaData(), or
fpExtractSubFile().
Returns
• If the allocated memory is freed, the return value is KVERR_Success.
• Otherwise, the return value is an error code.
Example
The example below frees the memory allocated by fpGetSubFileInfo():
if ( subFileInfo )
{
extractInterface->fpFreeStruct(pFile,subFileInfo);
subFileInfo = NULL;
}
fpGetMainFileInfo()
This function determines whether a file is a container file—that is, whether it
contains sub files—and should be extracted further.
Syntax
int (pascal *fpGetMainFileInfo) (
    void            *pFile,
    KVMainFileInfo  *fileInfo);
Arguments
pFile
Identifier of the file. This is a file handle returned from fpOpenFile().
See “fpOpenFile()” on page 157.
fileInfo
Pointer to the structure KVMainFileInfo. This structure contains
information about the file. See “KVMainFileInfo” on page 169.
Returns
• If the file information is retrieved, the return value is KVERR_Success.
• If the file information is not retrieved, the return value is an error code.
Discussion
• After the file information is retrieved, call fpFreeStruct() to free the
memory allocated by this function. See “fpFreeStruct()” on page 150.
• If the file is a container (fileInfo->numSubFiles is non-zero), call
fpGetSubFileInfo() and fpExtractSubFile() for each sub file. See
“fpGetSubFileInfo()” on page 153 and “fpExtractSubFile()” on page 148.
• If the file is not a container (fileInfo->numSubFiles is 0) and contains
text (fileInfo->infoFlag is set to
KVMainFileInfoFlag_HasContent), pass the file directly to the
conversion functions. See “XML Export API Functions” on page 183.
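The routing decision described in this discussion can be sketched as follows. This is a minimal illustration, not SDK code: MainFileInfoSketch, FLAG_HAS_CONTENT, and next_step() are hypothetical stand-ins for KVMainFileInfo, KVMainFileInfoFlag_HasContent, and your own dispatch logic, and the flag value is assumed. The real definitions are in kvxtract.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit value; use KVMainFileInfoFlag_HasContent in real code. */
#define FLAG_HAS_CONTENT 0x1u

/* Hypothetical stand-in for the members of KVMainFileInfo used here. */
typedef struct {
    int          numSubFiles;   /* number of sub files in the container */
    unsigned int infoFlag;      /* bitwise flags describing the file */
} MainFileInfoSketch;

typedef enum { STEP_EXTRACT_SUBFILES, STEP_CONVERT, STEP_SKIP } NextStep;

/* Decide what to do with a file once its main file information is known. */
static NextStep next_step(const MainFileInfoSketch *info)
{
    assert(info != NULL);
    if (info->numSubFiles != 0)
        return STEP_EXTRACT_SUBFILES;   /* container: process each sub file */
    if (info->infoFlag & FLAG_HAS_CONTENT)
        return STEP_CONVERT;            /* not a container, has text: convert */
    return STEP_SKIP;                   /* nothing to extract or convert */
}
```

Keeping the decision in one small function makes it easy to apply the same test again to each extracted sub file, which may itself be a container.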
Example
KVMainFileInfo fileInfo = NULL;
if( (error=extractInterface->fpGetMainFileInfo(pFile,&fileInfo)))
{
    /* Free result object allocated in fileInfo */
    extractInterface->fpFreeStruct(pFile,fileInfo);
    fileInfo = NULL;
}
fpGetSubFileInfo()
This function gets information about a sub file in a container file.
Syntax
int (pascal *fpGetSubFileInfo) (
    void           *pFile,
    int            index,
    KVSubFileInfo  *subFileInfo);
Arguments
pFile
Identifier of the main file. This is a file handle returned from
fpOpenFile(). See “fpOpenFile()” on page 157.
index
The index number of the sub file for which information will be
retrieved.
subFileInfo
Pointer to the structure KVSubFileInfo, which defines information
about the sub file. See “KVSubFileInfo” on page 178.
Returns
• If the file information is retrieved, the return value is KVERR_Success.
• If the file information is not retrieved, the return value is an error code.
Discussion
• After the sub file information is retrieved, call fpFreeStruct() to free the
memory allocated by this function. See “fpFreeStruct()” on page 150.
• If the root node is not enabled, the first sub file is index 0. If the root node is
enabled, the first sub file is index 1. The root node is required to recreate a
file’s hierarchy. See “Create a Root Node” on page 70.
• The members subFileInfo->parentIndex and
subFileInfo->childArray enable you to recreate a file’s hierarchy. Since
childArray only retrieves the first-level children in the sub file, you must call
fpGetSubFileInfo() repeatedly until information for the leaf-node children
is extracted. See “Recreate a File’s Hierarchy” on page 70.
• If the sub file is embedded in the main file as a link and is stored externally,
subFileInfo->infoFlag is set to KVSubFileInfoFlag_External. For
example, the sub file may be an object that was embedded in a Word
document using “Link to File,” or an attachment that is referenced in an MBX
message. This type of sub file cannot be extracted. You must write code to
access the sub file based on the path in the member
subFileInfo->subFileName. See “KVSubFileInfo” on page 178.
The KVSubFileInfoFlag_External flag will not be set for an OLE object
that is embedded as a link in a Microsoft PowerPoint file. KeyView can only
detect linked objects in a Microsoft PowerPoint file when the object is
extracted. See “fpExtractSubFile()” on page 148.
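The repeated descent through childArray can be sketched with a small recursive walk. This is an illustration under stated assumptions: SubFileNodeSketch and walk() are hypothetical, the childCount member and the use of -1 as the root's parent index are assumptions, and the in-memory array stands in for the information that each fpGetSubFileInfo() call would return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the hierarchy members of KVSubFileInfo. */
typedef struct {
    int        parentIndex;  /* index of the parent; -1 marks the root here */
    int        childCount;   /* number of first-level children (assumed member) */
    const int *childArray;   /* indexes of the first-level children */
} SubFileNodeSketch;

/* Visit a node and all of its descendants, recording the depth of each.
   In real code, each visit would correspond to one fpGetSubFileInfo() call.
   Returns the number of nodes visited. */
static int walk(const SubFileNodeSketch *nodes, int index, int depth,
                int *depthOut)
{
    int visited = 1;
    int i;
    assert(nodes != NULL && depthOut != NULL);
    depthOut[index] = depth;
    for (i = 0; i < nodes[index].childCount; i++)
        visited += walk(nodes, nodes[index].childArray[i], depth + 1, depthOut);
    return visited;
}
```

Because each node lists only its first-level children, the recursion is what reaches the leaf nodes; the recorded depth is enough to rebuild an indented tree or a directory layout.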
Example
KVSubFileInfo subFileInfo = NULL;
for ( index = 0; index < fileInfo->numSubFiles; index++)
{
    error=extractInterface->fpGetSubFileInfo(pFile,index,&subFileInfo);
    if ( error )
    {
        extractInterface->fpFreeStruct(pFile,subFileInfo);
        subFileInfo = NULL;
    }
}
fpGetSubFileMetaData()
This function extracts metadata from mail stores, mail messages, and non-mail
items in an NSF file. See “Extract Mail Metadata” on page 72.
Syntax
int (pascal *fpGetSubFileMetaData) (
    void                 *pFile,
    KVGetSubFileMetaArg  metaArg,
    KVSubFileMetaData    *metaData);
Arguments
pFile
Identifier of the file. This is a file handle returned from fpOpenFile().
See “fpOpenFile()” on page 157.
metaArg
Pointer to the structure KVGetSubFileMetaArg, which defines
metadata tags whose values are retrieved. See
“KVGetSubFileMetaArg” on page 167.
Before initializing the KVGetSubFileMetaArg structure, use the
macro KVStructInit to initialize the KVStructHead structure. See
“KVStructHead” on page 240.
metaData
Pointer to the structure KVSubFileMetaData, which contains the
retrieved metadata values. See “KVSubFileMetaData” on page 181.
Returns
• If the metadata is retrieved, the return value is KVERR_Success.
• If the metadata is not retrieved, the return value is an error code.
• When you pass in 0 for metaArg->metaNameCount, and NULL for
metaArg->metaNameArray, a set of default metadata is retrieved. See
“Extract Mail Metadata” on page 72.
Discussion
• After the metadata is retrieved, call fpFreeStruct() to free the memory
allocated by this function. See “fpFreeStruct()” on page 150.
• If a field is repeated in an EML or MBX mail header, the values in each
instance of the field are concatenated and returned as one field. The values
are separated by a delimiter of five pound signs (#####).
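A value that was concatenated in this way can be split back into its parts. The following sketch is not part of the SDK; split_repeated_field() is a hypothetical helper that works on a writable copy of the value. Copy the returned metadata into your own buffer first, because the helper overwrites the delimiters in place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_DELIM "#####"

/* Hypothetical helper: split a concatenated metadata value in place on the
   "#####" delimiter. Stores up to maxParts pointers in parts[] and returns
   the number stored. Each delimiter found is overwritten with a string
   terminator; values beyond maxParts are discarded. */
static size_t split_repeated_field(char *value, char **parts, size_t maxParts)
{
    size_t count = 0;
    char *cursor = value;
    assert(value != NULL && parts != NULL);
    while (count < maxParts) {
        char *hit = strstr(cursor, FIELD_DELIM);
        parts[count++] = cursor;
        if (hit == NULL)
            break;                          /* last value reached */
        *hit = '\0';                        /* terminate the current value */
        cursor = hit + strlen(FIELD_DELIM); /* continue after the delimiter */
    }
    return count;
}
```

A value with no delimiter yields a single part, so the same code path handles both repeated and non-repeated fields.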
Example
KVSubFileMetaData metaData = NULL;
KVStructInit(&metaArg);
/* retrieve all the default metadata elements */
metaArg.metaNameCount = 0;
metaArg.metaNameArray = NULL;
metaArg.index = Index;
error =
    extractInterface->fpGetSubFileMetaData(pFile,&metaArg,&metaData);
...
extractInterface->fpFreeStruct(pFile,metaData);
metaData = NULL;
/* retrieve specific metadata fields */
KVMetaName pName[2];
KVMetaNameRec names[2];
names[0].type = KVMetaNameType_Integer;
names[0].name.iname = KVPR_SUBJECT;
names[1].type = KVMetaNameType_Integer;
names[1].name.iname = KVPR_DISPLAY_TO;
pName[0] = &names[0];
pName[1] = &names[1];
metaArg.metaNameCount = 2;
metaArg.metaNameArray = pName;
metaArg.index = Index;
error = extractInterface->fpGetSubFileMetaData
    (pFile,&metaArg,&metaData);
...
extractInterface->fpFreeStruct(pFile,metaData);
metaData = NULL;
fpOpenFile()
This function opens a file to make the file accessible for sub file extraction or
conversion.
Syntax
int (pascal *fpOpenFile) (
    void           *pContext,
    KVOpenFileArg  openArg,
    void           **pFile);
Arguments
pContext
Pointer returned from fpInit().
openArg
Pointer to the structure KVOpenFileArg. This structure defines the
input parameters necessary to open a file for extraction, such as
credentials, and the default extraction directory. See “KVOpenFileArg”
on page 173.
Before initializing the KVOpenFileArg structure, use the macro
KVStructInit to initialize the KVStructHead structure. See
“KVStructHead” on page 240.
pFile
Handle for the opened file. This handle is used in subsequent file
extraction calls to identify the source file.
Returns
• If the file is opened, the return value is KVERR_Success.
• If the file is not opened, the return value is an error code and pFile is NULL.
Discussion
Call fpCloseFile() to free the memory allocated by this function. See
“fpCloseFile()” on page 147.
Example
KVOpenFileArgRec openArg;
/* Initialize the structure using KVStructInit */
KVStructInit(&openArg);
openArg.extractDir = destDir;
openArg.filePath = srcFile;
/* Open the main file */
if ( (error =
    extractInterface->fpOpenFile(pExport,&openArg,&pFile)))
{
    extractInterface->fpCloseFile(pFile);
    pFile = NULL;
}
CHAPTER 7
File Extraction API Structures
This section provides information on the structures used by the File Extraction
API. These structures define the input and output parameters required to extract
sub files from a container file, and are defined in kvxtract.h. This section
contains the following topics.
• KVCredential
• KVCredentialComponent
• KVExtractInterface
• KVExtractSubFileArg
• KVGetSubFileMetaArg
• KVMainFileInfo
• KVMetadataElem
• KVMetaName
• KVOpenFileArg
• KVOutputStream
• KVSubFileExtractInfo
• KVSubFileInfo
• KVSubFileMetaData
KVCredential
This structure contains a count of the number of credential elements, and a
pointer to the first element of the array of individual elements. It is initialized by
calling fpOpenFile(). See “fpOpenFile()” on page 157. It is defined in
kvxtract.h.
typedef struct tag_KVCredential
{
    int                    itemCount;
    KVCredentialComponent  *items;
}
KVCredentialRec, *KVCredential;
Member Descriptions
itemCount
The number of credentials defined for this file.
items
Pointer to the structure KVCredentialComponent. This structure
contains the individual credential elements used to open a protected file.
See “KVCredentialComponent” on page 161.
KVCredentialComponent
This structure contains the value of a credential item. It is defined in
kvxtract.h.
typedef struct tag_KVCredentialComponent
{
    KVCredKeyType  keytype;
    union
    {
        void          *pkey;
        char          *skey;
        unsigned int  ikey;
    }
    keyobj;
}
KVCredentialComponentRec, *KVCredentialComponent;
Member Descriptions
keytype
The type of credential (such as a user name or password). The types
are defined by the enumerated type KVCredKeyType. See
“KVCredKeyType” on page 271.
pkey
Pointer to a structure defining credentials. Reserved for future use.
skey
Pointer to a string credential key.
ikey
An integer credential key.
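Because keyobj is a union, only the member that matches keytype is meaningful. The following sketch shows the usual way to fill in and read such a tagged union. All names are hypothetical stand-ins: KeyTypeSketch replaces KVCredKeyType (whose enumerators are listed under “KVCredKeyType”), and CredentialSketch mirrors the shape of KVCredentialComponent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key types; the real enumerators belong to KVCredKeyType. */
typedef enum { KEY_SKETCH_STRING, KEY_SKETCH_INTEGER } KeyTypeSketch;

/* Same shape as KVCredentialComponent, with stand-in names. */
typedef struct {
    KeyTypeSketch keytype;      /* tells which union member is valid */
    union {
        void         *pkey;
        char         *skey;
        unsigned int  ikey;
    } keyobj;
} CredentialSketch;

/* Build a credential that carries a string key. */
static CredentialSketch make_string_key(char *text)
{
    CredentialSketch c;
    assert(text != NULL);
    c.keytype = KEY_SKETCH_STRING;
    c.keyobj.skey = text;
    return c;
}

/* Build a credential that carries an integer key. */
static CredentialSketch make_int_key(unsigned int value)
{
    CredentialSketch c;
    c.keytype = KEY_SKETCH_INTEGER;
    c.keyobj.ikey = value;
    return c;
}

/* Read the string key only when the tag says a string is stored. */
static const char *string_key_or_null(const CredentialSketch *c)
{
    assert(c != NULL);
    return c->keytype == KEY_SKETCH_STRING ? c->keyobj.skey : NULL;
}
```

Checking the tag before reading avoids interpreting an integer key as a pointer.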
KVExtractInterface
The members of this structure are pointers to the file extraction functions
described in “File Extraction API Functions” on page 145. When the function
KVGetExtractInterface() is called, it assigns the function pointers in this
structure to those functions. The structure is defined in kvxtract.h. See
“KVGetExtractInterface()” on page 146.
typedef struct tag_KVExtractInterface
{
    KVStructHeader;
    int (pascal *fpOpenFile) (void *pContext, KVOpenFileArg openArg,
        void **pFileHandle);
    int (pascal *fpCloseFile) (void *pFileHandle);
    int (pascal *fpGetMainFileInfo) (void *pFile, KVMainFileInfo
        *MainFileInfo);
    int (pascal *fpGetSubFileInfo) (void *pFile, int index,
        KVSubFileInfo *subFileInfo);
    int (pascal *fpGetSubFileMetaData) (void *pFile,
        KVGetSubFileMetaArg metaArg, KVSubFileMetaData *metaData);
    int (pascal *fpExtractSubFile) (void *pFile,
        KVExtractSubFileArg extractArg, KVSubFileExtractInfo
        *extractInfo);
    int (pascal *fpFreeStruct) (void *pFile, void *obj);
}
KVExtractInterfaceRec, *KVExtractInterface;
Member Descriptions
The member functions are described in “File Extraction API Functions” on
page 145.
Discussion
Before initializing a File Extraction structure, use the macro KVStructInit to
initialize the KVStructHead structure. This sets the revision number of the File
Extraction API and supports binary compatibility with future releases. See
“KVStructHead” on page 240.
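The general pattern behind this requirement, a versioned header followed by a table of function pointers, can be sketched as follows. Everything here is hypothetical and for illustration only: HeadSketch, SKETCH_STRUCT_INIT, InterfaceSketch, and get_interface() are not SDK names, and the macro is not the definition of KVStructInit. The sketch assumes the header records a revision number and the structure size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_REVISION 1u          /* hypothetical API revision number */

/* Hypothetical header carried at the start of every versioned structure. */
typedef struct {
    unsigned int revision;          /* revision the caller was built against */
    size_t       size;              /* sizeof the enclosing structure */
} HeadSketch;

/* Illustrative initializer: zero the structure, then stamp its header. */
#define SKETCH_STRUCT_INIT(p)                     \
    do {                                          \
        memset((p), 0, sizeof(*(p)));             \
        (p)->head.revision = SKETCH_REVISION;     \
        (p)->head.size     = sizeof(*(p));        \
    } while (0)

/* A small function-pointer table in the style of KVExtractInterface. */
typedef struct {
    HeadSketch head;
    int (*fpOpen)(const char *path);
    int (*fpClose)(void);
} InterfaceSketch;

static int sketch_open(const char *path) { return path != NULL ? 0 : 1; }
static int sketch_close(void)            { return 0; }

/* Assign the function pointers, but only for a caller whose header was
   initialized; returns 0 on success and nonzero otherwise. */
static int get_interface(InterfaceSketch *iface)
{
    assert(iface != NULL);
    if (iface->head.revision != SKETCH_REVISION ||
        iface->head.size != sizeof(*iface))
        return 1;
    iface->fpOpen  = sketch_open;
    iface->fpClose = sketch_close;
    return 0;
}
```

In this sketch, a structure that was never initialized is rejected, which is one way a library can detect a caller built against a different structure layout.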
KVExtractSubFileArg
This structure defines the input parameters required to extract a sub file. See
“fpExtractSubFile()” on page 148. It is defined in kvxtract.h.
typedef struct tag_KVExtractSubFileArg
{
    KVStructHeader;
    int             index;
    KVCharSet       srcCharset;
    KVCharSet       trgCharset;
    int             isMSBLSB;
    DWORD           extractionFlag;
    char            *filePath;
    char            *extractDir;
    KVOutputStream  *stream;
}
KVExtractSubFileArgRec, *KVExtractSubFileArg;
Member Descriptions
KVStructHeader
The KeyView version of the structure. See “KVStructHead” on
page 240.
index
The index number of the sub file to be extracted.
srcCharset
Specifies the source character set of the sub file when the file
format’s reader cannot determine the character set. The character
sets are enumerated in KVCharSet of kvtypes.h. See
“Discussion” below.
trgCharset
If the file type is KVFileType_Main, this is the target character
set of the extracted file. Otherwise, this is ignored. The character
sets are enumerated in KVCharSet in kvtypes.h. See
“Discussion” below.
isMSBLSB
This flag indicates whether the byte order for Unicode text is Big
Endian (MSBLSB) or Little Endian (LSBMSB).
extractionFlag
A bitwise flag defining additional parameters for file extraction.
The following flags are available:
• KVExtractionFlag_CreateDir
Indicates whether the directory structure of a sub file should be
created. If this is set, the path defined in filePath is created
if it does not already exist. If this is not set, the path is not
created, and the function returns FALSE.
• KVExtractionFlag_Overwrite
If this is set, and the file being extracted has the same name as
a file in the target path, the file in the target path is overwritten
without warning. If this is not set, and a sub file has the same
name as a file in the target path, the error
KVError_OutputFileExists is generated.
• KVExtractionFlag_ExcludeMailHeader
If this is set, header information (To, From, Sent, and so on) in
a mail file is not included in the extracted data. If this is not set,
the extracted data contains header information and the
message’s body text. See “Exclude Metadata from the
Extracted Text File” on page 80.
• KVExtractionFlag_GetFormattedBody
If this is set, the formatted version of the message body (HTML
or RTF) is extracted from mail files when possible. If neither an
HTML nor RTF version of the message body exists in the mail
file, then it is extracted as plain text. If this flag is not set, the
message body is extracted as plain text when possible.
Note: When an HTML or RTF message body is extracted, the
message’s mail headers (such as “From,” “To,” and “Subject,”)
are extracted, saved in the same format, and added to the
beginning of the sub file. This applies to PST (MAPI-based
reader), MSG and NSF files only.
 KVExtractionFlag_SaveAsMSG
If this is set, the mail message is extracted as an MSG file,
including all of its attachments. If this flag is not set, the mail
message is extracted as text. This applies to PST files on
Windows only.
Note: In file mode, when the application sets this flag in
fpExtractSubFile(), it must also check the
KVSubFileExtractInfo structure’s filePath parameter
to verify the filename used for extraction. See
“fpExtractSubFile()” on page 148 and “KVSubFileExtractInfo”
on page 176.
filePath
Pointer to the suggested path or filename to which the sub file is
extracted. This can be a filename, partial path, or full path. This
can be used in conjunction with extractDir to create the full
output path. See “Discussion” below.
extractDir
Pointer to the directory to which sub files are extracted. This
directory must exist. If this is set, the path specified in
KVOpenFileArg->extractDir is ignored. This is used in
conjunction with filePath to create the full output path.
stream
Pointer to an output stream defined by KVOutputStream. See
“KVOutputStream” on page 175. See “Discussion” below.
Discussion

If the document character set is detected and is also specified in
srcCharset, the detected character set is overridden by the specified
character set. If the source character set is not detected and is not specified,
character set conversion does not occur. The section “Supported Formats” on
page 294 lists the formats for which the source character set can be
determined.

The KVSubFileExtractInfoFlag_CharsetConverted flag in the
KVSubFileExtractInfo structure indicates whether the character set of
the sub file was converted during extraction. See “KVSubFileExtractInfo” on
page 176.

The following applies when the output is to a file:
 If filePath is a valid full path, filePath is the output path, and the path
in extractDir is ignored.
 If filePath is a filename or partial path, the target directory specified in
either KVExtractSubFileArg->extractDir or
KVOpenFileArg->extractDir is used to create the full path. See
“KVOpenFileArg” on page 173.
 If filePath is a full path or partial path, and the
KVExtractionFlag_CreateDir flag is set in extractionFlag, the
directory is created if it does not already exist.
 If filePath is not specified, a default name and the target directory
specified in either KVExtractSubFileArg->extractDir or
KVOpenFileArg->extractDir are used to create a full path.
 If both filePath and extractDir are not specified or are invalid, an
error is returned.
 If filePath is valid, but extractDir is not valid, an error is returned.

The following applies when the output is to a stream:
 Set filePath and extractDir to NULL.
 The file format (docInfo) and extraction file path (filePath) are not
returned in KVSubFileExtractInfo. See “KVSubFileExtractInfo” on
page 176.
 The flags KVExtractionFlag_CreateDir and
KVExtractionFlag_Overwrite are ignored.
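The file-output rules in this Discussion can be sketched as a small path-resolution routine. This is only an illustration under stated assumptions, not the SDK's implementation: demo_is_full_path() and demo_resolve_output_path() are hypothetical names, the test for a "full path" (leading slash, backslash, or drive letter) is an assumption, and the sketch checks only whether the arguments are present, not whether the directories exist.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: treats "/x", "\x" and "C:..." as full paths. */
static int demo_is_full_path(const char *p)
{
    if (p[0] == '/' || p[0] == '\\')
        return 1;
    return isalpha((unsigned char)p[0]) && p[1] == ':';
}

/* Returns 0 and fills out on success, -1 when no output path can be formed. */
static int demo_resolve_output_path(const char *filePath,
                                    const char *subExtractDir,  /* KVExtractSubFileArg->extractDir */
                                    const char *openExtractDir, /* KVOpenFileArg->extractDir */
                                    const char *defaultName,    /* stand-in for the default name */
                                    char *out, size_t outSize)
{
    /* The directory in KVExtractSubFileArg takes precedence when it is set. */
    const char *dir = subExtractDir ? subExtractDir : openExtractDir;
    int n;

    if (filePath != NULL && filePath[0] != '\0' && demo_is_full_path(filePath)) {
        /* A full path is used as is; extractDir is ignored. */
        n = snprintf(out, outSize, "%s", filePath);
    } else {
        /* A filename, a partial path, or no filePath: a target directory is required. */
        const char *name = (filePath != NULL && filePath[0] != '\0') ? filePath : defaultName;
        if (dir == NULL || name == NULL)
            return -1;
        n = snprintf(out, outSize, "%s/%s", dir, name);
    }
    return (n < 0 || (size_t)n >= outSize) ? -1 : 0;
}
```

For example, passing "sub/a.txt" as filePath and "out" as the sub file extractDir yields "out/sub/a.txt", while leaving filePath and both directories unset yields an error.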
KVGetSubFileMetaArg
This structure defines the metadata tags whose values are retrieved by
fpGetSubFileMetaData(). See “fpGetSubFileMetaData()” on page 155. It is
defined in kvxtract.h.
typedef struct tag_KVGetSubFileMetaArg
{
    KVStructHeader;
    int         index;
    int         metaNameCount;
    KVMetaName  *metaNameArray;
    KVCharSet   srcCharset;
    KVCharSet   trgCharset;
    int         isMSBLSB;
}
KVGetSubFileMetaArgRec, *KVGetSubFileMetaArg;
Member Descriptions
KVStructHeader
The KeyView version of the structure. See “KVStructHead” on
page 240.
index
The index number of the sub file for which metadata is extracted.
metaNameCount
The number of metadata fields to be extracted.
metaNameArray
Pointer to the structure KVMetaName containing an array of
metadata tags whose values are retrieved. See “KVMetaName”
on page 172.
srcCharset
Specifies the source character set of the metadata when the
format’s reader cannot determine the character set. The
character sets are enumerated in KVCharSet of kvtypes.h.
See “Discussion” below.
trgCharset
The target character set of the extracted metadata.
The character sets are enumerated in KVCharSet in
kvtypes.h.
isMSBLSB
This flag indicates whether the byte order for Unicode text is Big
Endian (MSBLSB) or Little Endian (LSBMSB).
Discussion

If the character set is detected and is also specified in srcCharset, the
detected character set is overridden by the specified character set. If the
source character set is not detected and is not specified, character set
conversion does not occur. The section “Supported Formats” on page 294 lists
the formats for which the source character set can be determined.

To retrieve a pre-defined list of metadata, pass 0 for metaNameCount and
NULL for metaNameArray. The metadata in Table 6 on page 73 is extracted.
KVMainFileInfo
This structure contains information about a main file that is open for extraction. It
is initialized by calling fpGetMainFileInfo(). See “fpGetMainFileInfo()” on
page 151. It is defined in kvxtract.h.
typedef struct tag_KVMainFileInfo
{
    KVStructHeader;
    int            numSubFiles;
    ADDOCINFO      docInfo;
    KVCharSet      charset;
    int            isMSBLSB;
    unsigned long  infoFlag;
}
KVMainFileInfoRec, *KVMainFileInfo;
Member Descriptions
KVStructHeader
The KeyView version of the structure. See “KVStructHead” on
page 240.
numSubFiles
The number of sub files in the main file.
docInfo
The file’s major format (such as Microsoft Word or Corel
Presentation) as defined by the structure ADDOCINFO. See
“ADDOCINFO” on page 234.
charset
The character set of the main file.
isMSBLSB
This flag indicates whether the byte order for Unicode text is Big
Endian (MSBLSB) or Little Endian (LSBMSB).
infoFlag
A bitwise flag providing additional information about the main file.
The following flag is available:
KVMainFileInfoFlag_HasContent—The main file contains
text that can be converted. Below are some examples of how this
flag is used:
 For an MSG file without attachments, numSubFiles is 1
(message body text), and this flag is FALSE because the
MSG file itself does not contain text.
 For a Zip file with three files, numSubFiles is 3, and this flag
is FALSE because a Zip file does not contain text.
 For a Microsoft Word file with an embedded OLE object,
numSubFiles is 1 (OLE object), and this flag is TRUE (Word
file contains text to be converted).
Discussion

If numSubFiles is non-zero, get information on the sub file by calling
fpGetSubFileInfo(), and then extract the sub files using
fpExtractSubFile(). See “fpGetSubFileInfo()” on page 153 and
“fpExtractSubFile()” on page 148.

If numSubFiles is 0, the file does not contain sub files and does not need to
be extracted further. If the KVMainFileInfoFlag_HasContent flag is set, the
file contains body text and can be passed directly to the conversion functions.
See “XML Export API Functions” on page 183.

If openFlag is set to KVOpenFileFlag_CreateRootNode in the call to
fpOpenFile(), numSubFiles also includes the root object (index 0) which
is created by KeyView for reconstructing the file’s hierarchy. See
“KVOpenFileArg” on page 173.
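The decision described in this Discussion can be sketched as follows. This is an illustration only: DEMO_MAINFILE_HAS_CONTENT is a placeholder bit value (the real KVMainFileInfoFlag_HasContent is defined in kvxtract.h and may differ), and demo_plan_main_file() and the DEMO_ACTION_* names are hypothetical.

```c
/* Placeholder bit; the real KVMainFileInfoFlag_HasContent value may differ. */
#define DEMO_MAINFILE_HAS_CONTENT 0x1UL

enum {
    DEMO_ACTION_NONE    = 0,
    DEMO_ACTION_EXTRACT = 1, /* call fpGetSubFileInfo() and fpExtractSubFile() */
    DEMO_ACTION_CONVERT = 2  /* pass the file to the conversion functions */
};

/* Maps the two KVMainFileInfo members to the follow-up steps described above. */
static int demo_plan_main_file(int numSubFiles, unsigned long infoFlag)
{
    int actions = DEMO_ACTION_NONE;

    if (numSubFiles != 0)
        actions |= DEMO_ACTION_EXTRACT;
    if (infoFlag & DEMO_MAINFILE_HAS_CONTENT)
        actions |= DEMO_ACTION_CONVERT;
    return actions;
}
```

With the examples given for infoFlag, an MSG file without attachments (one sub file, flag not set) maps to extraction only, and a Word file with one embedded OLE object (flag set) maps to both extraction and conversion.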
KVMetadataElem
This structure contains metadata field values extracted from a mail file. It is
defined in kvtypes.h.
typedef struct tag_KVMetadataElem
{
    int             isDataValid;
    int             dataID;
    KVMetadataType  dataType;
    char*           strType;
    void*           data;
    int             dataSize;
}
KVMetadataElem;
Member Descriptions
isDataValid
Specifies whether the metadata returned from the API is valid data.
dataID
The integer name of the extracted metadata field.
dataType
The data type of the metadata field. The types are defined in
KVMetadataType in kvtypes.h. See “KVMetadataType” on
page 283.
strType
Pointer to the string name of the metadata field.
data
The contents of the metadata field.
If dataType is KVMetadata_Int4 or
KVMetadata_Bool, this member contains the actual value.
Otherwise, this member is a pointer to the actual value.
KVMetadata_DateTime points to an 8-byte value.
KVMetadata_String and KVMetadata_Unicode point to the
beginning of the string containing the text. The strings are NULL
terminated.
KVMetadata_Binary points to the first element of a byte array.
dataSize
The byte count of data when the type is KVMetadata_Binary,
KVMetadata_Unicode or KVMetadata_String.
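The following sketch shows one way an application might read the data member according to dataType. It is self-contained, so it uses stand-in types (DemoMetadataType, DemoMetadataElem) that mirror the structure above; the enumerator order is an assumption and is not taken from kvtypes.h, and demo_describe_elem() is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-ins that mirror KVMetadataType and KVMetadataElem; values are assumed. */
typedef enum {
    DemoMetadata_Int4,
    DemoMetadata_Bool,
    DemoMetadata_DateTime,
    DemoMetadata_String,
    DemoMetadata_Unicode,
    DemoMetadata_Binary
} DemoMetadataType;

typedef struct {
    int               isDataValid;
    int               dataID;
    DemoMetadataType  dataType;
    char             *strType;
    void             *data;
    int               dataSize;
} DemoMetadataElem;

/* Writes a printable form of the element to out. Returns 0 on success, -1 otherwise. */
static int demo_describe_elem(const DemoMetadataElem *e, char *out, size_t outSize)
{
    int n;

    if (e == NULL || !e->isDataValid)
        return -1;

    switch (e->dataType) {
    case DemoMetadata_Int4:
    case DemoMetadata_Bool:
        /* The value is stored in the pointer itself, not behind it. */
        n = snprintf(out, outSize, "%d", (int)(intptr_t)e->data);
        break;
    case DemoMetadata_String:
        /* data points to a NULL-terminated string. */
        n = snprintf(out, outSize, "%s", (const char *)e->data);
        break;
    case DemoMetadata_Binary:
        /* data points to a byte array; dataSize holds its length. */
        n = snprintf(out, outSize, "<%d bytes>", e->dataSize);
        break;
    default:
        /* DateTime and Unicode values need their own decoding; not covered here. */
        return -1;
    }
    return (n < 0 || (size_t)n >= outSize) ? -1 : 0;
}
```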
KVMetaName
This structure defines the names of the metadata fields to be extracted from a mail
file. It is defined in kvxtract.h.
typedef struct tag_KVMetaName
{
    KVMetaNameType  type;
    union
    {
        void  *pname;
        int   iname;
        char  *sname;
    }name;
}
KVMetaNameRec, *KVMetaName;
Member Descriptions
type
The type of metadata name (such as integer or string). The types are
defined by the enumerated type KVMetaNameType. See
“KVMetaNameType” on page 285. Note that MAPI property names are of
type integer.
pname
Pointer to a structure defining the metadata fields to be retrieved.
iname
The name of a metadata field of type integer.
sname
Pointer to the name of a metadata field of type string.
Discussion
If you specify the MAPI tag name (for example, PR_CONVERSATION_TOPIC), you
must include the Windows header files mapitags.h and mapidefs.h in which
PR_CONVERSATION_TOPIC is defined as 0x0070001e.
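As an illustration of how the type member selects the active member of the name union, the sketch below fills two entries, one with an integer name and one with a string name. The types DemoMetaNameType and DemoMetaName are stand-ins that mirror the structure above; the enumerator names, the "Subject" field name, and demo_fill_meta_names() are assumptions made for this example. Only the value 0x0070001e comes from the text.

```c
#include <stddef.h>

/* Stand-ins that mirror KVMetaNameType and KVMetaNameRec; enumerators are assumed. */
typedef enum {
    DemoMetaNameType_Integer,
    DemoMetaNameType_String
} DemoMetaNameType;

typedef struct {
    DemoMetaNameType type;
    union {
        void *pname;
        int   iname;
        char *sname;
    } name;
} DemoMetaName;

/* Value quoted above for PR_CONVERSATION_TOPIC in the Windows MAPI headers. */
#define DEMO_PR_CONVERSATION_TOPIC 0x0070001e

/* Fills two entries and returns the count, or 0 if the array is too small. */
static size_t demo_fill_meta_names(DemoMetaName *names, size_t capacity)
{
    static char subject[] = "Subject"; /* hypothetical string-named field */

    if (names == NULL || capacity < 2)
        return 0;

    names[0].type = DemoMetaNameType_Integer; /* MAPI names are integers */
    names[0].name.iname = DEMO_PR_CONVERSATION_TOPIC;

    names[1].type = DemoMetaNameType_String;
    names[1].name.sname = subject;
    return 2;
}
```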
KVOpenFileArg
This structure defines the input arguments necessary to open a file for extraction.
It is initialized by calling fpOpenFile(). See “fpOpenFile()” on page 157. It is
defined in kvxtract.h.
typedef struct tag_KVOpenFileArg
{
    KVStructHeader;
    KVCredential   cred;
    KVInputStream  *stream;
    char           *filePath;
    char           *extractDir;
    DWORD          openFlag;
    DWORD          reserved;
    void           *pReserved;
}
KVOpenFileArgRec, *KVOpenFileArg;
Member Descriptions
KVStructHeader
The KeyView version of the structure. See “KVStructHead” on
page 240.
cred
The credentials required to open a protected PST or NSF file. This
is a pointer to the KVCredential structure. Your application can
define multiple credentials to this member for multiple formats.
See “KVCredential” on page 160.
stream
Pointer to the developer-assigned instance of KVInputStream.
The structure KVInputStream defines the input stream
containing the source. See “KVInputStream” on page 235.
If you are using a file as input, this is NULL.
filePath
Pointer to the full file path to the source file.
If you are using a stream as input, this is NULL.
extractDir
Pointer to the default directory to which sub files are extracted.
This directory must exist.
This is used in conjunction with
KVExtractSubFileArg->filePath to create the full output
path. See “KVExtractSubFileArg” on page 163.
openFlag
A bitwise flag defining additional parameters for opening the file.
The following flag is available:
KVOpenFileFlag_CreateRootNode—If this flag is set,
KeyView creates a root object when extracting this file’s sub files.
This root node does not have a parent and is at the highest level
of the file’s tree structure. It is used internally to provide a
reference point from which all other child nodes are determined,
and the file’s hierarchy is created.
If you want to maintain the file’s hierarchy when you extract sub
files from a container, you must set this flag. See “Recreate a File’s
Hierarchy” on page 70 for more information.
The root node has an index of zero. Although not all container
formats require an artificial root node, the root is created for all
container formats regardless of whether the file itself contains a
root directory or file.
reserved
Reserved for future use. It must be 0.
pReserved
Reserved for future use. It must be NULL.
KVOutputStream
This structure defines an output stream for the extracted sub file.
typedef struct tag_OutputStream
{
    void *pOutputStreamPrivateData;
    BOOL (pascal *fpCreate)(struct tag_OutputStream *, TCHAR *);
    UINT (pascal *fpWrite) (struct tag_OutputStream *, BYTE *, UINT);
    BOOL (pascal *fpSeek)  (struct tag_OutputStream *, long, int);
    long (pascal *fpTell)  (struct tag_OutputStream *);
    BOOL (pascal *fpClose) (struct tag_OutputStream *);
}
KVOutputStream;
Member Descriptions
All member functions are equivalent to their counterparts in the ANSI standard
library.
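Because the members correspond to the ANSI file functions, an output stream can be backed directly by a FILE pointer kept in pOutputStreamPrivateData. The sketch below is a minimal illustration with stand-in types (DemoBOOL, DemoUINT, DemoBYTE, DemoOutputStream) so that it compiles without the SDK headers. It omits the pascal calling convention and uses char * where the SDK declares TCHAR *. The return conventions (non-zero for success, number of bytes written from the write function) are assumptions.

```c
#include <stdio.h>

/* Stand-ins for the platform types used by KVOutputStream. */
typedef int           DemoBOOL;
typedef unsigned int  DemoUINT;
typedef unsigned char DemoBYTE;

typedef struct DemoOutputStream {
    void     *pOutputStreamPrivateData; /* holds the FILE * in this sketch */
    DemoBOOL (*fpCreate)(struct DemoOutputStream *, char *);
    DemoUINT (*fpWrite) (struct DemoOutputStream *, DemoBYTE *, DemoUINT);
    DemoBOOL (*fpSeek)  (struct DemoOutputStream *, long, int);
    long     (*fpTell)  (struct DemoOutputStream *);
    DemoBOOL (*fpClose) (struct DemoOutputStream *);
} DemoOutputStream;

static DemoBOOL demo_create(DemoOutputStream *s, char *name)
{
    s->pOutputStreamPrivateData = fopen(name, "wb"); /* counterpart: fopen() */
    return s->pOutputStreamPrivateData != NULL;
}

static DemoUINT demo_write(DemoOutputStream *s, DemoBYTE *buf, DemoUINT n)
{
    return (DemoUINT)fwrite(buf, 1, n, (FILE *)s->pOutputStreamPrivateData);
}

static DemoBOOL demo_seek(DemoOutputStream *s, long offset, int whence)
{
    return fseek((FILE *)s->pOutputStreamPrivateData, offset, whence) == 0;
}

static long demo_tell(DemoOutputStream *s)
{
    return ftell((FILE *)s->pOutputStreamPrivateData);
}

static DemoBOOL demo_close(DemoOutputStream *s)
{
    FILE *fp = (FILE *)s->pOutputStreamPrivateData;

    s->pOutputStreamPrivateData = NULL;
    return fp != NULL && fclose(fp) == 0;
}

/* Wires the stdio-backed functions into the stream structure. */
static void demo_stream_init(DemoOutputStream *s)
{
    s->pOutputStreamPrivateData = NULL;
    s->fpCreate = demo_create;
    s->fpWrite  = demo_write;
    s->fpSeek   = demo_seek;
    s->fpTell   = demo_tell;
    s->fpClose  = demo_close;
}
```

A memory-backed stream follows the same pattern: the private data pointer would reference a growable buffer instead of a FILE pointer.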
KVSubFileExtractInfo
This structure contains information about an extracted sub file. It is initialized by
calling fpExtractSubFile(). See “fpExtractSubFile()” on page 148. It is
defined in kvxtract.h.
typedef struct tag_KVSubFileExtractInfo
{
    KVStructHeader;
    char           *filePath;
    char           *fileName;
    unsigned long  infoFlag;
    ADDOCINFO      docInfo;
}
KVSubFileExtractInfoRec, *KVSubFileExtractInfo;
Member Descriptions
KVStructHeader
The KeyView version of the structure. See “KVStructHead” on
page 240.
filePath
The full path to which the sub file was extracted.
If the sub file is embedded in the main file as a link, this is the
external path to the sub file.
If you output the data to a stream, the extraction path is not
returned.
fileName
The original path and/or filename of the sub file.
If the sub file is embedded in the main file as a link, this is the
external path to the sub file.
infoFlag
A bitwise flag providing additional information about the extracted
sub file. The following flags are available:
 KVSubFileExtractInfoFlag_NeedsExtraction—The
file may contain sub files and should be extracted further.
 KVSubFileExtractInfoFlag_FileCreated—The file
was created on disk.
 KVSubFileExtractInfoFlag_CharsetConverted—The
sub file’s character set was converted.
 KVSubFileExtractInfoFlag_External—The sub file is
embedded in the main file as a link and is stored externally.
For example, the sub file may be an object that was
embedded in a Word document using “Link to File,” or an
attachment that is referenced in an MBX message. This type
of file cannot be extracted. You must write code to access the
sub file based on the path in the member filePath or
fileName.
 KVSubFileExtractInfoFlag_FolderCreated—A folder
was created.
 KVSubFileExtractInfoFlag_NonFormattedBodyExtracted—Indicates
that a plain text version of the message was extracted due to
an error extracting the formatted version of the message.
docInfo
The file’s major format (such as Microsoft Word or Corel
Presentation) as defined by the structure ADDOCINFO. See
“ADDOCINFO” on page 234.
If you output the data to a stream, the file format is not returned.
KVSubFileInfo
This structure contains information about a sub file in a container file. It is
initialized by calling fpGetSubFileInfo(). See “fpGetSubFileInfo()” on
page 153. It is defined in kvxtract.h.
typedef struct tag_KVSubFileInfo
{
    KVStructHeader;
    char           *subFileName;
    int            subFileType;
    long           subFileSize;
    unsigned long  infoFlag;
    KVCharSet      charset;
    int            isMSBLSB;
    BYTE           fileTime[8];
    int            parentIndex;
    int            childCount;
    int            *childArray;
}
KVContainerSubFileInfoRec, *KVSubFileInfo;
Member Descriptions
KVStructHeader
The KeyView version of the structure. See “KVStructHead” on
page 240.
subFileName
The path and/or file name of the sub file.
If the sub file is the body text of a mail file or is an embedded OLE
object, KeyView provides a default filename. See “Default
Filenames for Extracted Sub Files” on page 91.
subFileType
The sub file’s position in the container file’s hierarchy. The
following options are available:
 KVSubFileType_Main—The sub file is at the top level of the
main file. This is the default sub file type. See “Discussion”
below.
 KVSubFileType_Attachment—The sub file is an
attachment in a file.
 KVSubFileType_OLE—The sub file is an embedded OLE
object in a compound document.
 KVSubFileType_Folder—The sub file is a folder or the
artificial root node (see “Create a Root Node” on page 70).
subFileSize
The size of the sub file in bytes. This information may be useful if
you do not want to extract very large files.
This value is approximate and is the maximum size of the sub file.
The sub file is usually smaller than this value when it is extracted.
infoFlag
A bitwise flag providing additional information about the sub file.
The following flags are available:
 KVSubFileInfoFlag_NeedsExtraction—The sub file
may contain sub files. It must be extracted further to
conclusively determine whether it contains sub files.
 KVSubFileInfoFlag_Secure—The sub file is secured and
credentials (such as user name and password) are required to
extract it. This flag applies to ZIP, RAR, and PDF files only.
 KVSubFileInfoFlag_SMIME—The sub file is
S/MIME-encrypted and credentials are required to extract it. This
applies to .eml and .pst files only.
 KVSubFileInfoFlag_External—The sub file is embedded
in the main file as a link and is stored externally. For example,
the sub file may be an object that was embedded in a Word
document using “Link to File,” or an attachment that is
referenced in an MBX message. This type of file cannot be
extracted. You must write code to access the sub file based on
the path in the member subFileName.
 KVSubFileInfoFlag_MailItem—When the sub file type is
KVSubFileType_Attachment, this indicates the attachment
is a mail item. This flag applies to PST, MSG, and NSF files
only.
charset
If the sub file is not an attachment, this is the character set of the
sub file. If the sub file is an attachment, the character set is
KVCS_UNKNOWN.
isMSBLSB
This flag indicates whether the byte order for Unicode text is Big
Endian (MSBLSB) or Little Endian (LSBMSB).
fileTime
When the sub file is a mail message, this is the file’s Sent time.
Otherwise, it is the last modified time. The file time is not available
for the following file types:
 EML attachments
 OLE objects in a Microsoft Office document
parentIndex
The index number of this file’s parent. For example, this may be
the index of a folder in which the sub file is stored, or file to which
the sub file is attached. If a file does not have a parent, the
parentIndex is -1.
childCount
The number of first-level children in the sub file.
childArray
Pointer to an array of first-level children in the sub file.
Discussion

The KVSubFileType_Main type applies to the following for each file format:
File format     KVSubFileType_Main applies to...
MSG and EML     the message body.
Zip files       a file inside the archive.
PST files       an item that is not an attachment, an OLE object, or a root node.
MBX files       a message in the MBX file.
NSF files       an item that is not an attachment, an OLE object, or a root node.
PDF files       an item that is not an attachment or a root node.

If the flag KVSubFileInfoFlag_NeedsExtraction is set, open the sub
file and extract its children. See “fpOpenFile()” on page 157 and
“fpExtractSubFile()” on page 148.

The members parentIndex and childArray provide information about the
sub file’s parent and children. This information can be used to recreate the file
hierarchy on extraction. Since childArray only retrieves the first-level
children in the sub file, you must call fpGetSubFileInfo() repeatedly until
information for the leaf-node children is extracted. See “Recreate a File’s
Hierarchy” on page 70.
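As a sketch of how parentIndex can be used to rebuild a sub file's location, the routine below walks from a node up to the node whose parent is -1 and then joins the names from the top down. It assumes the application has already copied subFileName and parentIndex for every index into its own array; DemoNode, DEMO_MAX_DEPTH, and demo_build_path() are hypothetical names, and the "/" separator is an arbitrary choice.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_MAX_DEPTH 64 /* arbitrary guard against cycles or very deep trees */

/* Application-side copy of the two KVSubFileInfo members used here. */
typedef struct {
    const char *name;        /* copied from subFileName */
    int         parentIndex; /* -1 when the node has no parent */
} DemoNode;

/* Builds "root/folder/file" for nodes[index]. Returns 0 on success, -1 on error. */
static int demo_build_path(const DemoNode *nodes, int count, int index,
                           char *out, size_t outSize)
{
    int chain[DEMO_MAX_DEPTH];
    int depth = 0;
    size_t used = 0;

    if (nodes == NULL || out == NULL || outSize == 0)
        return -1;
    out[0] = '\0';

    /* Collect the indexes from the node up to the top of the tree. */
    while (index != -1) {
        if (index < 0 || index >= count || depth == DEMO_MAX_DEPTH)
            return -1;
        chain[depth++] = index;
        index = nodes[index].parentIndex;
    }

    /* Emit the names in top-down order. */
    while (depth > 0) {
        const char *name = nodes[chain[--depth]].name;
        int n = snprintf(out + used, outSize - used, "%s%s", used ? "/" : "", name);

        if (n < 0 || (size_t)n >= outSize - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```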
KVSubFileMetaData
This structure contains a count of the number of metadata elements extracted
from a mail file, and a pointer to the first element of the array of elements. It is
initialized by calling fpGetSubFileMetaData(). See
“fpGetSubFileMetaData()” on page 155. It is defined in kvxtract.h.
typedef struct tag_KVSubFileMetaData
{
    KVStructHeader;
    int               nElem;
    KVMetadataElem**  ppElem;
    unsigned long     infoFlag;
}
KVSubFileMetaDataRec, *KVSubFileMetaData;
Member Descriptions
KVStructHeader
The KeyView version of the structure. See “KVStructHead” on
page 240.
nElem
The number of metadata fields contained in the array.
ppElem
Pointer to an array of pointers that are the memory addresses of
metadata field values in the structure KVMetadataElem. See
“KVMetadataElem” on page 171.
infoFlag
A bitwise flag defining additional properties of the extracted
metadata. The following flag is available:
KVSubFileMetaInfoFlag_CharsetConverted—Indicates
the metadata’s character set was converted.
CHAPTER 8
XML Export API Functions
This section describes the functions in the XML Export API. These functions
manage the input and output streams, and perform the document conversion.
Each function appears as a function prototype followed by a description of its
arguments, return value, and discussion of its use. This section contains the
following topics:
 KVXMLGetInterface()
 fpConvertStream()
 fpFileToInputStreamCreate()
 fpFileToInputStreamFree()
 fpFileToOutputStreamCreate()
 fpFileToOutputStreamFree()
 fpGetAnchor()
 fpGetConvertFileList()
 fpGetStreamInfo()
 fpGetSummaryInfo()
 fpInit()
 fpSetStyleMapping()
 fpShutDown()
 fpValidateTemplate()
 KVXMLConfig()
 KVXMLConvertFile()
 KVXMLEndOOPSession()
 KVXMLSetStyleSheet()
 KVXMLStartOOPSession()
KVXMLGetInterface()
This function is exported by the Export definition file. It supplies function pointers
to other Export functions. When KVXMLGetInterface() is called, it assigns the
function pointers in the structure KVXMLInterface to other functions described in
this chapter. For example, KVXMLInterface.fpInit is assigned to point to
KVXMLInit().
Syntax
void pascal KVXMLGetInterface (KVXMLInterface *pInterface);
Arguments
pInterface
Pointer to the structure KVXMLInterface. See
“KVXMLInterface” on page 251.
Returns
None.
Discussion

One of the initial steps in using the XML Export API is to create an instance of
a KVXMLInterface structure and use this function to gain access to other
functions.

The functions can be called directly. For example, you can call
KVXMLGetSummaryInfo() instead of using fpGetSummaryInfo() in
KVXMLInterface. However, it is recommended that you assign the function
pointers in KVXMLInterface to the functions for efficiency.
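The pattern described here, in which one exported function fills a structure of function pointers that the application then calls through, can be sketched in isolation as follows. Everything in this block is a stand-in: DemoInterface has only two members, and their signatures are simplified and do not match the real fpInit() and fpShutDown().

```c
#include <stddef.h>

/* Simplified stand-in for KVXMLInterface; real members have other signatures. */
typedef struct {
    int (*fpInit)(void);
    int (*fpShutDown)(void);
} DemoInterface;

static int demo_init(void)     { return 1; }
static int demo_shutdown(void) { return 1; }

/* Plays the role of KVXMLGetInterface(): assigns every pointer in the table. */
static void demo_get_interface(DemoInterface *pInterface)
{
    pInterface->fpInit     = demo_init;
    pInterface->fpShutDown = demo_shutdown;
}
```

The application declares one DemoInterface instance, passes its address to demo_get_interface() once, and afterwards calls (*iface.fpInit)() and (*iface.fpShutDown)() through the table, in the same style as the cnv2xml sample calls (*KVXMLInt.fpConvertStream)(...).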
fpConvertStream()
This function converts either a source stream or file to an output stream.
Syntax
BOOL pascal fpConvertStream(
    void             *pContext,
    void             *pCallingContext,
    KVInputStream    *pInput,
    KVOutputStream   *pOutput,
    KVXMLTemplate    *pTemplates,
    KVXMLOptions     *pOptions,
    KVXMLTOCOptions  *pTOCCreateOptions,
    KVXMLCallbacks   *pCallbacks,
    BOOL             bIndex,
    KVErrorCode      *pError );
Arguments
pContext
Pointer returned from fpInit().
pCallingContext
Pointer passed back to the callback functions.
pInput
Pointer to the developer-assigned instance of
KVInputStream. The structure KVInputStream defines
the input stream containing the source for the conversion. See
“KVInputStream” on page 235.
pOutput
Pointer to the developer-assigned instance of
KVOutputStream. The structure KVOutputStream defines
the output stream to which Export writes the generated XML.
See “KVOutputStream” on page 237.
pTemplates
Pointer to the data structure KVXMLTemplate. It defines the
overall structure of the output. Individual elements within the
structure define the markup written at specific points in the
output stream. See “KVXMLTemplate” on page 262.
If this pointer is NULL, the default values for the structure are
used.
pOptions
Pointer to the data structure KVXMLOptions. It defines the
options that control the markup written in response to the
general style and attributes (font, color, and so on) of the
document. See “KVXMLOptions” on page 253.
If this pointer is NULL, the default values for the structure are
used.
pTOCCreateOptions
Pointer to the data structure KVXMLTOCOptions. It specifies
whether a heading is included in the table of contents. See
“KVXMLTOCOptions” on page 267.
If this pointer is NULL, the default values for the structure are
used.
pCallbacks
Pointer to the data structure KVXMLCallbacks. It is a
structure of functions that Export calls for specific,
user-defined purposes. See “KVXMLCallbacks” on page 247.
If callbacks are not used, then this can be NULL.
bIndex
Set this to TRUE to generate output with minimal markup and
without images. Since the generated output is minimized to
textual content, it is suitable for an indexing engine. If bIndex
is set to FALSE, embedded images in a document are
regenerated as separate files and stored in the output
directory.
This can be set through the bIndexOnly member of the
structure KVXMLOptions. See “KVXMLOptions” on
page 253.
To generate output with verbose markup and without images,
set the nType argument of the function KVXMLConfig() to
KVCFG_SUPPRESSIMAGES. See “KVXMLSetStyleSheet()” on
page 219.
pError
Pointer to an error code if the call to fpConvertStream()
fails.
Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE.

Only pContext, pInput, pOutput, and bIndex are required. All other pointers
should be NULL when they are not set.
Discussion

If pCallbacks is NULL, pOptions->pszDefaultOutputDirectory must be
valid, except when bIndex is set to TRUE.

This function runs in-process or out of process. See “Convert Files Out of
Process” on page 43.

When converting out of process, this function must be called after the call to
KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession().
See “KVXMLStartOOPSession()” on page 221 and
“KVXMLEndOOPSession()” on page 217.

When converting out of process, the values for the KVXMLTemplate,
KVXMLOptions, and KVXMLTOCOptions structures should be set to NULL.
These structures are already passed in the call to KVXMLStartOOPSession().
See “KVXMLStartOOPSession()” on page 221.
Example
The following sample code is from the cnv2xml sample program:
if(!(*KVXMLInt.fpConvertStream)(
        pKVXML,       /* Pointer returned by fpInit()   */
        NULL,         /* Pointer for callback functions */
        &Input,       /* Input stream                   */
        &Output,      /* Output stream                  */
        NULL,         /* Mark-up and related variables  */
        &XMLOptions,  /* Options                        */
        NULL,         /* TOC options                    */
        NULL,         /* Pointer to callback functions  */
        FALSE,        /* Index mode                     */
        &error))      /* Error return value             */
{
    printf("Error converting %s to XML %d\n", argv[i - 1], error);
}
else
{
    printf("Conversion of %s to XML completed.\n\n", argv[i - 1]);
}
fpFileToInputStreamCreate()
This function creates an input stream from an input file.
Syntax
BOOL pascal _export fpFileToInputStreamCreate(
    void           *pContext,
    char           *pszFileName,
    KVInputStream  *pInput);
Arguments
pContext
Pointer returned from fpInit().
pszFileName
Pointer to the name of the input file to be converted.
pInput
Pointer to the developer-assigned instance of KVInputStream.
The structure KVInputStream defines the input stream
containing the source for the conversion. See “KVInputStream” on
page 235.
Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
After the conversion is complete, call fpFileToInputStreamFree() to free the
memory allocated by this function.
Example
The following sample code is from the cnv2xml sample program:
if(!(*KVXMLInt.fpFileToInputStreamCreate)(pKVXML, argv[i++],
        &Input))
{
    printf("Error creating input stream\n");
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return (5);
}
fpFileToInputStreamFree()
This function frees the memory used to create an input stream.
Syntax
BOOL pascal _export fpFileToInputStreamFree(
    void           *pContext,
    KVInputStream  *pInput);
Arguments
pContext
Pointer returned from fpInit().
pInput
Pointer to the developer-assigned instance of KVInputStream.
The structure KVInputStream defines the input stream containing
the source for the conversion. See “KVInputStream” on page 235.
Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
After the conversion is complete, call this function to free the memory allocated by
fpFileToInputStreamCreate().
fpFileToOutputStreamCreate()
This function creates an output stream from an output file.
Syntax
BOOL pascal _export fpFileToOutputStreamCreate(
    void            *pContext,
    char            *pszFileName,
    KVOutputStream  *pOutput );
Arguments
pContext
Pointer returned from fpInit().
pszFileName
Pointer to the name of the output file to be created.
pOutput
Pointer to the developer-assigned instance of
KVOutputStream. The structure KVOutputStream defines
the output stream to which Export writes the generated XML.
See “KVOutputStream” on page 237.
Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
After the conversion is complete, call fpFileToOutputStreamFree() to free the
memory allocated by this function.
Example
The following sample code is from the cnv2xml sample program:
if (!(*KVXMLInt.fpFileToOutputStreamCreate)(pKVXML, argv[i],
        &Output))
{
    printf("Error creating output stream\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 6;
}
fpFileToOutputStreamFree()
This function frees the memory used to create the output stream.
Syntax
BOOL pascal _export fpFileToOutputStreamFree(
    void            *pContext,
    KVOutputStream  *pOutput );
Arguments
pContext
Pointer returned from fpInit().
pOutput
Pointer to the developer-assigned instance of
KVOutputStream. The structure KVOutputStream defines
the output stream to which Export writes the generated XML.
See “KVOutputStream” on page 237.
Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
After the conversion is complete, call this function to free the memory allocated by
fpFileToOutputStreamCreate().
fpGetAnchor()
This function gets the filename automatically generated by Export and used for
external graphics referenced with <a xmlns:xlink= xlink href=> tags and for
heading-level table of contents entries.
Syntax
BOOL pascal fpGetAnchor(
    void             *pCallingContext,
    KVXMLAnchorType  eAnchorType,
    char             *pszAnchor,
    int              cbAnchorMax,
    BYTE             *pcHTML,
    UINT             cbHTML);
Arguments
pCallingContext
Pointer passed back to the callback functions.
eAnchorType
Graphic or block anchor type for the output stream. It must be
one of the enumerated types defined in KVXMLAnchorType.
See “KVXMLAnchorType” on page 278.
pszAnchor
Pointer to the location in which the new anchor is stored.
cbAnchorMax
Maximum number of bytes to place in pszAnchor.
pcHTML
Pointer to either the markup defining the contents of the table of
contents entry, a pointer to the external graphic name, or NULL.
cbHTML
Number of valid bytes in pcHTML.
Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE. Processing is halted.

pszAnchor must be assigned. It may be derived from the cbAnchorMax,
pcHTML, and cbHTML values that are also provided.

pcHTML may be NULL if the graphic is an internal part of the document.
Discussion
This function is exposed so that it may be called from the GetAnchor()
callback function to obtain default behavior for anchor types the callback is not
set to handle.
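The delegation described above can be sketched as follows. This is a minimal illustration, not SDK code: the types DemoAnchorType and DemoGetAnchorFn, the enumerator names, the "img_" naming scheme, and the stub default function are all hypothetical stand-ins, and the pascal calling convention is omitted so that the sketch compiles without the SDK headers. It also assumes that cbAnchorMax is the size of the pszAnchor buffer including the terminating null.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: the real KVXMLAnchorType values and the pascal
   calling convention come from the SDK headers and are not reproduced here. */
typedef enum { DEMO_ANCHOR_GRAPHIC, DEMO_ANCHOR_HEADING } DemoAnchorType;

typedef int (*DemoGetAnchorFn)(void *pCallingContext, DemoAnchorType eAnchorType,
                               char *pszAnchor, int cbAnchorMax,
                               unsigned char *pcHTML, unsigned int cbHTML);

/* In a real program this would hold the fpGetAnchor pointer from the interface. */
DemoGetAnchorFn demo_default_get_anchor = NULL;

/* Callback sketch: names external graphics itself, defers everything else. */
int demo_get_anchor_callback(void *pCallingContext, DemoAnchorType eAnchorType,
                             char *pszAnchor, int cbAnchorMax,
                             unsigned char *pcHTML, unsigned int cbHTML)
{
    assert(pszAnchor != NULL && cbAnchorMax > 0);

    if (eAnchorType == DEMO_ANCHOR_GRAPHIC && pcHTML != NULL && cbHTML > 0)
    {
        /* pcHTML is not assumed to be null-terminated, so bound the read
           with cbHTML and the write with cbAnchorMax. */
        int n = snprintf(pszAnchor, (size_t)cbAnchorMax, "img_%.*s",
                         (int)cbHTML, (const char *)pcHTML);
        return (n > 0 && n < cbAnchorMax);   /* FALSE if the name was truncated */
    }

    /* Anchor types this callback does not handle get the default behavior. */
    if (demo_default_get_anchor == NULL)
    {
        return 0;
    }
    return demo_default_get_anchor(pCallingContext, eAnchorType, pszAnchor,
                                   cbAnchorMax, pcHTML, cbHTML);
}

/* Stub default used only to exercise the sketch without the SDK. */
int demo_stub_default_anchor(void *pCallingContext, DemoAnchorType eAnchorType,
                             char *pszAnchor, int cbAnchorMax,
                             unsigned char *pcHTML, unsigned int cbHTML)
{
    (void)pCallingContext; (void)eAnchorType; (void)pcHTML; (void)cbHTML;
    snprintf(pszAnchor, (size_t)cbAnchorMax, "%s", "default_anchor");
    return 1;
}
```

In an application, demo_default_get_anchor would be assigned the fpGetAnchor pointer before the conversion starts, so that unhandled anchor types keep the names Export generates.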
fpGetConvertFileList()
This function gets the list of files automatically converted to XML during a call to
fpConvertStream() or KVXMLConvertFile().
Syntax
char ** pascal _export fpGetConvertFileList(
    void  *pContext,
    int   *pnSize );
Arguments
pContext
Pointer returned from fpInit().
pnSize
Pointer to the number of files generated by the conversion.
Returns
If no files are converted, the return value is a NULL pointer. Otherwise, the return
value is a pointer to an array of strings that provides the available path information
for each converted file.
Discussion

The array of file path information includes all externally generated files,
including graphic files. Note that the main output file is not included in the
array, nor in the count of the number of files converted.

The memory used by the array of file path information is freed by the API.

The array is not valid after a call to fpShutDown().

This function runs in-process or out of process. See “Convert Files Out of
Process” on page 43.

When converting out of process, this function must be called after the call to
KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession().
See “KVXMLStartOOPSession()” on page 221 and
“KVXMLEndOOPSession()” on page 217.
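The points above can be illustrated with a short sketch of how a caller might walk the returned array. It is not taken from the sample programs: DemoGetConvertFileListFn is a hypothetical stand-in for the function-pointer type (the pascal _export qualifiers are omitted), and the two stub functions exist only so that the sketch runs without the SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the function-pointer type of
   fpGetConvertFileList(); the real declaration comes from the SDK headers. */
typedef char **(*DemoGetConvertFileListFn)(void *pContext, int *pnSize);

/* Prints each externally generated file and returns how many were listed.
   Returns 0 when the API reports that no files were converted (NULL). */
int demo_list_converted_files(DemoGetConvertFileListFn getList, void *pContext)
{
    int nFiles = 0;
    char **ppFiles;
    int i;

    assert(getList != NULL);

    ppFiles = getList(pContext, &nFiles);
    if (ppFiles == NULL)
    {
        return 0;   /* no external files; the main output file is never listed */
    }

    for (i = 0; i < nFiles; i++)
    {
        printf("Generated file: %s\n", ppFiles[i]);
    }

    /* Do not free ppFiles: the API owns the array, and it is no longer
       valid after fpShutDown(). */
    return nFiles;
}

/* Stubs used only to exercise the sketch without the SDK. */
static char *demo_files[] = { "out_1.jpg", "out_2.jpg" };

char **demo_stub_two_files(void *pContext, int *pnSize)
{
    (void)pContext;
    *pnSize = 2;
    return demo_files;
}

char **demo_stub_no_files(void *pContext, int *pnSize)
{
    (void)pContext;
    *pnSize = 0;
    return NULL;
}
```

In an application, the first argument would be the fpGetConvertFileList member of the interface structure and the second the pointer returned from fpInit().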
fpGetStreamInfo()
This function extracts file format information and character set from the source
document.
Syntax
BOOL pascal _export fpGetStreamInfo(
    void           *pContext,
    KVInputStream  *pInput,
    KVStreamInfo   *pStreamInfo );
Arguments
pContext
Pointer returned from fpInit().
pInput
Pointer to the developer-assigned instance of KVInputStream.
The structure KVInputStream defines the input stream containing
the source for the conversion. See “KVInputStream” on page 235.
pStreamInfo
Pointer to the developer-assigned instance of KVStreamInfo. The
structure KVStreamInfo defines the input stream document type
and character set. See “KVStreamInfo” on page 239.
You can examine the fields in the structure to determine the
appropriate template to use based on the document type.
Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE.
fpGetSummaryInfo()
This function extracts all metadata from the input stream. See “Extract Metadata”
on page 96 for more information.
Syntax
BOOL pascal _export fpGetSummaryInfo(
    void             *pContext,
    KVInputStream    *pInput,
    KVSummaryInfoEx  *pSummary,
    BOOL              bFree );
Arguments
pContext
Pointer returned from fpInit().
pInput
Pointer to the developer-assigned instance of
KVInputStream. The KVInputStream structure points to
the input stream containing the source for the conversion. See
“KVInputStream” on page 235.
pSummary
Points to the developer-assigned instance of
KVSummaryInfoEx. See “KVSummaryInfoEx” on page 244.
In this structure, nElem provides a count of the number of
metadata elements, and pElem points to the first element of the
array of individual elements as defined by the structure
KVSumInfoElemEx. See “KVSumInfoElemEx” on page 243.
bFree
Flag that specifies whether to fill or free the memory allocated to
the document metadata. Set this to FALSE to fill the structure, and
to TRUE to free it.
Returns

If the call is successful, the return value is TRUE. When the document does
not contain metadata, but the document reader can extract metadata from the
specified format, then this function returns TRUE with nElem set to 0.

If this call is unsuccessful, the return value is FALSE. This function returns
FALSE when the document reader does not support metadata extraction for
the specified format, or there is an error in extraction. The section “Supported
Formats” on page 294 lists the file formats for which metadata can be
determined.
Discussion

For metadata to be extracted by Export, metadata must be defined in the
source document, and the document reader must be able to extract metadata
for the file format. The section “Supported Formats” on page 294 lists the file
formats for which metadata can be determined. Export does not generate
metadata automatically from the document contents.

This function runs in-process or out of process. See “Convert Files Out of
Process” on page 43.

This function may be called any time after the call to KVXMLInit().

When converting out of process, this function must be called after the call to
KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession().
See “KVXMLStartOOPSession()” on page 221 and
“KVXMLEndOOPSession()” on page 217.

Call this function with bFree set to FALSE to return an array of
KVSummaryInfoEx structures, each containing an element of available
document metadata.
After processing the information in the structure, call this function with bFree
set to TRUE to free the memory allocated to the document metadata.
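The fill-then-free pattern can be sketched as follows. This is an illustration under stated assumptions rather than SDK code: DemoSummaryInfo and DemoSumInfoElem are hypothetical stand-ins that model only the nElem and pElem members named in this guide (the pszName field is invented for the stub), the function-pointer type omits the pascal _export qualifiers, and the stub simply counts outstanding allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef TRUE
#define TRUE  1
#define FALSE 0
#endif

/* Hypothetical stand-ins for KVSumInfoElemEx and KVSummaryInfoEx. */
typedef struct { const char *pszName; } DemoSumInfoElem;
typedef struct { int nElem; DemoSumInfoElem *pElem; } DemoSummaryInfo;

typedef int (*DemoGetSummaryInfoFn)(void *pContext, void *pInput,
                                    DemoSummaryInfo *pSummary, int bFree);

/* Returns the number of metadata elements, or -1 if extraction failed. */
int demo_count_metadata(DemoGetSummaryInfoFn getSummary,
                        void *pContext, void *pInput)
{
    DemoSummaryInfo summary;
    int nElem;

    assert(getSummary != NULL);
    memset(&summary, 0, sizeof(summary));

    /* First call, bFree == FALSE: fill the structure. */
    if (!getSummary(pContext, pInput, &summary, FALSE))
    {
        return -1;   /* format not supported for metadata, or extraction error */
    }

    nElem = summary.nElem;   /* 0 is valid: supported format, no metadata */

    /* ... process summary.pElem[0] through summary.pElem[nElem - 1] here ... */

    /* Second call, bFree == TRUE: release the memory held by the structure. */
    getSummary(pContext, pInput, &summary, TRUE);
    return nElem;
}

/* Stub used only to exercise the sketch without the SDK. */
static int demo_outstanding = 0;
static DemoSumInfoElem demo_elems[2] = { { "Title" }, { "Author" } };

int demo_stub_get_summary(void *pContext, void *pInput,
                          DemoSummaryInfo *pSummary, int bFree)
{
    (void)pContext; (void)pInput;
    if (bFree)
    {
        pSummary->nElem = 0;
        pSummary->pElem = NULL;
        demo_outstanding--;
    }
    else
    {
        pSummary->nElem = 2;
        pSummary->pElem = demo_elems;
        demo_outstanding++;
    }
    return TRUE;
}

int demo_stub_outstanding(void) { return demo_outstanding; }
```

Pairing the two calls inside one helper makes it harder to forget the second call on an early return path.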
fpInit()
This function initializes an Export session. Its return value, pContext, is passed
as the first parameter to the File Extraction interface and all other Export
functions.
Syntax
void* pascal _export fpInit(
    KVMemoryStream  *pMemAllocator,
    char            *pszKeyViewDir,
    char            *pszDataFile,
    KVErrorCode     *pError,
    DWORD            dWord);
Arguments
pMemAllocator
Pointer to a developer-defined memory allocator. If NULL is
passed, then the default C run-time memory allocation is used.
pszKeyViewDir
Pointer to the directory where the Export components are located.
This is normally the directory install\OS\bin, where install is
the pathname of the Export installation directory and OS is the name
of the operating system.
pszDataFile
Pointer to the directory and filename of the Export data file,
formats_e.ini. This file determines whether a format is
supported. If a format does not exist in this file, the conversion fails.
The formats_e.ini file is normally stored in the directory
install\OS\bin, where install is the pathname of the Export
installation directory and OS is the name of the operating system.
See “File Format Detection” on page 347 for more information.
pError
Pointer to an error code defined in KVErrorCode or
KVErrorCodeEx in kvtypes.h. See “KVErrorCode” on page 272
and “KVErrorCodeEx” on page 274.
dWord
Reserved. Must be 0.
Returns

If the call is successful, the return value is a pointer passed to all other
functions.

If the call is unsuccessful, the return value is a NULL pointer.
Discussion

If pszKeyViewDir is NULL, the required components cannot be found. Ensure
it is valid.

If this function returns NULL, check stderr for the KeyView installation error
messages, “KeyView Export SDK License Key has Expired” and
“KeyView Export SDK License Key is Invalid”, and pass them to your
application. See the Export SDK Installation Instructions for more information
on the KeyView license feature.

To ensure multi-threaded conversions are thread-safe, you must create a
unique context pointer for every thread by calling fpInit(). In addition,
threads must not share context pointers, and the same context pointer must
be used for all API calls in the same thread. Creating a context pointer for
every thread does not affect performance because the context pointer uses
minimal resources.

When the conversion context is no longer required, it should be terminated by
calling fpShutDown(). See “fpShutDown()” on page 203.
Example
The following sample code is from the cnv2xml sample program:
pKVXML = (*KVXMLInt.fpInit)(NULL, ".", NULL, &error, 0);
if(!pKVXML)
{
    printf("Error initializing KVXML: %d\n", error);
    mpFreeLibrary(hKVXML);
    return 4;
}
fpSetStyleMapping()
This function is used to set the mapping for user-defined styles. Export does not
make a distinction between paragraph styles or character styles, but operates
under the assumption that each style has a unique name.
Syntax
BOOL pascal _export fpSetStyleMapping(
    void     *pContext,
    KVStyle  *pStyles,
    int       iStyles,
    BOOL      bCopy);
Arguments
pContext
Pointer returned from fpInit().
pStyles
Pointer to the developer-assigned instance of KVStyle. See
“KVStyle” on page 241. The KVStyle structure defines the
elements of a custom style.
iStyles
Number of elements in the pStyles array.
bCopy
If Export is to allocate memory to copy the pStyles array, set
this to TRUE. If pStyles remains valid throughout the
conversion process, set this to FALSE.
Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE.

Paragraph styles are presently implemented only for documents in Microsoft
Word, RTF, Folio Flat files, WordPro, and WordPerfect 6.x.

This function runs in-process or out of process. See “Convert Files Out of
Process” on page 43.

When converting out of process, this function must be called after the call to
KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession().
See “KVXMLStartOOPSession()” on page 221 and
“KVXMLEndOOPSession()” on page 217.
Discussion
Once this API function is called, the styles are valid until fpShutDown() is
called, or until this function is called again with a new style or NULL.
fpShutDown()
This function terminates an Export session that was initialized by fpInit(), and
frees allocated system resources. It is called when the conversion context is no
longer required.
Syntax
void pascal _export fpShutDown(KVXMLContext *pContext);
Arguments
pContext
Pointer returned from fpInit().
Returns
None.
Discussion
After this function is called, the pContext pointer must not be passed to any XML
Export API.
fpValidateTemplate()
This function is used to ensure that the markup is well-formed and valid according
to the DTD. It is currently not implemented.
KVXMLConfig()
This function is called directly and provides a way to configure options prior to the
document conversion. Currently, the function is used for the following
configurations:

Generate output without images
Generate output with verbose markup and without images. To generate output
with minimal markup (ID and style paragraph attributes) and without images,
set the bIndexOnly member of the structure KVXMLOptions. See
“KVXMLOptions” on page 253.

Enable PDF position information
Include position information in the markup generated for a PDF document.

Configure PDF bookmarks
Specify whether bookmarks in a PDF file are converted to simple XLinks in the
XML output.

Configure Word bookmarks
Disable the conversion of Microsoft Word bookmarks to zone elements.

Designate temporary directory
Specify a directory in which temporary files created during XML conversion
processes are stored.
NOTE On Windows systems, there is a 64K size limit to the temp
directory. Once the limit is reached, you must either create a new
directory or delete the contents of the existing directory; otherwise, you
may receive an error message.

Configure XML conversion
Specify the elements and attributes extracted from an XML document based
on the file's document type.

Enable PDF logical reading order
Convert paragraphs in PDF files in the order in which they appear on the page
and with left-to-right or right-to-left paragraph direction. See “Convert PDF
Files to a Logical Reading Order” on page 112.

Configure PDF soft hyphens
Specify whether soft hyphens are removed from the XML output. See “Control
Hyphenation” on page 115.

Enable Revision Marks
Converts text and graphics that were deleted from a document with revision
tracking enabled and includes revision tracking information in the XML output.
See “Convert Revision Tracking Information” on page 110.

Protected file password
Specifies the password to use to open a password-protected file for export.
Syntax
KVErrorCode pascal KVXMLConfig(
    void  *pContext,
    int    nType,
    int    nValue,
    void  *p );
Arguments
pContext
Pointer returned from fpInit().
nType
The configuration flag. This is a symbolic constant defined in
kvtypes.h. The available options are described in “Configuration
Flags” on page 207.
nValue
Integer value defined for the flags above.
This is TRUE or FALSE for all flags except KVCFG_LOGICALPDF,
KVCFG_SETTEMPDIRECTORY, and KVCFG_SETXMLCONFIGINFO.
For KVCFG_LOGICALPDF, this is one of the paragraph direction options
defined in the LPDF_DIRECTION enumerated type in kvtypes.h. See
“LPDF_DIRECTION” on page 290.
For KVCFG_SETTEMPDIRECTORY and KVCFG_SETXMLCONFIGINFO,
this is not set.
p
The data for the configuration flag.
This is NULL for all flags except KVCFG_SETTEMPDIRECTORY,
KVCFG_SETXMLCONFIGINFO, and KVCFG_SETPASSWORD.
For KVCFG_SETTEMPDIRECTORY, this is the path to the directory where
temporary files are stored.
For KVCFG_SETXMLCONFIGINFO, this is a pointer to the
KVXConfigInfo structure. See “KVXConfigInfo” on page 245.
For KVCFG_SETPASSWORD, this is the source file password.
Configuration Flags
The following flags are available for the nType argument in KVXMLConfig().
These flags are defined in kvtypes.h.
Flag
Description
KVCFG_SUPPRESSIMAGES
If KVCFG_SUPPRESSIMAGES is set, the XML output includes
verbose markup, but no images. If this option is not set, then
embedded images in a document are regenerated as separate
files and stored in the output directory. To generate output with
minimal markup (ID and style paragraph attributes) and without
images, set the bIndexOnly member of the structure
KVXMLOptions to TRUE. See “KVXMLOptions” on page 253.
KVCFG_ENABLEPOSITIONINFO
If KVCFG_ENABLEPOSITIONINFO is set, then a position
element is included in the markup for PDF documents. The
position element defines the absolute position of the text relative
to the bottom left corner of the page, and includes additional
information such as font and color.
KVCFG_SUPPRESSTOCPRINTIMAGE
If the flag KVCFG_SUPPRESSTOCPRINTIMAGE is set, then
bookmarks in a PDF file are not converted to simple XLinks in
the XML output. By default, PDF bookmarks are converted to
source and destination anchors. For example,
<a xmlns:xlink="http://www.w3.org/TR/xlink"
xlink:href="#bmk1">Highlight File Format</a>
<a xmlns:xlink="http://www.w3.org/TR/xlink"
name="bmk1"/><img src="pdf14640.jpg"/>
KVCFG_DISABLEZONE
If the flag KVCFG_DISABLEZONE is set, the conversion of
Microsoft Word bookmarks to zone elements (<zone name="xxx">)
in the output XML is disabled.
A bookmark in Microsoft Word documents is a name given to a
selected area of the document. The bookmark may enclose
words, paragraphs, tables, table cells, lists, list items or the
entire document. In XML Export, bookmarks are converted to
zone elements (<Zone name="xxx">) using the KeyView
KVT_ZONE token.
Depending on how bookmarks are defined in the original
document, the creation of zone elements may result in
malformed XML. In this case, you can disable zone creation to
avoid these validity errors. Zone element creation is enabled by
default.
KVCFG_SETTEMPDIRECTORY
The flag KVCFG_SETTEMPDIRECTORY enables you to specify
the directory in which temporary files created during conversion
processes are stored. By default, the system temporary
directory is used.
To define a directory for temporary files generated during an
out-of-process conversion, set the tempfilepath parameter
in the formats_e.ini file. See “Convert Files Out of Process”
on page 43.
NOTE: On Windows systems, there is a 64K size limit to the
temp directory. Once the limit is reached, you must either create
a new directory or delete the contents of the existing directory;
otherwise, you may receive an error message.
KVCFG_SETXMLCONFIGINFO
The flag KVCFG_SETXMLCONFIGINFO enables you to define
which elements and attributes are extracted from XML
documents with a specified format ID or root element. This can
be used to override the default settings for the supported XML
formats (see “Convert XML Files” on page 120), or to define
settings for custom XML document types.
The settings are defined in the KVXConfigInfo structure (see
“KVXConfigInfo” on page 245). To set custom settings for more
than one document type, call the KVXMLConfig() function
once for each type.
Element extraction settings can also be modified using the
kvxconfig.ini file. See “Configure Element Extraction for
XML Documents” on page 121.
KVCFG_LOGICALPDF
The flag KVCFG_LOGICALPDF converts paragraphs in a PDF
file in the order in which they appear on the page (logical reading
order). The nValue argument specifies the paragraph
direction. See “Convert PDF Files to a Logical Reading Order”
on page 112.
KVCFG_DELSOFTHYPHEN
If the flag KVCFG_DELSOFTHYPHEN is set, soft hyphens in the
source document are removed, and the hyphenated words are
joined in the XML output. By default, soft hyphens are
maintained. See “Control Hyphenation” on page 115.
It is recommended you remove soft hyphens if you use Export
to generate text output for an indexing engine or are not
concerned with maintaining the document’s layout. See
“fpConvertStream()” on page 186 or “KVXMLConvertFile()” on
page 214 for more information on running Export in index mode.
KVCFG_INCLREVISIONMARK
If this flag is set to TRUE, text and graphics that were deleted
from a document with a revision tracking feature enabled are
converted, and revision tracking information is included in the
XML output.
To reset the flag and exclude deleted content and revision
tracking information from the XML output, set the flag to FALSE.
See “Convert Revision Tracking Information” on page 110. The
default is FALSE.
KVCFG_WP_NOCOMMENTS
Set to TRUE not to export text from comments in Microsoft Word
documents. Comment text is exported by default from Microsoft
Word 97 to 2003 files.
Comment output can also be toggled by modifying the
formats_e.ini file. See “Show Hidden Data” on page 126.
KVCFG_WP_SHOWHIDDENTEXT
Set to TRUE to export hidden text from Microsoft Word
documents.
KVCFG_WP_SHOWDATEFIELDCODE
Set to TRUE to export date field codes from Microsoft Word
documents.
KVCFG_WP_SHOWFILENAMEFIELDCODE
Set to TRUE to export the file name field code from Microsoft
Word documents.
KVCFG_SS_SHOWHIDDENINFOR
Set to TRUE to export hidden information from Microsoft Excel
files.
KVCFG_SS_SHOWCOMMENTS
Set to TRUE to export comments from Microsoft Excel files.
KVCFG_SS_SHOWFORMULA
Set to TRUE to export formulas from Microsoft Excel files.
KVCFG_PG_HIDEHIDDENSLIDE
Set to TRUE not to export hidden slides from Microsoft
PowerPoint files.
KVCFG_PG_HIDECOMMENT
Set to TRUE not to export comments from Microsoft PowerPoint
files. Comments are exported by default from PowerPoint 97 to
2000 files.
KVCFG_PG_SHOWCOMMENTSSLIDE
Set to TRUE to export comments slides from Microsoft
PowerPoint 2003 and 2007 files.
KVCFG_PG_SHOWSLIDNOTES
Set to TRUE to export slide notes from Microsoft PowerPoint
files.
Slide note output can also be toggled by modifying the
formats_e.ini file. See “Show Hidden Data” on page 126.
KVCFG_SETPASSWORD
This flag enables you to define a password used to open a
password-protected file for export. See “Export Password
Protected Files” on page 411.
nValue is TRUE.
p is the source file password, which can have a maximum length
of 255 characters (the final byte is null).
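Because the password is limited to 255 characters plus the terminating null, an application may want to check the length before passing the string to KVXMLConfig(). The following is a minimal sketch of such a check; the helper name demo_prepare_password and the macro DEMO_MAX_PASSWORD_CHARS are hypothetical and are not part of the SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Limit stated for KVCFG_SETPASSWORD in this guide; the macro name is
   hypothetical. */
#define DEMO_MAX_PASSWORD_CHARS 255

/* Copies pszPassword into a buffer of at least 256 bytes.
   Returns 1 on success, or 0 if the password is NULL or longer than
   255 characters, in which case the buffer is left untouched. */
int demo_prepare_password(const char *pszPassword, char *pBuffer)
{
    size_t len;

    assert(pBuffer != NULL);

    if (pszPassword == NULL)
    {
        return 0;
    }

    len = strlen(pszPassword);
    if (len > DEMO_MAX_PASSWORD_CHARS)
    {
        return 0;
    }

    memcpy(pBuffer, pszPassword, len + 1);   /* include the terminating null */
    return 1;
}
```

If the helper returns 1, the buffer can then be passed as the p argument with the flag KVCFG_SETPASSWORD and nValue set to TRUE.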
Returns
The return value is one of the error codes defined in KVErrorCode in kvtypes.h.
Discussion

This function must be called after the call to fpInit() and before the call to
fpConvertStream() or KVXMLConvertFile().

This function runs in-process or out of process. See “Convert Files Out of
Process” on page 43.

When converting out of process, this function must be called after the call to
KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession().
See “KVXMLStartOOPSession()” on page 221 and
“KVXMLEndOOPSession()” on page 217.

Examples

To generate verbose markup, but no images:
(*fpXMLConfig)(pKVXML, KVCFG_SUPPRESSIMAGES, TRUE, NULL);

To specify bookmarks in a PDF file are not converted to XLinks in the XML
output:
(*fpXMLConfig)(pKVXML, KVCFG_SUPPRESSTOCPRINTIMAGE, TRUE, NULL);

To disable the conversion of zone elements:
(*fpXMLConfig)(pKVXML, KVCFG_DISABLEZONE, TRUE, NULL);

To set a directory for temporary files:
char tmpDir[250];
strcpy(tmpDir, "c:\\temp\\xmlexport");
(*fpXMLConfig)(pKVXML, KVCFG_SETTEMPDIRECTORY, 0, tmpDir);

To specify custom extraction settings for conversion of an XML file:
KVXConfigInfo xinfo; /* populate xinfo */
(*fpXMLConfig)(pKVXML, KVCFG_SETXMLCONFIGINFO, 0, &xinfo);

To specify PDF files are converted to a logical reading order, and the
paragraph direction for the PDF output is left to right:
(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_LTR, NULL);

To specify PDF files are converted to a logical reading order, and the
paragraph direction for the PDF output is right to left:
(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_RTL, NULL);

To specify PDF files are converted to a logical reading order, and the
paragraph direction for the PDF output is determined on the fly for each page:
(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_AUTO, NULL);

To specify soft hyphens are removed from the XML output:
(*fpXMLConfig)(pKVXML, KVCFG_DELSOFTHYPHEN, TRUE, NULL);

To convert text and graphics that are identified by revision marks:
(*fpXMLConfig)(pKVXML, KVCFG_INCLREVISIONMARK, TRUE, NULL);

To toggle hidden data output from Microsoft Word documents, use one of the
KVCFG_WP flags:
(*fpXMLConfig)(pKVXML, KVCFG_WP_NOCOMMENTS, TRUE, NULL);

To toggle hidden data output from Microsoft Excel documents, use one of the
KVCFG_SS flags:
(*fpXMLConfig)(pKVXML, KVCFG_SS_SHOWHIDDENINFOR, TRUE, NULL);

To toggle hidden data output from Microsoft PowerPoint documents, use one
of the KVCFG_PG flags:
(*fpXMLConfig)(pKVXML, KVCFG_PG_HIDEHIDDENSLIDE, TRUE, NULL);

To specify a password to open a password-protected file for export:
(*fpXMLConfig)(pKVXML, KVCFG_SETPASSWORD, TRUE, password);
where password is a null-terminated string of 255 or fewer characters.

To include a position element in the markup for PDF documents:
(*fpXMLConfig)(pKVXML, KVCFG_ENABLEPOSITIONINFO, TRUE, NULL);
Using the PDF position element significantly changes the generated markup.
For example, without the option, the XML output from a section of a PDF
document looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE VerityXMLExport (View Source for full doctype...)>
- <VerityXMLExport>
- <WP>
- <p id="p1" font-size="33pt">
<img src="ecpe.pdf38760.jpg" height="140px" width="292px" />
Economic Fiscal Update
<font size="18pt" color="#777777">Theand</font>
<font size="14pt" color="#ffffff">October 30, 2002</font>
<font size="29pt" color="#a4a4a4">Overview</font>
</p>
With the option enabled, the same section of the PDF document looks like
this:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE VerityXMLExport (View Source for full doctype...)>
- <VerityXMLExport>
- <WP>
<Position style="position:absolute;top:534px;left:254px;font-family:'Times New
Roman';font-size:33pt;white-space:nowrap;" />
<Position style="position:absolute;top:393px;left:254px;white-space:nowrap;" />
<img src="ecpe.pdf36000.jpg" height="140px" width="292px" />
<Position style="position:absolute;top:308px;left:256px;font-family:'Times New
Roman';font-size:33pt;white-space:nowrap;" />
Economic
<Position style="position:absolute;top:346px;left:256px;font-family:'Times New
Roman';font-size:33pt;white-space:nowrap;" />
Fiscal Update
<Position style="position:absolute;top:298px;left:281px;font-family:'Times New
Roman';font-size:18pt;color:#777777;background-color:#ffffff;white-space:nowrap;"
/>
The
<Position style="position:absolute;top:336px;left:299px;font-family:'Times New
Roman';font-size:18pt;color:#777777;background-color:#ffffff;white-space:nowrap;"
/>
and
<Position style="position:absolute;top:543px;left:397px;font-family:'Times New
Roman';font-size:14pt;color:#ffffff;background-color:#000000;white-space:nowrap;"
/>
October 30, 2004
<Position style="position:absolute;top:627px;left:382px;font-family:'Times New
Roman';font-size:29pt;color:#a4a4a4;background-color:#ffffff;white-space:nowrap;"
/>
Overview
KVXMLConvertFile()
This function is called directly and converts a source file to an output file.
Syntax
BOOL pascal KVXMLConvertFile(
    void             *pContext,
    void             *pCallingContext,
    char             *pInFileName,
    char             *pOutFileName,
    KVXMLTemplate    *pTemplates,
    KVXMLOptions     *pOptions,
    KVXMLTOCOptions  *pTOCCreateOptions,
    KVXMLCallbacks   *pCallbacks,
    BOOL              bIndex,
    KVErrorCode      *pError);
Arguments
pContext
Pointer returned from fpInit().
pCallingContext
Pointer passed back to the callback functions.
pInFileName
Pointer to the input file.
pOutFileName
Pointer to the output file.
pTemplates
Pointer to the data structure KVXMLTemplate. It defines the
overall structure of the output. Individual elements within the
structure define the markup written at specific points in the
output stream. See “KVXMLTemplate” on page 262.
If this pointer is NULL, the default values for the structure are
used.
pOptions
Pointer to the data structure KVXMLOptions. It defines the
options that control the markup written in response to the
general style and attributes (font, color, and so on) of the
document. See “KVXMLOptions” on page 253.
If this pointer is NULL, the default values for the structure are
used.
pTOCCreateOptions
Pointer to the data structure KVXMLTOCOptions. It specifies
whether a heading is included in the table of contents. See
“KVXMLTOCOptions” on page 267.
If this pointer is NULL, the default values for the structure are
used.
pCallbacks
Pointer to the data structure KVXMLCallbacks. It is a
structure of functions that Export calls for specific, user-defined
purposes. See “KVXMLCallbacks” on page 247.
If callbacks are not used, then this can be NULL.
bIndex
Set this to TRUE to generate output with minimal markup and
without images. Since the generated output is minimized to
textual content, it is suitable for an indexing engine. If bIndex
is set to FALSE, embedded images in a document are
regenerated as separate files and stored in the output directory.
This can also be set through the bNoPictures member in the
template files.
pError
Pointer to an error code if the call to KVXMLConvertFile()
fails.
Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE.

Only pContext, pInFileName, pOutFileName, and bIndex are required. All
other pointers should be NULL when they are not set.

If pCallbacks is NULL, pOptions->pszDefaultOutputDirectory must be
valid, except when bIndex is set to TRUE.

This function runs in-process or out of process. See “Convert Files Out of
Process” on page 43.

When converting out of process, this function must be called after the call to
KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession().
See “KVXMLStartOOPSession()” on page 221 and
“KVXMLEndOOPSession()” on page 217.

When converting out of process, the values for the KVXMLTemplate,
KVXMLOptions, and KVXMLTOCOptions structures should be set to NULL.
These structures are already passed in the call to KVXMLStartOOPSession().
See “KVXMLStartOOPSession()” on page 221.
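The argument rules above can be sketched as a pre-flight check that an application runs before calling KVXMLConvertFile(). This is an illustration only: DemoXMLOptions is a hypothetical stand-in that models just the pszDefaultOutputDirectory member, and "valid" is assumed here to mean a non-NULL, non-empty string. The sketch also conservatively treats a NULL pOptions as having no output directory, which the guide does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for KVXMLOptions: only the member that the
   rules above mention is modeled. */
typedef struct
{
    const char *pszDefaultOutputDirectory;
} DemoXMLOptions;

/* Returns 1 if the arguments satisfy the rules stated above, 0 otherwise. */
int demo_convert_args_ok(const void *pContext,
                         const char *pInFileName,
                         const char *pOutFileName,
                         const DemoXMLOptions *pOptions,
                         const void *pCallbacks,
                         int bIndex)
{
    assert(bIndex == 0 || bIndex == 1);   /* BOOL is expected to be TRUE or FALSE */

    /* pContext, pInFileName, and pOutFileName are always required. */
    if (pContext == NULL || pInFileName == NULL || pOutFileName == NULL)
    {
        return 0;
    }

    /* Without callbacks, external files need a default output directory,
       except in index mode, where no images are generated. */
    if (pCallbacks == NULL && !bIndex)
    {
        if (pOptions == NULL ||
            pOptions->pszDefaultOutputDirectory == NULL ||
            pOptions->pszDefaultOutputDirectory[0] == '\0')
        {
            return 0;
        }
    }
    return 1;
}
```

A check like this only catches missing arguments early; the return value and pError from KVXMLConvertFile() remain the authoritative result.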
Discussion
Example
if(!(*KVXMLInt.KVXMLConvertFile)(
        pKVXML,          /* Pointer returned by fpInit()   */
        NULL,            /* Pointer for callback functions */
        &InputFile,      /* Input file                     */
        &OutputFile,     /* Output file                    */
        &XMLTemplates,   /* Mark-up and related variables  */
        &XMLOptions,     /* Options                        */
        NULL,            /* TOC options                    */
        NULL,            /* Pointer to callback functions  */
        FALSE,           /* Index mode                     */
        &error))         /* Error return value             */
{
    printf("Error converting %s to XML %d\n", argv[i - 1], error);
}
else
{
    printf("Conversion of %s to XML completed.\n\n", argv[i - 1]);
}
KVXMLEndOOPSession()
This function terminates the current out-of-process conversion session, and
releases the source data and resources related to the session.
Syntax
BOOL pascal KVXMLEndOOPSession(
    void           *pContext,
    BOOL            bKeepServantAlive,
    KVErrorCodeEx  *pError,
    DWORD           dwOptions,
    void           *pReserved1,
    void           *pReserved2 );
Arguments
pContext
Pointer returned from fpInit().
bKeepServantAlive
Set this to TRUE to keep a Servant process active after the
Export out-of-process session is terminated. If the Servant
remains active, subsequent conversion requests are
processed more quickly because the Servant is already
prepared to receive data.
Set this to FALSE to terminate the Export out-of-process
session and the associated Servant process.
pError
Pointer to an error code defined in KVErrorCodeEx in
kvtypes.h.
dwOptions
Reserved for future use.
pReserved1
Reserved for future use.
pReserved2
Reserved for future use.
Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE.
Example
The following sample code is from the cnv2xmloop sample program:
/* declare endsession function pointer */
BOOL (pascal *fpKVXMLEndOOPSession)( void *,
                                     BOOL,
                                     KVErrorCode *,
                                     DWORD,
                                     void *,
                                     void * );

/* assign OOP endsession function pointer */
fpKVXMLEndOOPSession = (BOOL (pascal *)( void *,
                                         BOOL,
                                         KVErrorCode *,
                                         DWORD,
                                         void *,
                                         void * ))mpGetProcAddress(hKVXML,
                                                   "KVXMLEndOOPSession");
if(!fpKVXMLEndOOPSession)
{
    printf("Error assigning KVXMLEndOOPSession() pointer\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    mpFreeLibrary(hKVXML);
    return 8;
}

/********END OOP SESSION, DO NOT KEEP SERVANT ALIVE *********/
if(!(*fpKVXMLEndOOPSession)(pKVXML,
                            FALSE,
                            &error,
                            0,
                            NULL,
                            NULL))
{
    printf("Error calling fpKVXMLEndOOPSession \n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 10;
}
KVXMLSetStyleSheet()
This function is called directly and is used to specify the full path and filename of
an external Style Sheet (XSL or CSS).
Syntax
BOOL pascal KVXMLSetStyleSheet(
    void    *pContext,
    char    *pszStyleSheetName,
    char    *pszRef);
Arguments
pContext
Pointer returned from fpInit().
pszStyleSheetName
Pointer to the full path and filename of the style sheet.
pszRef
Pointer to the URL or filename of the style sheet.
Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE.
Discussion

When the value for eStyleSheetType in KVXMLOptions is set to XML_XSL or
XML_CSS, an external style sheet is referenced by a processing instruction of
the form:
<?xml-stylesheet href="pszRef" type="text/xsl"?>
or
<?xml-stylesheet href="pszRef" type="text/css"?>

If the value for pszStyleSheetName includes the output directory, the href
only consists of the filename since the XML output resides in the same
directory as the style sheet file.

If the value for pszStyleSheetName points to a directory other than the output
directory, the href consists of the full path and filename.

Style sheet information cannot be written to an external XSL file. XML Export
can only reference an existing XSL style sheet.

When XML_CSS is specified, a CSS file can be created based on
pszStyleSheetName.

If the name of the CSS is not specified by using this function, a CSS style file
is created with an automatically-generated filename.

If this function is used to specify the name of the style file, that file is
referenced in the processing instruction.
 If the CSS file does not exist in the specified location, it is created.
 If it exists, but is empty, CSS styles are written to it.
 If the CSS file exists and is not empty, the file is not altered. There is no
attempt made to validate the file.

If multiple calls are made to fpConvertStream() or
KVXMLConvertFile(), and the name of the style sheet has been set by using
KVXMLSetStyleSheet(), the filename can be disabled by calling
KVXMLSetStyleSheet() again with pszStyleSheetName and pszRef set to
NULL. The filename can then be set to a different value by calling
KVXMLSetStyleSheet() with the new filename before the next call to
fpConvertStream() or KVXMLConvertFile().

This function runs in-process or out of process. See “Convert Files Out of
Process” on page 43.

When converting out of process, this function must be called after the call to
KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession().
See “KVXMLStartOOPSession()” on page 221 and
“KVXMLEndOOPSession()” on page 217.
KVXMLStartOOPSession()
This function performs the following:

Initializes the out-of-process session.

Specifies the input stream or file.

Sets conversion options in the KVXMLTemplate, KVXMLOptions, and
KVXMLTOCOptions data structures.

Creates a Servant process.

Establishes a communication channel between the application thread and the
Servant.

Sends the data to the Servant.
Syntax
BOOL pascal KVXMLStartOOPSession(
    void                *pContext,
    KVInputStream       *pInputStream,
    char                *pFileName,
    KVXMLTemplate       *pTemplates,
    KVXMLOptions        *pOptions,
    KVXMLTOCOptions     *pTOCCreateOptions,
    DWORD               *pPID,
    KVErrorCode         *pError,
    DWORD               dwOptions,
    void                *pReserved1,
    void                *pReserved2 );
Arguments
pContext
Pointer returned from fpInit().
pInputStream
Pointer to the developer-assigned instance of
KVInputStream. The structure KVInputStream defines
the input stream containing the source for the conversion.
If pInputStream is defined, then pFileName must be NULL. The
input data can be defined as a data stream or a file, but not
both.
pFileName
Pointer to the file to be converted. The file must exist on the
same file system as the Servant.
If pFileName is defined, then pInputStream must be NULL. The
input data can be defined as a data stream or a file, but not
both.
pTemplates
Pointer to the data structure KVXMLTemplate. It defines the
overall structure of the output. Individual elements within the
structure define the markup written at specific points in the
output stream. See “KVXMLTemplate” on page 262.
If this pointer is NULL, the default values for the structure are
used.
pOptions
Pointer to the data structure KVXMLOptions. It defines the
options that control the markup written in response to the
general style and attributes (font, color, and so on) of the
document. See “KVXMLOptions” on page 253.
If this pointer is NULL, the default values for the structure are
used.
pTOCCreateOptions
Pointer to the data structure KVXMLTOCOptions. It specifies
whether a heading is included in the table of contents. See
“KVXMLTOCOptions” on page 267.
If this pointer is NULL, the default values for the structure are
used.
pPID
Address of a DWORD into which the Servant process ID is
returned.
pError
Pointer to an error code defined in KVErrorCode in
kvtypes.h.
dwOptions
Reserved for future use.
pReserved1
Reserved for future use.
pReserved2
Reserved for future use.
Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE.
Discussion
 After the out-of-process session is started successfully, all conversion
functions can be called. The data is then processed on the Servant until the
session is terminated by a call to KVXMLEndOOPSession().
 All functions that can run out of process must be called within the
out-of-process session, that is, after the call to KVXMLStartOOPSession(),
and before the call to KVXMLEndOOPSession().

The KVXMLConvertFile() and fpGetSummary() functions can be
called only once in a single out-of-process session.

Since the KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions data
structures are passed by this function, the same pointers in the call to
KVXMLConvertFile() are ignored.
Example
The following sample code is from the cnv2xmloop sample program:
/* declare OOP startsession function pointer */
BOOL (pascal *fpKVXMLStartOOPSession)( void *,
                                       KVInputStream *,
                                       char *,
                                       KVXMLTemplate *,
                                       KVXMLOptions *,
                                       KVXMLTOCOptions *,
                                       DWORD *,
                                       KVErrorCode *,
                                       DWORD,
                                       void *,
                                       void * );
/* assign OOP startsession function pointer */
fpKVXMLStartOOPSession = (BOOL (pascal *)( void *,
                                           KVInputStream *,
                                           char *,
                                           KVXMLTemplate *,
                                           KVXMLOptions *,
                                           KVXMLTOCOptions *,
                                           DWORD *,
                                           KVErrorCode *,
                                           DWORD,
                                           void *,
                                           void * ))mpGetProcAddress(hKVXML,
                                           "KVXMLStartOOPSession");
if(!fpKVXMLStartOOPSession)
{
    printf("Error assigning KVXMLStartOOPSession() pointer\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    mpFreeLibrary(hKVXML);
    return 7;
}
/********START OOP SESSION *****************/
if(!(*fpKVXMLStartOOPSession)(pKVXML,
                              &Input,
                              NULL,
                              &XMLTemplates,   /* Mark-up and related variables */
                              &XMLOptions,     /* Options */
                              NULL,            /* TOC options */
                              &oopServantPID,
                              &error,
                              0,
                              NULL,
                              NULL))
{
    printf("Error calling fpKVXMLStartOOPSession \n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 9;
}
CHAPTER 9
XML Export API Callback Functions
This section describes the XML Export API callback functions. It contains the
following topics:

Introduction

Continue()

GetAnchor()

GetAuxOutput()

UserCB()
Introduction
The fpConvertStream() and KVXMLConvertFile() functions enable you to
specify a callback function. A callback function controls the conversion while it is
in progress. For example, you can specify a callback function to report progress
during the conversion.
To use the API callback functions, declare one or more instances of the
KVXMLCallbacks structure (see “KVXMLCallbacks” on page 247). Each
member of this instance may then be initialized by assigning a function pointer to
the application-defined callback functions, cast to the appropriate function
prototype. Each instance of KVXMLCallbacks may define unique callback
functions. Alternatively, the functions may be common to all instances of
KVXMLCallbacks; these functions will take appropriate action, depending on the
value of the pointer pCallingContext.
The second parameter (pCallingContext) of the call to
fpConvertStream() and KVXMLConvertFile() provides a void pointer used
to identify the context of this call. If more than one call to fpConvertStream()
or KVXMLConvertFile() is made within a single application, any resulting
callbacks are identified by the first parameter of the callback function. This allows
the callback function to take any appropriate action, depending on which calling
context is returned.
The seventh parameter (pCallbacks) of the call to fpConvertStream() and
KVXMLConvertFile() must be set to the address of the KVXMLCallbacks
structure to be used for this call.
For sample code, see the sample program xmlcallback.c. It creates an XML
stream and demonstrates the use of the callback functions.
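As a rough illustration of the wiring described above, the following sketch shares one callback between two calling contexts and uses pCallingContext to tell them apart. It is not taken from the SDK: the pascal macro, the BOOL type, the reduced SketchCallbacks structure, JobContext, and SimulateConversion() are stand-ins assumed here so that the fragment compiles without kvxml.h and kvtypes.h.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles without the SDK headers (assumptions). */
#define pascal
typedef int BOOL;
#define TRUE  1
#define FALSE 0

typedef BOOL (pascal *KVXMLCB_CONTINUE)(void *pCallingContext, int nPercentDone);

/* Reduced stand-in for KVXMLCallbacks: only fpContinue is modelled. */
typedef struct tag_SketchCallbacks
{
    KVXMLCB_CONTINUE fpContinue;
} SketchCallbacks;

/* Hypothetical per-call context owned by the application. */
typedef struct
{
    const char *pszJobName;
    int         nLastPercent;
} JobContext;

/* One callback shared by every conversion call. pCallingContext identifies
   which call the callback belongs to. */
static BOOL pascal SharedContinue(void *pCallingContext, int nPercentDone)
{
    JobContext *pJob = (JobContext *)pCallingContext;

    if (pJob != NULL)
    {
        pJob->nLastPercent = nPercentDone;
    }
    return TRUE;    /* keep converting */
}

/* Hypothetical helper that plays the role of the library issuing a callback. */
static BOOL SimulateConversion(SketchCallbacks *pCallbacks,
                               void *pCallingContext, int nPercent)
{
    if (pCallbacks == NULL || pCallbacks->fpContinue == NULL)
    {
        return TRUE;    /* a NULL callback member is simply not called */
    }
    return (*pCallbacks->fpContinue)(pCallingContext, nPercent);
}
```

In an application, the address of a real KVXMLCallbacks instance would be passed as the pCallbacks parameter and the JobContext address as pCallingContext.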
Continue()
When fpConvertStream() or KVXMLConvertFile() is called, control is not
returned to the application until the entire document is processed. This callback
function provides a means of monitoring progress and terminating the conversion
before it is completed.
Syntax
BOOL (pascal *Continue) (
    void    *pCallingContext,
    int     nPercentComplete);
Arguments
pCallingContext
Pointer passed back to the caller-provided callback functions.
This pointer, which may be NULL, is specified as the second
parameter of the call to fpConvertStream() and
KVXMLConvertFile().
nPercentComplete
Approximate percentage of the current conversion that is
completed.
You can monitor the progress of the conversion by checking
the value of nPercentComplete, which indicates how many
blocks out of the total number of blocks have been processed.
Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion

There is a callback to this function for every entry that appears in the
generated table of contents.

The application is free to execute any required code in the callback function,
with the exception of fpShutDown().
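The following is a minimal sketch of a Continue() callback that records progress and halts the conversion when the application has requested cancellation. The ProgressState structure and the pascal and BOOL stand-ins are assumptions made for illustration; they are not part of the SDK.

```c
#include <stddef.h>

#define pascal          /* assumption: empty stand-in for the SDK macro */
typedef int BOOL;
#define TRUE  1
#define FALSE 0

/* Hypothetical application state reached through pCallingContext. */
typedef struct
{
    int bCancelRequested;   /* set elsewhere, for example by a UI thread */
    int nLastPercent;       /* most recent progress value seen */
} ProgressState;

/* Follows the Continue() prototype: returning FALSE halts processing. */
static BOOL pascal MyContinue(void *pCallingContext, int nPercentComplete)
{
    ProgressState *pState = (ProgressState *)pCallingContext;

    if (pState == NULL)
    {
        return TRUE;        /* no context supplied: never cancel */
    }
    pState->nLastPercent = nPercentComplete;
    return pState->bCancelRequested ? FALSE : TRUE;
}
```

The callback does no work beyond updating the state, which keeps the conversion from being slowed by frequent callbacks.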
GetAnchor()
This function gets the filename automatically generated by Export and used for
external graphics referenced with <a xmlns:xlink= xlink href=> tags,
heading-level table of contents entries, and external files (such as CSS files and
revision summary files).
Syntax
BOOL (pascal *GetAnchor) (
    void            *pCallingContext,
    KVXMLAnchorType eAnchorType,
    char            *pszAnchor,
    int             cbAnchorMax,
    BYTE            *pcHTML,
    UINT            cbHTML);
Arguments
pCallingContext
Pointer that gets passed back to the caller-provided callback
functions. This pointer, which may be NULL, is specified as the
second parameter of the call to fpConvertStream().
eAnchorType
The anchor type for the output stream. It must be one of the
enumerated types defined in KVXMLAnchorType. See
“KVXMLAnchorType” on page 278.
pszAnchor
Pointer to the location where the new anchor is stored.
cbAnchorMax
Maximum number of bytes to place in pszAnchor.
pcHTML
This is either NULL or a pointer to one of the following:
 markup defining the contents of a table of contents entry
 the external graphic filename
 the external filename
cbHTML
Number of valid bytes in pcHTML.
Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion

If this callback is NULL, default anchor names are generated. The generated
names are unique across the document.

This function is called once per block, block chunk, graphic anchor, or extra
file. Any required code may be executed here as long as a unique value for
pszAnchor is assigned. If this string is not unique, an existing file may be
overwritten, producing undesirable results. The callback function should
contain the functionality to verify whether files already exist.

If you want to specify graphic anchor names, but use default anchor names for
all other anchors, provide the graphic names when eAnchorType is
VectorPictureAnchor or RasterPictureAnchor. For all other anchor
types, call with the same parameters you were passed.

pszAnchor must be assigned. It may be derived from the cbAnchorMax,
pcHTML, and cbHTML values, which are also provided.

pcHTML may be null if the graphic is an internal part of the document.
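The sketch below shows one possible way for a GetAnchor() callback to assign a unique pszAnchor without writing more than cbAnchorMax bytes. The counter-based naming scheme, the AnchorState structure, and the type stand-ins are assumptions for illustration. Of the enumerators shown, only VectorPictureAnchor and RasterPictureAnchor are named in this guide; OtherAnchorSketch is a placeholder for all remaining types.

```c
#include <stdio.h>

#define pascal          /* assumption: empty stand-in for the SDK macro */
typedef int BOOL;
typedef unsigned char BYTE;
typedef unsigned int UINT;
#define TRUE  1
#define FALSE 0

/* Stand-in for KVXMLAnchorType (see the lead-in for which names are real). */
typedef enum
{
    VectorPictureAnchor,
    RasterPictureAnchor,
    OtherAnchorSketch
} KVXMLAnchorType;

/* Hypothetical context: a counter that makes each generated name unique. */
typedef struct
{
    unsigned int nNextId;
} AnchorState;

static BOOL pascal MyGetAnchor(void *pCallingContext,
                               KVXMLAnchorType eAnchorType,
                               char *pszAnchor, int cbAnchorMax,
                               BYTE *pcHTML, UINT cbHTML)
{
    AnchorState *pState = (AnchorState *)pCallingContext;
    const char *pszPrefix;
    int nWritten;

    (void)pcHTML;   /* not needed by this naming scheme */
    (void)cbHTML;

    if (pState == NULL || pszAnchor == NULL || cbAnchorMax <= 0)
    {
        return FALSE;
    }
    pszPrefix = (eAnchorType == VectorPictureAnchor ||
                 eAnchorType == RasterPictureAnchor) ? "pic" : "part";

    /* snprintf never writes more than cbAnchorMax bytes, terminator included. */
    nWritten = snprintf(pszAnchor, (size_t)cbAnchorMax, "%s_%04u",
                        pszPrefix, pState->nNextId);
    if (nWritten < 0 || nWritten >= cbAnchorMax)
    {
        return FALSE;   /* the name would not fit */
    }
    pState->nNextId++;
    return TRUE;
}
```

A counter guarantees uniqueness only within one run. A production callback would also check whether a file with the generated name already exists, as the discussion above recommends.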
GetAuxOutput()
This callback function allows the calling application to specify an auxiliary output
stream for a block or graphic.
Syntax
BOOL (pascal *GetAuxOutput) (
    void            *pCallingContext,
    KVXMLAnchorType eAnchorType,
    char            *pszAnchor,
    KVOutputStream  *pNewOutput);
Arguments
pCallingContext
Pointer passed back to the caller-provided callback functions.
This pointer, which may be NULL, is specified as the second
parameter of the call to fpConvertStream().
eAnchorType
Graphic or block anchor as defined by the enumerated types in
KVXMLAnchorType. See “KVXMLAnchorType” on page 278.
pszAnchor
Pointer to location where a new anchor is stored. pszAnchor
is based on the call to GetAnchor().
pNewOutput
Pointer to a KVOutputStream structure that may be used to
write data to the current block.
Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion

If GetAuxOutput() is NULL, the pszDefaultOutputDirectory member
of the instance of KVXMLOptions is used as the base storage location for
auxiliary output files. If pszDefaultOutputDirectory is also NULL,
auxiliary files are placed in the current working directory. See “KVXMLOptions”
on page 253.

For each pszAnchor provided, create (malloc) an appropriate I/O structure.
Assign pNewOutput->pOutputStreamPrivateData to point to that
structure. Each remaining member of the KVOutputStream should then be
initialized by assigning a function pointer to the additional application-defined
functions, cast to the appropriate function prototype for Create(), Write(),
Seek(), Tell(), and Close(). Memory allocated to the I/O structure must
be tracked and may be freed up within the call to Close(). See the
callback.c sample program.
UserCB()
This callback function is triggered by including the $USERCB token in a member of
KVXMLTemplate. For example, placing “$USERCB=my_callback “ in
pszFirstH1Start results in a callback at the point when pszFirstH1Start is
processed. The user callback function is identified by the text assigned to
$USERCB, which in this example is my_callback. This identifier is passed to the
argument pszUserCBid.
Syntax
BOOL (pascal *UserCB) (
    void            *pCallingContext,
    char            *pszUserCBid,
    KVOutputStream  *pNewOutput,
    void            *pReserved);
Arguments
pCallingContext
Pointer that gets passed back to the caller-provided callback
function. This pointer, which may be NULL, is specified as the
second parameter of the call to fpConvertStream().
pszUserCBid
Pointer to a string that identifies the source of the callback. The
identifier must be delimited by a trailing white space. For
example, "my_callback ".
pNewOutput
Pointer to a KVOutputStream structure that can be used to
write data to the current block.
pReserved
Reserved for future use.
Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE. Processing is halted.
CHAPTER 10
XML Export API Structures
This section provides information on the structures used by the XML Export API.
These structures are defined in kvxml.h, kvtypes.h, and adinfo.h. It contains
the following topics:
 ADDOCINFO
 KVInputStream
 KVMemoryStream
 KVOutputStream
 KVSTR
 KVStreamInfo
 KVStructHead
 KVStyle
 KVSumInfoElemEx
 KVSummaryInfoEx
 KVXConfigInfo
 KVXMLCallbacks
 KVXMLHeadingInfo
 KVXMLInterface
 KVXMLOptions
 KVXMLTemplate
 KVXMLTOCOptions
ADDOCINFO
This structure provides the format, file class, and version number of the source
document. It is defined in adinfo.h, and is initialized by calling the function
fpGetStreamInfo(). See “fpGetStreamInfo()” on page 196.
typedef struct
{
    ENdocClass      eClass;
    ENdocFmt        eFormat;
    long            lVersion;
    unsigned long   ulAttributes;
} ADDOCINFO, *ADDOCINFOPTR;
Member Descriptions
eClass
Source document’s file class (for example, spreadsheet, word
processor, or encapsulation format) as defined by the enumerated
type ENdocClass.
eFormat
Source document’s major format (for example Microsoft Word XML
format, or Corel Presentation) as defined by the enumerated type
ENdocFmt in adinfo.h. The ENdocFmt type provides a unique ID
for each major format.
lVersion
Version number of the file format. The number is multiplied by 1,000,
so, for example, 1.02 is represented by 1020.
ulAttributes
Other attributes of the document as defined by the enumerated type
ENdocAttributes.
Discussion
As format detection is enhanced in future releases, new format IDs may be added
to the ENdocFmt enumerated type. When using this type, your code should ensure
binary compatibility with future releases. For example, if you use an array to
access format information based on a format ID, your code should check the
format ID is less than Max_Fmt before accessing the data. This ensures new
format codes are detected when you add KeyView binary files from new releases
to your existing installation.
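The check described above can be sketched as follows. The enumeration and its three format names are invented so that the fragment compiles without adinfo.h; only the Max_Fmt sentinel is taken from the text. SplitVersion() decodes lVersion under the stated convention that the version number is multiplied by 1,000.

```c
#include <stddef.h>

/* Stand-in for ENdocFmt. The format names are hypothetical; Max_Fmt marks
   the number of formats known when the application was compiled. */
typedef enum
{
    SketchFmtA,
    SketchFmtB,
    SketchFmtC,
    Max_Fmt
} ENdocFmtSketch;

/* Application-side table indexed by format ID. */
static const char *const g_apszLabels[Max_Fmt] =
{
    "format A", "format B", "format C"
};

/* Returns a label, or a fallback when a newer library reports a format ID
   that this table does not cover. */
static const char *FormatLabel(int eFormat)
{
    if (eFormat < 0 || eFormat >= (int)Max_Fmt)
    {
        return "unknown format";
    }
    return g_apszLabels[eFormat];
}

/* Splits lVersion into whole and thousandths parts: 1020 gives 1 and 20. */
static void SplitVersion(long lVersion, long *plWhole, long *plThousandths)
{
    *plWhole = lVersion / 1000;
    *plThousandths = lVersion % 1000;
}
```

Because the comparison uses the Max_Fmt value compiled into the application, an ID added by a later release falls through to the fallback instead of indexing past the end of the table.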
KVInputStream
This structure defines an input stream for the XML conversion.
typedef struct tag_InputStream
{
    void *pInputStreamPrivateData;
    long lcbFilesize;
    BOOL (pascal *fpOpen) (struct tag_InputStream *);
    UINT (pascal *fpRead) (struct tag_InputStream *, BYTE *, UINT);
    BOOL (pascal *fpSeek) (struct tag_InputStream *, long, int);
    long (pascal *fpTell) (struct tag_InputStream *);
    BOOL (pascal *fpClose)(struct tag_InputStream *);
}
KVInputStream;
Member Descriptions
All member functions are equivalent to their counterparts in the ANSI standard
library, except fpOpen(), which returns FALSE on failure. On fpOpen(), if the
size of the stream is known, assign that value to lcbFilesize. Otherwise, set
lcbFilesize to 0.
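As an illustration of these member functions, the following sketch backs a stream of the same shape with the ANSI file functions. SketchInputStream mirrors the structure above but is declared locally so that the fragment compiles without the SDK headers. FileState and FileStreamInit() are hypothetical, and treating a zero result from fseek() and fclose() as TRUE is an assumption about the BOOL return convention.

```c
#include <stdio.h>

#define pascal          /* assumption: empty stand-in for the SDK macro */
typedef int BOOL;
typedef unsigned char BYTE;
typedef unsigned int UINT;
#define TRUE  1
#define FALSE 0

typedef struct tag_InputStream
{
    void *pInputStreamPrivateData;
    long lcbFilesize;
    BOOL (pascal *fpOpen) (struct tag_InputStream *);
    UINT (pascal *fpRead) (struct tag_InputStream *, BYTE *, UINT);
    BOOL (pascal *fpSeek) (struct tag_InputStream *, long, int);
    long (pascal *fpTell) (struct tag_InputStream *);
    BOOL (pascal *fpClose)(struct tag_InputStream *);
} SketchInputStream;

/* Hypothetical private data: the path to open and the open handle. */
typedef struct { const char *pszPath; FILE *fp; } FileState;

static BOOL pascal FileOpen(SketchInputStream *p)
{
    FileState *s = (FileState *)p->pInputStreamPrivateData;
    long n;

    s->fp = fopen(s->pszPath, "rb");
    if (s->fp == NULL) return FALSE;        /* fpOpen reports failure as FALSE */
    p->lcbFilesize = 0;                     /* 0 means "size not known" */
    if (fseek(s->fp, 0L, SEEK_END) == 0 && (n = ftell(s->fp)) > 0)
        p->lcbFilesize = n;
    rewind(s->fp);
    return TRUE;
}

static UINT pascal FileRead(SketchInputStream *p, BYTE *pc, UINT cb)
{
    FileState *s = (FileState *)p->pInputStreamPrivateData;
    return (UINT)fread(pc, 1, cb, s->fp);
}

static BOOL pascal FileSeek(SketchInputStream *p, long lOffset, int nWhence)
{
    FileState *s = (FileState *)p->pInputStreamPrivateData;
    return fseek(s->fp, lOffset, nWhence) == 0 ? TRUE : FALSE;
}

static long pascal FileTell(SketchInputStream *p)
{
    FileState *s = (FileState *)p->pInputStreamPrivateData;
    return ftell(s->fp);
}

static BOOL pascal FileClose(SketchInputStream *p)
{
    FileState *s = (FileState *)p->pInputStreamPrivateData;
    int rc = fclose(s->fp);
    s->fp = NULL;
    return rc == 0 ? TRUE : FALSE;
}

/* Hypothetical helper that wires the members together. */
static void FileStreamInit(SketchInputStream *p, FileState *s, const char *pszPath)
{
    s->pszPath = pszPath;
    s->fp = NULL;
    p->pInputStreamPrivateData = s;
    p->lcbFilesize = 0;
    p->fpOpen = FileOpen;
    p->fpRead = FileRead;
    p->fpSeek = FileSeek;
    p->fpTell = FileTell;
    p->fpClose = FileClose;
}
```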
KVMemoryStream
This structure defines an optional memory allocator to be used by XML Export. It
is initialized by calling fpInit(). See “fpInit()” on page 199.
typedef struct tag_MemoryStream
{
    void    *pMemoryStreamPrivateData;
    void *  (pascal *fpMalloc) (struct tag_MemoryStream *, size_t);
    void    (pascal *fpFree)   (struct tag_MemoryStream *, void *);
    void *  (pascal *fpRealloc)(struct tag_MemoryStream *, void *, size_t);
    void *  (pascal *fpCalloc) (struct tag_MemoryStream *, size_t, size_t);
}
KVMemoryStream;
Member Descriptions
All member functions are equivalent to their counterparts in the ANSI standard
library.
Discussion

fpRealloc() must handle a NULL pointer.

For systems that do not support fpRealloc(), refer to the xmlcallback
sample program, which demonstrates how to use the memory management
features.

If KVMemoryStream is not provided, then the default C run-time memory
allocation is used.
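The following sketch shows one way an application might supply these members: a thin wrapper around the C run-time allocator that counts live blocks and whose fpRealloc() accepts a NULL pointer. SketchMemoryStream mirrors the structure above but is declared locally. AllocStats and MemoryStreamInit() are hypothetical names used only for this illustration.

```c
#include <stdlib.h>

#define pascal          /* assumption: empty stand-in for the SDK macro */

typedef struct tag_MemoryStream
{
    void    *pMemoryStreamPrivateData;
    void *  (pascal *fpMalloc) (struct tag_MemoryStream *, size_t);
    void    (pascal *fpFree)   (struct tag_MemoryStream *, void *);
    void *  (pascal *fpRealloc)(struct tag_MemoryStream *, void *, size_t);
    void *  (pascal *fpCalloc) (struct tag_MemoryStream *, size_t, size_t);
} SketchMemoryStream;

/* Hypothetical private data: number of blocks currently allocated. */
typedef struct { long nLive; } AllocStats;

static void *pascal MyMalloc(SketchMemoryStream *p, size_t cb)
{
    void *pv = malloc(cb);
    if (pv != NULL) ((AllocStats *)p->pMemoryStreamPrivateData)->nLive++;
    return pv;
}

static void pascal MyFree(SketchMemoryStream *p, void *pv)
{
    if (pv != NULL)
    {
        free(pv);
        ((AllocStats *)p->pMemoryStreamPrivateData)->nLive--;
    }
}

static void *pascal MyRealloc(SketchMemoryStream *p, void *pv, size_t cb)
{
    if (pv == NULL)
    {
        return MyMalloc(p, cb);     /* NULL input behaves like fpMalloc */
    }
    return realloc(pv, cb);         /* live-block count is unchanged */
}

static void *pascal MyCalloc(SketchMemoryStream *p, size_t n, size_t cb)
{
    void *pv = calloc(n, cb);
    if (pv != NULL) ((AllocStats *)p->pMemoryStreamPrivateData)->nLive++;
    return pv;
}

/* Hypothetical helper that fills in every member. */
static void MemoryStreamInit(SketchMemoryStream *p, AllocStats *pStats)
{
    pStats->nLive = 0;
    p->pMemoryStreamPrivateData = pStats;
    p->fpMalloc = MyMalloc;
    p->fpFree = MyFree;
    p->fpRealloc = MyRealloc;
    p->fpCalloc = MyCalloc;
}
```

A non-zero nLive after the session has been shut down would suggest a leak on one side or the other.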
KVOutputStream
This structure defines an output stream for the XML conversion.
typedef struct tag_OutputStream
{
    void *pOutputStreamPrivateData;
    BOOL (pascal *fpCreate)(struct tag_OutputStream *, TCHAR *);
    UINT (pascal *fpWrite) (struct tag_OutputStream *, BYTE *, UINT);
    BOOL (pascal *fpSeek)  (struct tag_OutputStream *, long, int);
    long (pascal *fpTell)  (struct tag_OutputStream *);
    BOOL (pascal *fpClose) (struct tag_OutputStream *);
}
KVOutputStream;
Member Descriptions
All member functions are equivalent to their counterparts in the ANSI standard
library.
KVSTR
This structure is used to identify string types (string text and byte count) for the
first three members of KVStyle. See “KVStyle” on page 241.
typedef struct tag_KVSTR
{
    char    *pcString;
    int     cbString;
}
KVSTR;
Member Descriptions
pcString
Text string.
cbString
Length of pcString, excluding the terminating NULL(s). This allows
UNICODE or double bytes to be employed.
KVStreamInfo
This structure defines a document’s character set and format. The structure is
initialized by calling the function fpGetStreamInfo(). See “fpGetStreamInfo()” on
page 196.
typedef struct tag_KVStreamInfo
{
    KVCharSet   charset;
    ADDOCINFO   adInfo;
}
KVStreamInfo;
Member Descriptions
charset
Character set of the source document, if that information is ascertainable.
This member is an integer corresponding to the KVCharSet enumerated
type in kvtypes.h.
adInfo
File class, major format, and version of the source document, held in
an ADDOCINFO structure. The structure of ADDOCINFO is defined in
adinfo.h. See “ADDOCINFO” on page 234.
 adInfo.eClass represents the source document’s class as defined
by the enumerated type ENdocClass.
 adInfo.eFormat represents the source document’s format as
defined by the enumerated type ENdocFmt.
 adInfo.lVersion represents the version number of the file format.
The number is multiplied by 1,000, so, for example, 1.02 is
represented by 1020.
 adInfo.ulAttributes represents other attributes of the document
as defined by the enumerated type ENdocAttributes.
Discussion
As format detection is enhanced in future releases, new format IDs may be added
to the ENdocFmt enumerated type. When using this type, your code should ensure
binary compatibility with future releases. For example, if you use an array to
access format information based on a format ID, your code should check the
format ID is less than Max_Fmt before accessing the data. This ensures new
format codes are detected when you add KeyView binary files from new releases
to your existing installation.
KVStructHead
This structure contains the current KeyView version number and is the first
member of other structures. It enables Autonomy to modify the structures in future
releases, but to maintain backward compatibility. Before initializing a structure that
contains the KVStructHead structure, use the macro KVStructInit to initialize
KVStructHead. The structure and macro are defined in kvtypes.h.
typedef struct _KVStructHead
{
    WORD    version;
    WORD    size;
    DWORD   reserved;
    void    *internal;
} KVStructHeadRec, *KVStructHead;
Member Descriptions
version
The current KeyView version number. This is a symbolic constant
(KeyviewVersion) defined in kvxtract.h. This constant will
be updated for each KeyView release.
size
The size of the KVStructHeadRec.
reserved
Reserved for internal use.
internal
Reserved for internal use.
Example
KVStructInit(&openArg);
KVStyle
This structure defines the style mapping support for KVSTR-defined styles. The
first three members of KVStyle are KVSTR structures (see “KVSTR” on page 238).
Each KVSTR structure contains the text string and byte count for StyleName,
MarkUpStart, and MarkUpEnd. The structure is initialized by calling the function
fpSetStyleMapping().
See “fpSetStyleMapping()”. See “Map Styles” on page 104 for more information
on mapping styles.
XML Export supports both paragraph styles and character styles. It works on the
assumption that each style has a unique name. Only one paragraph style may be
active at one time; therefore, the opening of a new paragraph style automatically
closes the previous paragraph style. By contrast, several character styles may be
active at once. When XML Export receives an EndCharStyle token from the
format parser, the most recent character style is terminated.
typedef struct tag_KVStyles
{
    KVSTR   StyleName;
    KVSTR   MarkUpStart;
    KVSTR   MarkUpEnd;
    DWORD   dwFlags;
}
KVStyle;
Member Descriptions
StyleName
The name of the word processing style (for example, “Heading
1”) to which style mapping applies. A pointer to the KVSTR
structure. See “KVSTR” on page 238.
Style names are case sensitive.
MarkUpStart
The markup added to the beginning of a paragraph or character
style. A pointer to the KVSTR structure. See “KVSTR” on
page 238.
MarkUpEnd
The markup added to the end of a paragraph or character style.
A pointer to the KVSTR structure. See “KVSTR” on page 238.
dwFlags
Instructions on how to process the content associated with a
paragraph or character style. The flag can be one of the types
defined in kvtypes.h. They are described in Table 10 on
page 107.
The value associated with each flag is a hexadecimal number.
You can set an option by entering either the equivalent decimal
value or the flag’s text (for example, KVSTYLE_PRE).
The value of Flags in the template files is passed to this
member of KVStyle.
Discussion

This structure applies to word processing documents only.

By default, XML Export maps the heading style “Heading 1” to <h1></h1>,
and so on, for heading levels 1 through 6. If you use style mappings, the
default mapping is overridden. Therefore, you must supply markup for all
heading levels.

When the user-defined markup in KVStyle conflicts with other markup
generated by XML Export, the user-defined markup takes precedence.
KVSumInfoElemEx
This structure defines the individual metadata elements.
typedef struct tag_KVSumInfoElemEx
{
    int             isValid;
    KVSumInfoType   type;
    void            *data;
    char            *pcType;
}
KVSumInfoElemEx;
Member Descriptions
isValid
Specifies whether the data value is present in the document. The setting
1 specifies the value is valid and exists.
type
Data type of the metadata element. The types are defined in the
structure KVSumInfoType in kvtypes.h. See “KVSumInfoType” on
page 285.
data
The content of the metadata field.
If the type member is KV_Int4 or KV_Bool, then this member
contains the actual value. Otherwise, this member is a pointer to the
actual value.
KV_DateTime and KV_IEEE8 point to an 8-byte value.
KV_String and KV_Unicode point to the beginning of the string
containing the text.
pcType
Pointer to the name of the metadata field.
KVSummaryInfoEx
This structure provides a count of the number of metadata elements in a
document, and a pointer to the first element of the array of elements. The
structure is initialized by calling the function fpGetSummaryInfo(). See
“fpGetSummaryInfo()” on page 197.
typedef struct tag_KVSummaryInfoEx
{
    int             nElem;
    KVSumInfoElemEx *pElem;
}
KVSummaryInfoEx;
Member Descriptions
nElem
Number of metadata elements contained in the array. nElem may be zero,
which indicates that the document (for example, an ASCII text document)
did not contain metadata.
pElem
Points to the first element of the array of document metadata elements
defined by the structure KVSumInfoElemEx. See “KVSumInfoElemEx” on
page 243.
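As a sketch of how an application might walk this array, the following fragment counts the valid elements and formats integer and string values according to the rules given for the data member of KVSumInfoElemEx. The structures are local stand-ins that mirror the definitions above. The order of the enumerators and the helper names CountValid() and DescribeElem() are assumptions, and only the KV_Int4, KV_Bool, and KV_String cases are handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for KVSumInfoType. The names come from this guide; their order
   and values are assumptions. */
typedef enum
{
    KV_Int4, KV_Bool, KV_DateTime, KV_IEEE8, KV_String, KV_Unicode
} SketchSumInfoType;

typedef struct
{
    int                isValid;
    SketchSumInfoType  type;
    void              *data;
    char              *pcType;
} SketchSumInfoElem;

typedef struct
{
    int                nElem;
    SketchSumInfoElem *pElem;
} SketchSummaryInfo;

/* Counts the elements whose value is present in the document. */
static int CountValid(const SketchSummaryInfo *pInfo)
{
    int i;
    int n = 0;

    for (i = 0; i < pInfo->nElem; i++)
    {
        if (pInfo->pElem[i].isValid) n++;
    }
    return n;
}

/* Writes "name=value" to pszOut. Returns 0 for an element that is not valid. */
static int DescribeElem(const SketchSumInfoElem *pElem, char *pszOut, size_t cbOut)
{
    const char *pszName = pElem->pcType ? pElem->pcType : "(unnamed)";

    if (!pElem->isValid) return 0;
    switch (pElem->type)
    {
    case KV_Int4:
    case KV_Bool:
        /* data holds the value itself, not a pointer to it */
        snprintf(pszOut, cbOut, "%s=%ld", pszName, (long)(intptr_t)pElem->data);
        break;
    case KV_String:
        /* data points to the beginning of the string */
        snprintf(pszOut, cbOut, "%s=%s", pszName, (const char *)pElem->data);
        break;
    default:
        snprintf(pszOut, cbOut, "%s=(not shown)", pszName);
        break;
    }
    return 1;
}
```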
KVXConfigInfo
This structure defines the document type of a source XML file, and the element
extraction settings for that type. The settings can be applied based on the file
format ID, or the file’s root element. This structure is in kvtypes.h and is
initialized by calling the function KVXMLConfig(). See “Convert XML Files” on
page 120 and “KVXMLConfig()” on page 205.
typedef struct TAG_KVXConfigInfo
{
    ENdocFmt    eKVFormat;
    char*       pszRoot;
    char*       pszInMeta;
    char*       pszExMeta;
    char*       pszInContent;
    char*       pszExContent;
    char*       pszInAttribute;
}
KVXConfigInfo;
Member Descriptions
eKVFormat
The format ID as detected by the KeyView detection module. This
determines the file type to which these extraction settings apply.
The format ID is defined by the enumerated type ENdocFmt. See
“File Format Detection” on page 347 for more information on
format ID values.
If you are adding configuration settings for a custom XML
document type, this is not defined.
pszRoot
The file’s root element. When the format ID is not defined, the root
element is used to determine the file type to which these settings
apply.
To further qualify the element, specify its namespace. See
“Specify an Element’s Namespace and Attribute” on page 125.
pszInMeta
The elements extracted from the file as metadata. All other
elements are extracted as text. Multiple entries must be
separated by commas.
To further qualify the element, specify its namespace and/or
attributes. See “Specify an Element’s Namespace and Attribute”
on page 125.
pszExMeta
The child elements in the included metadata elements that are
not extracted from the file as metadata. For example, the default
extraction settings for the Visio XML format extract the
DocumentProperties element as metadata. This element
includes child elements such as Title, Subject, Author,
Description, and so on. However, the child element
PreviewPicture is defined in pszExMeta because it is binary
data and should not be extracted.
You cannot exclude any metadata elements from the output for
StarOffice files. All metadata is extracted regardless of this
setting.
To further qualify the element, specify its namespace and/or
attributes. See “Specify an Element’s Namespace and Attribute”
on page 125.
pszInContent
The elements extracted from the file as content text. An asterisk
(*) extracts all elements including child elements.
To further qualify the element, specify its namespace and/or
attributes. See “Specify an Element’s Namespace and Attribute”
on page 125.
pszExContent
The child elements in the included content elements that are not
extracted from the file as content text.
To further qualify the element, specify its namespace and/or
attributes. See “Specify an Element’s Namespace and Attribute”
on page 125.
pszInAttribute
The attribute values extracted from the file. If attributes are not
defined, attribute values are not extracted. The namespace (if
used), element name and attribute name must be defined in the
following format:
namespace:[email protected]
For example:
Autonomy:[email protected]
KVXMLCallbacks
This structure provides all callbacks that can result from a call to
fpConvertStream() or KVXMLConvertFile(). See “fpConvertStream()” on
page 186 and “KVXMLConvertFile()” on page 214. Any and all of the function
pointers can be NULL.
typedef BOOL (pascal *KVXMLCB_CONTINUE)(
    void            *pCallingContext,
    int             nPercentDone);
typedef BOOL (pascal *KVXMLCB_GETANCHOR)(
    void            *pCallingContext,
    KVXMLAnchorType eAnchorType,
    char            *pszAnchor,
    int             cbAnchorMax,
    BYTE            *pcHTML,
    UINT            cbHTML);
typedef BOOL (pascal *KVXMLCB_GETAUXOUTPUT)(
    void            *pCallingContext,
    KVXMLAnchorType eAnchorType,
    char            *pszAnchor,
    KVOutputStream  *pNewOutput);
typedef BOOL (pascal *KVXMLCB_USERCB) (
    void            *pCallingContext,
    char            *psUserCBid,
    KVOutputStream  *pOutput,
    void            *pReserved);
typedef struct tag_KVXMLCallbacks
{
    KVXMLCB_CONTINUE        fpContinue;
    KVXMLCB_GETANCHOR       fpGetAnchor;
    KVXMLCB_GETAUXOUTPUT    fpGetAuxOutput;
    KVXMLCB_USERCB          fpUserCB;
}
KVXMLCallbacks;
Member Descriptions

The members of this structure are function pointers to the functions described
in “XML Export API Callback Functions” on page 225.

If fpGetAuxOutput() is NULL, the pszDefaultOutputDirectory member of
the instance of KVXMLOptions is used as the base storage location for
auxiliary output files. If pszDefaultOutputDirectory is also NULL, auxiliary
files are placed in the current working directory. See “KVXMLOptions” on
page 253.
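Because any of the function pointers can be NULL, an application can supply only the callbacks it needs. The following is a minimal sketch that installs a progress callback and leaves the rest NULL. The type definitions are stand-ins so that the fragment compiles without the SDK headers, ProgressState, OnContinue, and InitCallbacks are hypothetical names, and the meaning of the return value (TRUE to carry on, FALSE to request a stop) is an assumption to be checked against the callback reference.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without the SDK headers. */
#ifndef pascal
#define pascal
#endif
typedef int BOOL;
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif
typedef unsigned char BYTE;
typedef unsigned int UINT;
typedef int KVXMLAnchorType;                   /* an SDK enum in reality */
typedef struct KVOutputStream KVOutputStream;  /* opaque in this sketch  */

typedef BOOL (pascal *KVXMLCB_CONTINUE)(void *, int);
typedef BOOL (pascal *KVXMLCB_GETANCHOR)(void *, KVXMLAnchorType, char *,
                                         int, BYTE *, UINT);
typedef BOOL (pascal *KVXMLCB_GETAUXOUTPUT)(void *, KVXMLAnchorType, char *,
                                            KVOutputStream *);
typedef BOOL (pascal *KVXMLCB_USERCB)(void *, char *, KVOutputStream *, void *);

typedef struct tag_KVXMLCallbacks
{
    KVXMLCB_CONTINUE     fpContinue;
    KVXMLCB_GETANCHOR    fpGetAnchor;
    KVXMLCB_GETAUXOUTPUT fpGetAuxOutput;
    KVXMLCB_USERCB       fpUserCB;
} KVXMLCallbacks;

/* Hypothetical application state passed as the calling context. */
typedef struct
{
    int lastPercent;    /* most recent progress value reported       */
    int stopAtPercent;  /* request a stop once this value is reached */
} ProgressState;

/* Progress callback: records the value and (by assumption) returns
 * FALSE to ask for the conversion to stop. */
static BOOL pascal OnContinue(void *pCallingContext, int nPercentDone)
{
    ProgressState *pState = (ProgressState *)pCallingContext;

    assert(pState != NULL);
    pState->lastPercent = nPercentDone;
    return (nPercentDone < pState->stopAtPercent) ? TRUE : FALSE;
}

/* Installs only the progress callback; the others stay NULL. */
static void InitCallbacks(KVXMLCallbacks *pCallbacks)
{
    assert(pCallbacks != NULL);
    pCallbacks->fpContinue     = OnContinue;
    pCallbacks->fpGetAnchor    = NULL;
    pCallbacks->fpGetAuxOutput = NULL;  /* output location falls back as described above */
    pCallbacks->fpUserCB       = NULL;
}
```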
KVXMLHeadingInfo
This structure defines how XML Export creates heading information based on the
source document’s content and attributes. Source text is converted to a heading
and included in the table of contents if
• it meets all the criteria defined by this structure, and
• the headingCreateType member of KVXMLTOCOptions is set to allow
automatic heading generation.
XML Export evaluates the text against each member in the order in which the
members appear below.
See “KVXMLTOCOptions” on page 267 for more information on automatic
generation of headings.
typedef struct tag_KVXMLHeadingInfo
{
    int   minParaLen;
    int   maxParaLen;
    int   fontSizeMin;
    int   fontSizeMax;
    BOOL  bMustBeBold;
    BOOL  bMustBeItalic;
    BOOL  bMustBeUnderlined;
    BOOL  bNonZeroIndent;
    BOOL  bNoTabs;
    BOOL  bNoMultiSpaces;
    int   nSpaceBefore;
    int   nSpaceAfter;
}
KVXMLHeadingInfo;
Member Descriptions
minParaLen
The minimum number of characters that a paragraph in the
source document can contain for the text to meet the criteria for
heading conversion.
Applies to word processing documents only.
The default is 3 for heading levels 1 to 3.
maxParaLen
The maximum number of characters that a paragraph in the
source document can contain for the text to meet the criteria for
heading conversion.
Applies to word processing documents only.
The default is 80 for heading levels 1 to 3.
fontSizeMin
The minimum font size of text in the source document for the
text to meet the criteria for heading conversion.
The default is 14 for heading level 1, and 12 for heading levels
2 and 3.
fontSizeMax
The maximum font size of text in the source document for the
text to meet the criteria for heading conversion.
The default is 20 for heading level 1, and 14 for heading levels
2 and 3.
bMustBeBold
If this is set to TRUE, the text in the source document must be
bold to meet the criteria for heading conversion.
The default is TRUE for heading levels 1 and 2, and FALSE for
heading level 3.
bMustBeItalic
If this is set to TRUE, the text in the source document must be
italic to meet the criteria for heading conversion.
The default is FALSE.
bMustBeUnderlined
If this is set to TRUE, the text in the source document must be
underlined to meet the criteria for heading conversion.
The default is FALSE.
bNonZeroIndent
If this is set to TRUE, the text in the source document must be
indented to meet the criteria for heading conversion. If set to
FALSE, the text must be aligned left.
The default is FALSE.
bNoTabs
If this is set to TRUE, the text in the source document must not
contain tabs to meet the criteria for heading conversion.
The default is FALSE.
bNoMultiSpaces
If this is set to TRUE, the text in the source document must not
contain two or more contiguous white spaces to meet the
criteria for heading conversion.
The default is FALSE.
nSpaceBefore
The amount of space in TWIPS (20th of a point) that must come
before a paragraph in the source document for the text to meet
the criteria for heading conversion. If –1 is used, the amount of
space before the paragraph is not considered in the heading
generation.
The default is 0.
nSpaceAfter
The amount of space in TWIPS (20th of a point) that must follow
a paragraph in the source document for the text to meet the
criteria for heading conversion. If –1 is used, the amount of
space after the paragraph is not considered in the heading
generation.
The default is 0.
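The defaults listed above differ by heading level, so it can help to fill the structure from one place. The following is a minimal sketch that sets the documented defaults for heading levels 1 to 3 and converts points to TWIPS for the spacing members. The structure is repeated as a stand-in so that the fragment compiles without the SDK headers; SetHeadingDefaults and PointsToTwips are hypothetical helpers, not SDK functions.

```c
#include <assert.h>
#include <stddef.h>

typedef int BOOL;
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* Stand-in copy of the structure shown above. */
typedef struct tag_KVXMLHeadingInfo
{
    int  minParaLen;
    int  maxParaLen;
    int  fontSizeMin;
    int  fontSizeMax;
    BOOL bMustBeBold;
    BOOL bMustBeItalic;
    BOOL bMustBeUnderlined;
    BOOL bNonZeroIndent;
    BOOL bNoTabs;
    BOOL bNoMultiSpaces;
    int  nSpaceBefore;
    int  nSpaceAfter;
} KVXMLHeadingInfo;

/* One point is 20 TWIPS. */
static int PointsToTwips(int points)
{
    return points * 20;
}

/* Fills the defaults documented for heading levels 1 to 3. */
static void SetHeadingDefaults(KVXMLHeadingInfo *pInfo, int level)
{
    assert(pInfo != NULL && level >= 1 && level <= 3);

    pInfo->minParaLen        = 3;
    pInfo->maxParaLen        = 80;
    pInfo->fontSizeMin       = (level == 1) ? 14 : 12;
    pInfo->fontSizeMax       = (level == 1) ? 20 : 14;
    pInfo->bMustBeBold       = (level <= 2) ? TRUE : FALSE;
    pInfo->bMustBeItalic     = FALSE;
    pInfo->bMustBeUnderlined = FALSE;
    pInfo->bNonZeroIndent    = FALSE;
    pInfo->bNoTabs           = FALSE;
    pInfo->bNoMultiSpaces    = FALSE;
    pInfo->nSpaceBefore      = 0;
    pInfo->nSpaceAfter       = 0;
}
```

For example, to require 12 points of space before a level 1 heading, an application could call SetHeadingDefaults(&h1, 1) and then set h1.nSpaceBefore to PointsToTwips(12).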
KVXMLInterface
The members of this structure are pointers to the API functions described in “XML
Export API Functions” on page 183.
typedef void* (pascal *KVXML_INIT) (
    KVMemoryStream   *pMemAllocator,
    char             *pszKeyViewDir,
    char             *pszDataFile,
    KVErrorCode      *pError,
    DWORD            dWord);
typedef void (pascal *KVXML_SHUTDOWN)(void*);
typedef BOOL (pascal *KVXML_CONVERT_STREAM) (
    void             *pContext,
    void             *pCallingContext,
    KVInputStream    *pInput,
    KVOutputStream   *pOutput,
    KVXMLTemplate    *pTemplates,
    KVXMLOptions     *pOptions,
    KVXMLTOCOptions  *pTOCCreateOptions,
    KVXMLCallbacks   *pCallbacks,
    BOOL             bIndex,
    KVErrorCode      *pError);
typedef char** (pascal *KVXML_GET_FILE_LIST)(
    void             *pContext,
    int              *pnSize );
typedef BOOL (pascal *KVXML_GET_STREAM_INFO)(
    void             *pContext,
    KVInputStream    *pInput,
    KVStreamInfo     *pStreamInfo );
typedef BOOL (pascal *KVXML_GET_ANCHOR) (
    void             *pCallingContext,
    KVXMLAnchorType  eAnchorType,
    char             *pszAnchor,
    int              cbAnchorMax,
    BYTE             *pcHTML,
    UINT             cbHTML);
typedef BOOL (pascal *KVXML_INPUTSTREAM_CREATE) (
    void             *pContext,
    char             *pszFileName,
    KVInputStream    *pInput);
typedef BOOL (pascal *KVXML_INPUTSTREAM_FREE) (
    void             *pContext,
    KVInputStream    *pInput);
typedef BOOL (pascal *KVXML_OUTPUTSTREAM_CREATE) (
    void             *pContext,
    char             *pszFileName,
    KVOutputStream   *pOutput );
typedef BOOL (pascal *KVXML_OUTPUTSTREAM_FREE)(
    void             *pContext,
    KVOutputStream   *pOutput );
typedef KVLanguageID (pascal *KVXML_LANGUAGE_ID)(void *pContext);
typedef BOOL (pascal *KVXML_GET_SUMMARY_INFO)(
    void             *pContext,
    KVInputStream    *pInput,
    KVSummaryInfoEx  *pSummary,
    BOOL             bFree );
typedef BOOL (pascal *KVXML_SET_STYLE_MAPPING) (
    void             *pContext,
    KVStyle          *pStyles,
    int              iStyles,
    BOOL             bCopy);
typedef BOOL (pascal *KVXML_VALIDATE_TEMPLATE)(
    void             *pContext,
    KVOutputStream   *pOutput,
    KVXMLTemplate    *pTemplate,
    KVXMLOptions     *pOptions,
    KVXMLTOCOptions  *pTOCOptions,
    KVXMLCallbacks   *pCallBacks,
    KVMemoryStream   *pMemStream);
typedef struct tag_KVXMLInterface
{
    KVXML_INIT                 fpInit;
    KVXML_SHUTDOWN             fpShutDown;
    KVXML_CONVERT_STREAM       fpConvertStream;
    KVXML_GET_FILE_LIST        fpGetConvertFileList;
    KVXML_GET_STREAM_INFO      fpGetStreamInfo;
    KVXML_GET_ANCHOR           fpGetAnchor;
    KVXML_INPUTSTREAM_CREATE   fpFileToInputStreamCreate;
    KVXML_INPUTSTREAM_FREE     fpFileToInputStreamFree;
    KVXML_OUTPUTSTREAM_CREATE  fpFileToOutputStreamCreate;
    KVXML_OUTPUTSTREAM_FREE    fpFileToOutputStreamFree;
    KVXML_GET_SUMMARY_INFO     fpGetSummaryInfo;
    KVXML_SET_STYLE_MAPPING    fpSetStyleMapping;
    KVXML_VALIDATE_TEMPLATE    fpValidateTemplate;
}
KVXMLInterface;
Member Descriptions

The members of this structure are function pointers to the functions described
in “XML Export API Functions” on page 183.

KVXML_VALIDATE_TEMPLATE is currently not implemented.
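Since the API is reached through a table of function pointers, and at least one entry is not implemented, it is prudent to check a pointer before calling through it. The following is a minimal sketch of that pattern only. DemoInterface is a reduced stand-in for KVXMLInterface, MockInit and MockShutDown are mock functions so that the fragment runs without the SDK, RunSession is a hypothetical helper, and the argument values passed to fpInit are placeholders rather than recommended settings.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without the SDK headers. */
#ifndef pascal
#define pascal
#endif
typedef unsigned long DWORD;
typedef int KVErrorCode;                       /* an SDK enum in reality */
typedef struct KVMemoryStream KVMemoryStream;  /* opaque in this sketch  */

typedef void *(pascal *KVXML_INIT)(KVMemoryStream *pMemAllocator,
                                   char *pszKeyViewDir, char *pszDataFile,
                                   KVErrorCode *pError, DWORD dWord);
typedef void (pascal *KVXML_SHUTDOWN)(void *);

/* Reduced stand-in for KVXMLInterface: only the two members used here. */
typedef struct
{
    KVXML_INIT     fpInit;
    KVXML_SHUTDOWN fpShutDown;
} DemoInterface;

/* Mock implementations so that the pattern can be exercised. */
static int g_sessionOpen = 0;

static void *pascal MockInit(KVMemoryStream *pMemAllocator, char *pszKeyViewDir,
                             char *pszDataFile, KVErrorCode *pError, DWORD dWord)
{
    (void)pMemAllocator; (void)pszKeyViewDir; (void)pszDataFile; (void)dWord;
    if (pError != NULL)
        *pError = 0;
    g_sessionOpen = 1;
    return &g_sessionOpen;  /* any non-NULL value serves as the context */
}

static void pascal MockShutDown(void *pContext)
{
    (void)pContext;
    g_sessionOpen = 0;
}

/* Runs one init/shutdown cycle through the table.
 * Returns 0 on success, -1 if an entry is missing or init fails. */
static int RunSession(const DemoInterface *pInterface, char *pszKeyViewDir)
{
    KVErrorCode error = 0;
    void *pContext;

    assert(pInterface != NULL);
    if (pInterface->fpInit == NULL || pInterface->fpShutDown == NULL)
        return -1;  /* never call through a NULL entry */

    pContext = pInterface->fpInit(NULL, pszKeyViewDir, NULL, &error, 0);
    if (pContext == NULL)
        return -1;

    /* conversion calls such as fpConvertStream would go here */

    pInterface->fpShutDown(pContext);
    return 0;
}
```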
KVXMLOptions
This structure defines the options that control the XML markup written in response
to the general style and attributes (font, color, and so on) of the document. The
structure is initialized by calling the function fpConvertStream() or
KVXMLConvertFile(). See “fpConvertStream()” on page 186 or
“KVXMLConvertFile()” on page 214.
typedef struct tag_KVXMLOptions
{
    BOOL                    bUseVerityDTD;
    char                    *pszVerityDTDPath;
    KVXMLStyleSheetType     eStyleSheetType;
    BOOL                    bUseExistingStyleSheet;
    char                    *pszStyleSheet;
    BOOL                    bIndexOnly;
    KVCharSet               eOutputCharSet;
    BOOL                    bForceOutputCharSet;
    KVCharSet               eSrcCharSet;
    BOOL                    bForceSrcCharSet;
    KVLanguageID            eOutputLanguageID;
    BOOL                    bUseDocumentColors;
    BOOL                    bUseDocumentFontInfo;
    BOOL                    bNbspEmptyCells;
    ENSATableBorder         eSATableBorder;
    int                     nTableBorderWidth;
    char                    *pszBaseURL;
    char                    *pszMainURL;
    char                    *pszDefaultOutputDirectory;
    char                    *pszPicPath;
    char                    *pszPicURL;
    char                    *pszJavaURL;
    BOOL                    bRemoveFileNameSpaces;
    BOOL                    bRasterizeFiles;
    KVXMLGraphicType        eOutputRasterGraphicType;
    KVXMLGraphicType        eOutputVectorGraphicType;
    int                     cxVectorToRasterXRes;
    int                     cyVectorToRasterYRes;
    int                     nCompressionQuality;
    BOOL                    bGenerateURLs;
    long                    lcbMaxMemUsage;
    BYTE                    cReplaceChar;
    BYTE                    cRedact;
    KVXMLEmptyParaType      eEmptyParaType;
    KVXMLHardPageBreakType  eHardPageBreakType;
    BOOL                    bSupportColumnHeadings;
    BOOL                    bSupportRowHeadings;
    BOOL                    bSupportCellSpan;
    BOOL                    bSupportRowSpan;
    BOOL                    bSupportColumnWidth;
    BOOL                    bRemoveEmptyColumns;
    BOOL                    bRemoveEmptyRows;
    BOOL                    bEnableEmptyRows;
    int                     nRowsBeforeSplit;
}
KVXMLOptions;
Member Descriptions
bUseVerityDTD
Set to TRUE to generate XML based on the Verity DTD. For more
information, see “Use the Verity Document Type Definition (DTD)” on
page 63. This generates a valid XML document suitable as a general
interchange format. If FALSE, the XML is based on the source
document’s paragraph structure.
The default is TRUE.
pszVerityDTDPath
If you move the Verity DTD from the default tempout directory to
another output directory, set the string value of pszVerityDTDPath
to the new location. This path is added to the document type
declaration in the XML file.
The default is no path. That is, the DTD is assumed to be in the same
directory as the generated XML files.
eStyleSheetType
One of the enumerated options for processing style sheet information.
The options are defined in KVXMLStyleSheetType in kvxml.h. See
“KVXMLStyleSheetType” on page 277.
STYLESHEET_DISABLED—Disables style sheet formatting.
XML_CSS—Enables Cascading Style Sheet (CSS) formatting, and
outputs the generated formatting data in an external CSS file
referenced in the XML output as a tag.
XML_XSL—Enables Extensible Stylesheet Language (XSL)
formatting, and uses an external XSL file referenced in a
<?xml-stylesheet...?> processing instruction.
The default is STYLESHEET_DISABLED.
bUseExistingStyleSheet
Set to TRUE to apply an existing XSL style sheet or a CSS to an XML
document. The style sheet filename is inserted into the type
declaration at the beginning of the XML file. The location of the external
style sheet file is set by pszStyleSheet. If pszStyleSheet is not
specified and the style sheet type is XSL, then a default XSL style
sheet, appropriate for the source document type, is used. The default
XSL style sheets are:
• wp.xsl (for word processing documents)
• ss.xsl (for spreadsheets)
• pg.xsl (for presentations)
If pszStyleSheet is not specified and the style sheet type is CSS,
then a CSS file is created.
Existing style sheets are not validated.
The default is FALSE.
pszStyleSheet
The path and filename of an external style sheet.
The default is no path.
bIndexOnly
Set this to TRUE to generate output with minimal markup (ID and style
paragraph attributes) and without images. Since the generated output
is minimized to textual content, it is suitable for an indexing engine. If
bIndexOnly is set to FALSE, embedded images in a document are
regenerated as separate files and stored in the output directory.
The template file named xml_index.ini and the xmlindex sample
program demonstrate the effect of setting bIndexOnly.
To generate output with verbose markup and without images, set the
nType argument of the function KVXMLConfig() to
KVCFG_SUPPRESSIMAGES. See “KVXMLConfig()” on page 205.
Applies to word processing documents and spreadsheets only.
The default is FALSE.
eOutputCharSet
The character set to use for textual output. To ensure the character set
defined here is used, you must set bForceOutputCharSet to
TRUE. The available character sets are enumerated in KVCharSet in
kvtypes.h. See “Convert Character Sets” on page 99.
The section “Supported Formats” on page 294 lists the file formats for
which character set information can be determined.
The default is KVCS_UNKNOWN.
bForceOutputCharSet
Set to TRUE to use the output character set specified in
eOutputCharSet, regardless of the internal document information or
the source character set specified by eSrcCharSet. See “Convert
Character Sets” on page 99.
Forcing a character set to KVCS_UNKNOWN is always ignored.
The default is FALSE.
eSrcCharSet
Specifies the character set of the document. To ensure the character
set defined here is used, you must set bForceSrcCharSet to TRUE.
The available character sets are enumerated in KVCharSet in
kvtypes.h. See “Convert Character Sets” on page 99. The section
“Supported Formats” on page 294 lists the file formats for which
character set information can be determined.
The default is KVCS_UNKNOWN.
bForceSrcCharSet
Set to TRUE to use the source character set specified in
eSrcCharSet, regardless of the internal document information. See
“Convert Character Sets” on page 99.
Forcing a character set to KVCS_UNKNOWN is always ignored.
The default is FALSE.
eOutputLanguageID
The language for the textual output of language-specific data such as
time and date. eOutputLanguageID must be in the system locale. If
eOutputLanguageID is invalid or not supplied, the system default is
used. Language IDs are defined in KVLanguageID in kvtypes.h.
The default is Language_UNKNOWN.
bUseDocumentColors
Set to TRUE to retain the color attributes information contained in the
source document. If set to FALSE, no color attributes appear in the
<font> tags of the output.
The default is FALSE.
bUseDocumentFontInfo
Set to TRUE to retain the font information contained in the source
document. If set to FALSE, no font information appears in the <font>
tags in the output.
The default is FALSE.
bNbspEmptyCells
Set to TRUE to include a non-breaking space (<td>&nbsp;</td>) in
the markup for empty table cells in the source document. If this is set
to FALSE, <td></td> is generated for empty table cells.
Applies to word processing documents and spreadsheets only.
The default is TRUE.
eSATableBorder
Specifies whether table borders are based on the setting in the source
document, are always on, or are always off. The options are
enumerated in ENSATableBorder in kvtypes.h. See
“ENSATableBorder” on page 271.
Applies to word processing documents only.
The default is SA_BaseOnDocument.
nTableBorderWidth
Sets the width of the table border in pixels.
Applies to word processing documents only.
The default is 1.
pszBaseURL
The base URL that replaces the $BASE token in the XML output.
The default is NULL.
pszMainURL
The main URL that replaces the $MAIN token in the XML output.
The default is NULL.
pszDefaultOutputDirectory
The default output directory for auxiliary files created during the
conversion.
The default is NULL, and the files are placed in the directory in which
your application is running.
pszPicPath
The output directory for graphic files created during the conversion. If
specified, this member can also be used by the callback functions
KVXMLGetAnchor and KVXMLGetAuxOutput.
Applies to word processing documents only.
The default is NULL, and the files are placed in the directory in which
your application is running.
pszPicURL
The URL of the graphic files created from embedded graphics in the
source document. To specify a complete image source, this element
must be combined with pszAnchor of the fpGetAnchor callback
function. See “GetAnchor()” on page 228.
For example, setting pszPicURL to ../cgi-bin/ and setting
pszAnchor to pic.jpg results in the following markup:
<a xmlns:xlink= xlink href="../cgi-bin/pic.jpg">
Applies to word processing documents only.
The default is NULL.
pszJavaURL
The URL where the Java rasterizer (kvvector.jar) is located.
The Java rasterizer is not currently enabled.
The default is NULL.
bRemoveFileNameSpaces
Set to TRUE to remove spaces from generated output filenames.
The default is FALSE.
bRasterizeFiles
Set to TRUE to rasterize slides from presentations into single images.
Set to FALSE to only extract text from presentation files. When this
member is set to FALSE graphics do not appear in the output.
Since XML Export only extracts textual components from
presentations, this member must be set to FALSE.
The default is FALSE.
eOutputRasterGraphicType
The output format of rasterized embedded graphics. There are six
options enumerated in KVXMLGraphicType in kvxml.h. See
“KVXMLGraphicType” on page 279.
The default is KVGFX_JPEG.
eOutputVectorGraphicType
The output format of vector graphics. The options are enumerated in
KVXMLGraphicType in kvxml.h. See “KVXMLGraphicType” on page 279.
For more information on converting vector graphics on UNIX or Linux,
see “Display Vector Graphics on UNIX and Linux” on page 109.
The default is KVGFX_JPEG.
cxVectorToRasterXRes
Controls the X resolution (width in pixels) at which presentations and
graphics are converted. This is set in conjunction with
cyVectorToRasterYRes. To set this member, see “Setting the
Resolution of Presentations and Graphics” on page 261.
The default is 0, which means the original resolution is retained.
cyVectorToRasterYRes
Controls the Y resolution (height in pixels) at which presentations and
graphics are converted. This is set in conjunction with
cxVectorToRasterXRes. To set this member, see “Setting the
Resolution of Presentations and Graphics” on page 261.
The default value is 0, which means the original resolution is retained.
nCompressionQuality
Controls the output quality of graphics that support compression
quality (for example, JPEG). A value of 0 means default quality (85
compression); 1 is the lowest quality (highest compression and
therefore the smallest file size); 100 is the highest quality (no
compression and therefore the largest file size).
Applies to word processing documents only.
The default is 0.
bGenerateURLs
Set to TRUE to add anchor tags (<a xmlns:xlink= xlink
href=> </a>) to text starting with “www”, “http:” or “file:”.
Applies to word processing documents only.
The default is FALSE.
lcbMaxMemUsage
The maximum memory allocated dynamically for token buffers during
file processing. If this maximum is reached, Export performs a
swap-to-disk operation internally, and then reuses the memory blocks.
Export maintains an internal minimum memory size.
Applies to word processing or text documents only.
The default is LONG_MAX. The value is specified in bytes.
cReplaceChar
The character used when a character in the source document’s
character set cannot be mapped to the output character set.
The default replacement character is a question mark (?).
cRedact
The character that replaces tagged text that has been designated,
through style mapping, to be omitted from the output. This functionality
is useful when you need to hide confidential or sensitive information.
The specified character is used for all text that has been mapped to a
style processed with the KVSTYLE_REDACT flag (defined in
kvtypes.h). See “Map Styles” on page 104.
Applies to word processing documents only.
The default replacement character is “X”.
eEmptyParaType
Determines if paragraphs without content generate markup or ID
attributes in the output file. There are three options enumerated in
KVXMLEmptyParaType in kvxml.h. See “KVXMLEmptyParaType”
on page 281.
Applies to word processing documents only.
The default is KVEPT_SUPPRESS.
eHardPageBreakType
Determines if hard page breaks generate markup or ID attributes in the
output file. There are four options enumerated in
KVXMLHardPageBreakType in kvxml.h. See
“KVXMLHardPageBreakType” on page 282.
Applies to word processing documents only.
The default is KVHPBT_SUPPRESS.
bSupportColumnHeadings
Set to TRUE to include column headings from the source spreadsheet
in the output.
Applies to spreadsheets only.
The default is FALSE.
bSupportRowHeadings
Set to TRUE to include row headings from the source spreadsheet in
the output.
Applies to spreadsheets only.
The default is FALSE.
bSupportCellSpan
Set to TRUE to include colspan="n" markup in the output.
Applies to spreadsheets only.
The default value is FALSE.
bSupportRowSpan
Set to TRUE to include row span data from the source spreadsheet in
the output.
Applies to spreadsheets only.
The default value is FALSE. Currently not supported.
bSupportColumnWidth
Set to TRUE to include column width data from the source spreadsheet
in the output.
Applies to spreadsheets only.
The default value is FALSE.
bRemoveEmptyColumns
Set to TRUE to remove spreadsheet columns that do not contain data
and to disable cell merging.
Applies to spreadsheets only.
The default is FALSE.
bRemoveEmptyRows
Set this to TRUE to remove spreadsheet rows that do not contain data
or color, and to disable cell merging.
Applies to spreadsheets only.
The default is FALSE.
bEnableEmptyRows
Set to TRUE to display empty rows in a spreadsheet format. If set to
FALSE, empty rows are not displayed. This only applies to 20 or more
consecutive empty rows.
Applies to spreadsheets only.
The default is FALSE.
nRowsBeforeSplit
The approximate number of spreadsheet rows to be processed before
splitting a table. This helps to prevent large spreadsheet tables from
occurring in a single document, which can cause speed and
processing problems for the browser.
Applies to spreadsheets only.
The default is 0.
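The character set members described above work in pairs: a character set value takes effect only when its matching force flag is TRUE, and forcing KVCS_UNKNOWN is ignored. The following is a minimal sketch of that rule for the output pair. DemoCharSetOptions is a reduced stand-in for KVXMLOptions, the helper names are hypothetical, and KVCS_UTF8 together with both numeric enumeration values are assumptions made for the sketch (the real values come from kvtypes.h).

```c
#include <assert.h>
#include <stddef.h>

typedef int BOOL;
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* Stand-in for KVCharSet. KVCS_UNKNOWN is named in this guide;
 * KVCS_UTF8 and the numeric values are assumptions. */
typedef enum { KVCS_UNKNOWN = 0, KVCS_UTF8 = 1 } KVCharSet;

/* Reduced stand-in holding only the four character set members. */
typedef struct
{
    KVCharSet eOutputCharSet;
    BOOL      bForceOutputCharSet;
    KVCharSet eSrcCharSet;
    BOOL      bForceSrcCharSet;
} DemoCharSetOptions;

/* Sets the value and the flag together so they cannot drift apart. */
static void RequestOutputCharSet(DemoCharSetOptions *pOptions, KVCharSet charSet)
{
    assert(pOptions != NULL);
    pOptions->eOutputCharSet      = charSet;
    pOptions->bForceOutputCharSet = TRUE;
}

/* Mirrors the documented rule: the flag must be TRUE, and forcing
 * KVCS_UNKNOWN has no effect. */
static BOOL OutputCharSetIsForced(const DemoCharSetOptions *pOptions)
{
    assert(pOptions != NULL);
    return (pOptions->bForceOutputCharSet &&
            pOptions->eOutputCharSet != KVCS_UNKNOWN) ? TRUE : FALSE;
}
```

The same pairing applies to eSrcCharSet and bForceSrcCharSet.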
Discussion
A pointer to this structure is passed as an argument to fpConvertStream() and
KVXMLConvertFile(). If the pointer to the structure is not NULL, the values of the
members specified in the structure are used. If the pointer to the structure is
NULL, the default values are used.
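When a non-NULL pointer is passed, the values in the structure are used, so a zero-filled structure is not the same as the documented defaults: several defaults are TRUE or non-zero. The following is a minimal sketch that fills those defaults for a handful of members before the application overrides the ones it cares about. DemoOptions is a reduced stand-in for KVXMLOptions and SetDocumentedDefaults is a hypothetical helper; a real program would cover every member of the structure.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef int BOOL;
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif
typedef unsigned char BYTE;

/* Reduced stand-in for KVXMLOptions: a few scalar members only. */
typedef struct
{
    BOOL  bUseVerityDTD;
    BOOL  bNbspEmptyCells;
    int   nTableBorderWidth;
    char *pszDefaultOutputDirectory;
    BOOL  bGenerateURLs;
    long  lcbMaxMemUsage;
    BYTE  cReplaceChar;
    BYTE  cRedact;
} DemoOptions;

/* Fills the defaults stated in the member descriptions above. */
static void SetDocumentedDefaults(DemoOptions *pOptions)
{
    assert(pOptions != NULL);
    pOptions->bUseVerityDTD             = TRUE;      /* not zero */
    pOptions->bNbspEmptyCells           = TRUE;      /* not zero */
    pOptions->nTableBorderWidth         = 1;         /* not zero */
    pOptions->pszDefaultOutputDirectory = NULL;
    pOptions->bGenerateURLs             = FALSE;
    pOptions->lcbMaxMemUsage            = LONG_MAX;  /* not zero */
    pOptions->cReplaceChar              = '?';
    pOptions->cRedact                   = 'X';
}
```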
Setting the Resolution of Presentations and Graphics
The members cxVectorToRasterXRes and cyVectorToRasterYRes are set in
conjunction to specify the resolution (width in pixels) at which presentations and
graphics are converted.
You can specify the resolution in one of two ways:
• as a proportion of the original resolution
• as a specified number of pixels
Setting the Resolution Proportionally
To set the resolution proportionally, set one of the members
(cxVectorToRasterXRes or cyVectorToRasterYRes) to a percentage of the
original resolution, entered as a negative number, and set the other to zero.
For example, the following setting converts the graphic at 50 percent of the
original resolution:
cxVectorToRasterXRes=-50
cyVectorToRasterYRes=0
The following setting converts the graphic at 200 percent of the original resolution:
cxVectorToRasterXRes=0
cyVectorToRasterYRes=-200
The member that is set to zero is automatically adjusted to maintain the aspect
ratio. If both cxVectorToRasterXRes and cyVectorToRasterYRes are set to a
percentage, cyVectorToRasterYRes defaults to zero during the conversion.
Setting the Resolution in Pixels
To set the resolution in pixels, set one of the members (cxVectorToRasterXRes
or cyVectorToRasterYRes) to the number of pixels, and one to zero. For
example:
cxVectorToRasterXRes=0
cyVectorToRasterYRes=1500
The member that is set to zero is automatically adjusted to maintain the aspect
ratio. The maximum resolution is 4,000 pixels.
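The rules above can be worked through with a small calculation. The following is a minimal sketch, not SDK code, that predicts the output size from the original size and the two settings: negative values are percentages, positive values are pixels, and zero means "follow the other member to keep the aspect ratio". ResolveRasterSize is a hypothetical helper. The treatment of two positive values, and of a percentage mixed with a pixel value, is an assumption because this section does not cover those cases. The sketch reports sizes above the 4,000-pixel maximum as an error rather than guessing how the SDK handles them.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_RESOLUTION 4000  /* documented maximum, in pixels */

/* Hypothetical helper: computes the expected output size.
 * Returns 0 on success, -1 if a dimension exceeds the maximum. */
static int ResolveRasterSize(int origW, int origH, int cx, int cy,
                             int *pOutW, int *pOutH)
{
    long w = origW;
    long h = origH;

    assert(origW > 0 && origH > 0 && pOutW != NULL && pOutH != NULL);

    if (cx < 0) {
        /* percentage in X; if Y is also a percentage it is treated as zero */
        w = (long)origW * -cx / 100;
        h = (long)origH * -cx / 100;
    } else if (cy < 0) {
        /* percentage in Y; X follows to keep the aspect ratio */
        w = (long)origW * -cy / 100;
        h = (long)origH * -cy / 100;
    } else if (cx > 0 && cy > 0) {
        /* assumption: both pixel values are used as given */
        w = cx;
        h = cy;
    } else if (cx > 0) {
        /* width in pixels; height follows */
        w = cx;
        h = (long)origH * cx / origW;
    } else if (cy > 0) {
        /* height in pixels; width follows */
        h = cy;
        w = (long)origW * cy / origH;
    }
    /* both zero: the original resolution is retained */

    if (w > DEMO_MAX_RESOLUTION || h > DEMO_MAX_RESOLUTION)
        return -1;

    *pOutW = (int)w;
    *pOutH = (int)h;
    return 0;
}
```

For an 800 by 600 original, the settings cxVectorToRasterXRes=-50 and cyVectorToRasterYRes=0 give 400 by 300 in this sketch, and cxVectorToRasterXRes=0 with cyVectorToRasterYRes=1500 gives 2000 by 1500.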
KVXMLTemplate
This structure defines the overall framework of the XML output. Members in this
structure define the XML markup written at specific points in the output stream.
The pointers contain XML markup that may include embedded KeyView-defined
tokens. The XML markup contained in these strings should be well-formed. For
the generated document to be valid, the markup must conform to the Verity DTD.
The structure is initialized by calling the function fpConvertStream() or
KVXMLConvertFile(). See “fpConvertStream()” on page 186 or
“KVXMLConvertFile()” on page 214.
typedef struct tag_KVXMLTemplate
{
    char  *pszMainTop;
    char  *pszMainBottom;
    char  *pszFirstH1Start;
    char  *pszFirstH1End;
    char  *pszMiddleH1Start;
    char  *pszMiddleH1End;
    char  *pszLastH1Start;
    char  *pszLastH1End;
    char  *pszH[2..6]XML;
    char  *pszTOCH[1..6]Start;
    char  *pszTOC_H[1..6];
    char  *pszTOCH[1..6]End;
    char  *pszXFile;
    char  *pszXStartBlock;
    char  *pszXEndBlock;
    char  *pszStartBlock;
    char  *pszEndBlock;
    BOOL  bPutBlocksInSeparateFiles;
    BOOL  bHardPageMakesNewBlock;
    long  lcbBlockSize;
    char  *pszChunkTemplate;
    char  *pszUserSummary;
    char  *pszTOCH[1..6]LeafNode;
}
KVXMLTemplate;
Member Descriptions
pszMainTop
The markup and tokens inserted at the beginning of the main XML file.
Most of the sample template files feature <MetaData> tags with tokens
that store the input document’s metadata. This member does not include
the processing instructions or document type declaration that appear at
the beginning of an XML document. The XML declaration
<?xml version= ...> is automatically generated by XML Export. If
you are using style sheets or the Verity DTD, the <?xml-stylesheet ...?>
processing instruction and the <!DOCTYPE ...> declaration are also
automatically generated by XML Export.
The default is NULL.
pszMainBottom
The markup and tokens inserted at the end of the main XML file.
The default is NULL.
pszFirstH1Start
The markup and tokens inserted at the beginning of the first created H1
XML block (that is, the block associated with the first H1 table of contents
entry).
The default is NULL.
pszFirstH1End
The markup and tokens inserted at the end of the first created H1 XML
block (that is, the block associated with the first H1 table of contents
entry).
The default is NULL.
pszMiddleH1Start
The markup and tokens inserted at the beginning of those H1 XML blocks
that are neither the first nor the last H1 blocks created (that is, blocks
associated with all but the first and last H1 table of contents entries).
The default is NULL.
pszMiddleH1End
The markup and tokens inserted at the end of those H1 XML blocks that
are neither the first nor the last H1 blocks created (that is, blocks
associated with all but the first and last H1 table of contents entries).
The default is NULL.
pszLastH1Start
The markup and tokens inserted at the beginning of the last created H1
XML block (that is, the block associated with the last H1 table of contents
entry).
The default is NULL.
pszLastH1End
The markup and tokens inserted at the end of the last created H1 XML
block (that is, the block associated with the last H1 table of contents
entry).
The default is NULL.
pszH[2..6]XML
The markup and tokens inserted in an XML block for heading levels 2
through 6.
The default is NULL.
pszTOCH[1..6]Start
The markup and tokens inserted at the beginning of a table of contents
block for heading levels 1 through 6 entries. For example:
<ol list-style-type="upper-roman">
The default is NULL.
pszTOC_H[1..6]
The markup and tokens required to process the table of contents entries
for heading levels 1 through 6. For example:
<a xmlns:xlink="http://www.w3.org/TR/xlink" xlink href=
"#$ANCHOR"> $TOCTE</a>
If the table of contents heading contains special characters, such as an
ampersand (&) or parentheses, you must use the $TOCPE token in the
pszTOC_H[1..6] markup. This token retains character entities and
prevents validity errors. See “Export Tokens” on page 329 for more
information on table of contents tokens.
The default is NULL.
pszTOCH[1..6]End
The markup and tokens inserted at the end of a table of contents block for
heading levels 1 through 6 entries. For example:
</ol>
The default is NULL.
pszXFile
The markup and tokens generated and placed in an extra XML file. This
file holds content from the source document. To process this file, you
would use the $XANCHOR token. See “Export Tokens” on page 329 for
more information on Export tokens.
The default is NULL.
pszXStartBlock
The markup and tokens inserted at the beginning of each XML block
generated by the $XANCHOR token. If either this member or
pszXEndBlock is defined, both pszStartBlock and pszEndBlock
are ignored. See “Export Tokens” on page 329 for more information on
Export tokens.
The default is NULL.
pszXEndBlock
The markup and tokens to output at the end of each XML block generated
by the $XANCHOR token. If either this member or pszXStartBlock is
defined, both pszStartBlock and pszEndBlock are ignored. See
“Export Tokens” on page 329 for more information on Export tokens.
The default is NULL.
pszStartBlock
The markup and tokens inserted at the beginning of each block created
as a result of lcbBlockSize or bHardPageMakesNewBlock.
The default is NULL.
pszEndBlock
The markup and tokens inserted at the end of each block created as a
result of lcbBlockSize or bHardPageMakesNewBlock.
The default is NULL.
bPutBlocksInSeparateFiles
Set to TRUE to create a separate XML file for each heading level 1 block.
Each new block uses the markup defined in pszStartBlock and
pszEndBlock. If set to FALSE, then each heading level 1 block is placed
sequentially in the same file, after the initial markup is written.
The default is FALSE.
bHardPageMakesNewBlock
Set to TRUE to have hard page breaks in the source document generate
new XML files during the conversion process. The member
pszChunkTemplate provides the appropriate table of contents entry for
the new block.
Applies to word processing documents and spreadsheets only.
The default is FALSE.
lcbBlockSize
The maximum size (in bytes) of heading level 1 XML output files. This
number is used as a guideline and may be exceeded to break content at
a logical location (for example, a row boundary).
By default, the size is undefined and unlimited.
pszChunkTemplate
If an H1 XML block is subdivided into separate files as a result of the size
limitations specified in lcbBlockSize, this member provides a template
for creating a table of contents entry for the new file. The block number
can be made a part of this template by inserting the token
$SPLITBLOCKNUMBER. For example:
Page $SPLITBLOCKNUMBER
The default is NULL.
pszUserSummary
The markup and tokens generated when the tokens $USERSUMMARY or
$SUMMARY are used. For example:
<MetaData name="$NAME" content="$CONTENT"/>
The default is NULL.
pszTOCH[1..6]LeafNode
The markup that replaces pszTOC_H[1..6] entries for leaf nodes in the
table of contents. A leaf node is a node that has no children.
The default is NULL.
Discussion
A pointer to this structure is passed as an argument to fpConvertStream() and
KVXMLConvertFile(). If the pointer to the structure is not NULL, the values of the
members specified in the structure are used. If the pointer to the structure is
NULL, the default values are used.
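The template members are plain strings that carry markup and tokens. The following is a minimal sketch that fills a few of them with the example strings from the member descriptions and shows, for illustration only, what expanding the $SPLITBLOCKNUMBER token amounts to. DemoTemplate is a reduced stand-in for KVXMLTemplate, and the member names pszTOCH1Start and pszTOCH1End assume that pszTOCH[1..6]Start and pszTOCH[1..6]End denote one member per heading level. FillDemoTemplate and ExpandChunkTemplate are hypothetical helpers; the real expansion is performed by XML Export, not by application code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reduced stand-in for KVXMLTemplate: four string members only. */
typedef struct
{
    char *pszTOCH1Start;
    char *pszTOCH1End;
    char *pszChunkTemplate;
    char *pszUserSummary;
} DemoTemplate;

/* Example strings taken from the member descriptions. */
static char g_tocStart[] = "<ol list-style-type=\"upper-roman\">";
static char g_tocEnd[]   = "</ol>";
static char g_chunk[]    = "Page $SPLITBLOCKNUMBER";
static char g_summary[]  = "<MetaData name=\"$NAME\" content=\"$CONTENT\"/>";

static void FillDemoTemplate(DemoTemplate *pTemplate)
{
    assert(pTemplate != NULL);
    pTemplate->pszTOCH1Start    = g_tocStart;
    pTemplate->pszTOCH1End      = g_tocEnd;
    pTemplate->pszChunkTemplate = g_chunk;
    pTemplate->pszUserSummary   = g_summary;
}

/* Illustration only: replaces the first $SPLITBLOCKNUMBER token with a
 * block number. Returns 0 on success, -1 if out is too small. */
static int ExpandChunkTemplate(const char *pszTemplate, int blockNumber,
                               char *out, size_t size)
{
    static const char token[] = "$SPLITBLOCKNUMBER";
    const char *hit;
    int n;

    assert(pszTemplate != NULL && out != NULL);

    hit = strstr(pszTemplate, token);
    if (hit == NULL)
        n = snprintf(out, size, "%s", pszTemplate);
    else
        n = snprintf(out, size, "%.*s%d%s",
                     (int)(hit - pszTemplate), pszTemplate,  /* text before token */
                     blockNumber,
                     hit + sizeof token - 1);                /* text after token  */

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```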
KVXMLTOCOptions
This structure defines whether a heading is included in the table of contents.
Source text is converted to a heading in the XML output if
• it meets all the criteria defined by the members of KVXMLHeadingInfo, and
• the headingCreateType member of KVXMLTOCOptions is set to allow
automatic heading generation.
The structure is initialized by calling the function fpConvertStream() or
KVXMLConvertFile(). See “fpConvertStream()” on page 186 or
“KVXMLConvertFile()” on page 214.
See “KVXMLHeadingInfo” on page 248 for more information on the criteria used to
determine whether a heading is included in the table of contents.
typedef struct tag_KVXMLTOCOptions
{
    BOOL                    bAllowHeadingsInTables;
    KVHeadingCreateOptions  headingCreateType;
    KVXMLHeadingInfo        *pH1;
    KVXMLHeadingInfo        *pH2;
    KVXMLHeadingInfo        *pH3;
    KVXMLHeadingInfo        *pH4;
    KVXMLHeadingInfo        *pH5;
    KVXMLHeadingInfo        *pH6;
}
KVXMLTOCOptions;
Member Descriptions
bAllowHeadingsInTables
Determines if the text in tables is considered for automatic heading
generation. If set to TRUE, the text in tables is included in the
determination of headings and table of contents entries.
Applies to word processing documents and spreadsheets only.
The default is FALSE.
headingCreateType
Determines how XML Export subdivides the source document into table of
contents entries. This can be set to one of the two options enumerated in
KVHeadingCreateOptions in kvxml.h. See
“KVHeadingCreateOptions” on page 280.
The determination of table of contents entries is based on whether the
source document contains heading styles or whether text attributes
conform to the criteria defined in the structure KVXMLHeadingInfo. See
“KVXMLHeadingInfo” on page 248.
Heading styles are predefined style tags, such as “Heading 1” and
“Heading 2” tags in a Microsoft Word document. Text attributes are bold,
underlined, italic, and so on.
Applies to word processing documents only.
The default is KVHC_DocHeadingsOnly.
pH[1..6]
Pointer to the structure KVXMLHeadingInfo. See “KVXMLHeadingInfo”
on page 248.
When the table of contents entries are not based on the source document’s
heading styles, the table of contents entries are determined by whether
text attributes (such as bold, underlined, and italic text) in the source
document meet all the criteria defined in KVXMLHeadingInfo.
Discussion
A pointer to this structure is passed as an argument to fpConvertStream() and
KVXMLConvertFile(). If the pointer to the structure is not NULL, the values of the
members specified in the structure are used. If the pointer to the structure is
NULL, the default values are used.
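As a minimal sketch of filling in this structure before a conversion call, the following code starts from the documented defaults and then enables automatic heading generation. The type declarations are stand-ins copied from the definitions in this guide so that the sketch compiles without the SDK headers; real code would include kvxml.h instead. InitTOCOptions() is a hypothetical helper, and the members of KVXMLHeadingInfo are deliberately left opaque here.

```c
#include <assert.h>
#include <string.h>

/* Stand-in declarations (assumption): real code includes kvxml.h instead. */
typedef int BOOL;
#ifndef TRUE
#define TRUE  1
#define FALSE 0
#endif
typedef struct tag_KVXMLHeadingInfo KVXMLHeadingInfo;  /* members not shown */
typedef enum tag_KVHeadingCreateOptions
{
    KVHC_DocHeadingsOnly,
    KVHC_CreateHeadingsAlways
}
KVHeadingCreateOptions;
typedef struct tag_KVXMLTOCOptions
{
    BOOL                    bAllowHeadingsInTables;
    KVHeadingCreateOptions  headingCreateType;
    KVXMLHeadingInfo        *pH1;
    KVXMLHeadingInfo        *pH2;
    KVXMLHeadingInfo        *pH3;
    KVXMLHeadingInfo        *pH4;
    KVXMLHeadingInfo        *pH5;
    KVXMLHeadingInfo        *pH6;
}
KVXMLTOCOptions;

/* Hypothetical helper: resets the structure to the documented defaults
   (FALSE, KVHC_DocHeadingsOnly, NULL heading criteria), then allows
   automatic headings, including text in tables, with caller-supplied
   criteria for level H1 only. */
void InitTOCOptions(KVXMLTOCOptions *pTOC, KVXMLHeadingInfo *pH1Criteria)
{
    assert(pTOC != NULL);
    memset(pTOC, 0, sizeof(*pTOC));
    pTOC->bAllowHeadingsInTables = TRUE;
    pTOC->headingCreateType = KVHC_CreateHeadingsAlways;
    pTOC->pH1 = pH1Criteria;
}
```

A pointer to the initialized structure would then be passed to fpConvertStream() or KVXMLConvertFile(); passing NULL instead leaves all defaults in effect.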
CHAPTER 11
Enumerated Types
This section provides information on some of the enumerated types used by the
XML Export API. It contains the following topics:

Introduction

ENSATableBorder

KVCredKeyType

KVErrorCode

KVErrorCodeEx

KVXMLStyleSheetType

KVXMLAnchorType

KVXMLGraphicType

KVHeadingCreateOptions

KVXMLEmptyParaType

KVXMLHardPageBreakType

KVMetadataType

KVMetaNameType

KVSumInfoType

KVSumType

LPDF_DIRECTION
Introduction
The enumerated types are in adinfo.h, kvtypes.h, kvxml.h, and
kvxtract.h. These header files are in the include directory. The first entry in
an enumerated type structure should be set to zero (0). Each subsequent entry is
increased by 1. For example, the first five entries of KVCharSet in kvtypes.h
are:
KVCS_UNKNOWN
KVCS_SJIS
KVCS_GB
KVCS_BIG5
KVCS_KSC
They would be set in the following way:
Enumerated Type    Setting
KVCS_UNKNOWN       0
KVCS_SJIS          1
KVCS_GB            2
KVCS_BIG5          3
KVCS_KSC           4
Many enumerated types may also be set by entering the appropriate symbolic
constant, or TRUE/FALSE.
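The numbering rule above is the standard C behavior for enumerations without explicit values. The following sketch uses a stand-in enumeration holding only the five entries listed above (the real KVCharSet in kvtypes.h has more) to show that the numeric setting and the symbolic constant are interchangeable. The names KVCharSetSample and IsSameCharSet() are hypothetical.

```c
/* Stand-in for the first five entries of KVCharSet (assumption: real code
   includes kvtypes.h). With no explicit values, C numbers the entries
   0, 1, 2, ... in declaration order. */
typedef enum tag_KVCharSetSample
{
    KVCS_UNKNOWN,
    KVCS_SJIS,
    KVCS_GB,
    KVCS_BIG5,
    KVCS_KSC
}
KVCharSetSample;

/* Hypothetical helper: returns 1 if a numeric setting and a symbolic
   constant refer to the same entry, 0 otherwise. */
int IsSameCharSet(int setting, KVCharSetSample symbolic)
{
    return setting == (int)symbolic;
}
```

Using the symbolic constant is generally the safer choice, because it stays correct if entries are inserted in a later header version.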
Programming Guidelines
As KeyView is enhanced in future releases, some enumerated types may be
expanded. For example, new format IDs may be added to the ENdocFmt
enumerated type, or new error codes may be added to the KVErrorCodeEx
enumerated type. When using these expandable types, your code should ensure
binary compatibility with future releases.
For example, if you use an array to access error messages based on an error
code, your code should check that the error code is less than KVError_Last before
accessing the data. This ensures new error codes are detected when you add
KeyView binary files from new releases to your existing installation.
The following enumerated types are expandable:
KVErrorCodeEx
KVMetadataType
KVCharSet
KVLanguageID
KVSubfileType
ENdocFmt
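The array check described in this guideline might be sketched as follows. SAMPLE_KVError_Last stands in for the value of KVError_Last in the headers the application was compiled against (44 in this release), and the message table and ErrorMessage() function are hypothetical. Only a few table entries are filled in, for brevity.

```c
#include <stddef.h>

/* Assumption: value of KVError_Last at compile time. Real code would use
   KVError_Last from kvtypes.h directly. */
enum { SAMPLE_KVError_Last = 44 };

/* Hypothetical message table indexed by error code; unlisted entries
   are NULL. */
static const char *const g_sampleMessages[SAMPLE_KVError_Last] =
{
    [0]  = "Success",
    [6]  = "General error",
    [22] = "Open stream failure"
};

/* Returns a message for the code, or a fallback if the code lies outside
   the table. A newer KeyView binary may return codes that did not exist
   when this table was compiled, so the range check comes first. */
const char *ErrorMessage(int code)
{
    if (code < 0 || code >= SAMPLE_KVError_Last ||
        g_sampleMessages[code] == NULL)
    {
        return "Unknown error";
    }
    return g_sampleMessages[code];
}
```

The same pattern applies to the other expandable types in the list: treat any value at or beyond the last entry known at compile time as unrecognized rather than using it as an index.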
ENSATableBorder
This enumerated type defines the type of border to display around tables. It is
defined in kvtypes.h.
Definition
typedef enum tag_ENSATableBorder
{
SA_BaseOnDocument,
SA_NoBorder,
SA_Border
}
ENSATableBorder;
Enumerators
SA_BaseOnDocument
Border type is based on the document.
SA_NoBorder
Table borders are always off.
SA_Border
Table borders are always on.
KVCredKeyType
This enumerated type defines the type of credential used to open a protected file.
See “KVCredentialComponent” on page 161. It is defined in kvxtract.h.
Definition
typedef enum tag_KVCredKeyType
{
KVCredKeyType_UserName,
KVCredKeyType_UserIdFile,
KVCredKeyType_Password,
}
KVCredKeyType;
Enumerators
KVCredKeyType_UserName
The credential in KVCredentialComponent is a
user name.
KVCredKeyType_UserIdFile
The credential in KVCredentialComponent is a
path to a file containing user IDs.
KVCredKeyType_Password
The credential in KVCredentialComponent is a
password.
KVErrorCode
This enumerated type defines the type of error generated if Export fails. It is
defined in kvtypes.h.
Definition
typedef enum tag_KVErrorCode
{
    KVERR_Success,                   /* 0  Success */
    KVERR_DLLNotFound,               /* 1  DLL or shared library not found */
    KVERR_OutOfCore,                 /* 2  memory allocation failure */
    KVERR_processCancelled,          /* 3  fpContinue() returns FALSE */
    KVERR_badInputStream,            /* 4  Invalid/corrupt input stream */
    KVERR_badOutputType,             /* 5  Invalid output type requested */
    KVERR_General,                   /* 6  General error */
    KVERR_FormatNotSupported,        /* 7  Format not supported */
    KVERR_PasswordProtected,         /* 8  File is Password Protected */
    KVERR_ADSNotFound,               /* 9  Adobe Document Server not found */
    KVERR_AutoDetFail,               /* 10 Autodetect error */
    KVERR_AutoDetNoFormat,           /* 11 Unable to detect file format */
    KVERR_ReaderInitError,           /* 12 Error initializing the reader */
    KVERR_NoReader,                  /* 13 No reader available for this format */
    KVERR_CreateOutputFileFailed,    /* 14 Unable to create output file */
    KVERR_CreateTempFileFailed,      /* 15 Unable to create temp file */
    KVERR_ErrorWritingToOutputFile,  /* 16 Error writing to output file */
    KVERR_CreateProcessFailed,       /* 17 Error creating a child process */
    KVERR_WaitForChildFailed,        /* 18 Wait for child process failed */
    KVERR_ChildTimeOut,              /* 19 Child process hung / timed out */
    KVERR_ArchiveFileNotFound,       /* 20 Attempt to extract nonexistent file */
    KVERR_ArchiveFatalError          /* 21 Fatal error processing archive should abort */
}
KVErrorCode;
Enumerators
KVERR_Success
Function completed successfully.
KVERR_DLLNotFound
A DLL or shared library was not found.
KVERR_OutOfCore
Memory allocation failure.
KVERR_processCancelled
Callback function fpContinue() returns FALSE.
KVERR_badInputStream
Invalid or corrupt input stream.
KVERR_badOutputType
An invalid output type is requested.
KVERR_General
General error.
KVERR_FormatNotSupported
File format is not supported.
KVERR_PasswordProtected
File is encrypted or password-protected. KeyView only supports
secure PST files.
KVERR_ADSNotFound
Adobe Document Server not found. This error is obsolete.
KVERR_AutoDetFail
Autodetect error.
KVERR_AutoDetNoFormat
Unable to detect file format.
KVERR_ReaderInitError
Error initializing the reader.
KVERR_NoReader
No reader available for this format.
KVERR_CreateOutputFileFailed
Unable to create output file.
If the overwrite flag in KVExtractSubFileArg is FALSE, and
a sub file has the same name as a file in the target path, this
error is generated. See “KVExtractSubFileArg” on page 163.
KVERR_CreateTempFileFailed
Unable to create temporary file.
KVERR_ErrorWritingToOutputFile
Error writing to output file.
KVERR_CreateProcessFailed
Error creating a child process.
KVERR_WaitForChildFailed
Wait for child process failed.
KVERR_ChildTimeOut
Child process hung/timed out.
KVERR_ArchiveFileNotFound
Attempt to extract nonexistent file.
KVERR_ArchiveFatalError
Fatal error processing an archive file.
KVErrorCodeEx
This enumerated type defines extended error codes. It is defined in kvtypes.h.
Definition
typedef enum tag_KVErrorCodeEx
{
    KVError_OpenStreamFailure = KVERR_ArchiveFatalError + 1,
                                         /* 22 KVOpen stream failure */
    KVError_InterfaceFunctionNotFound,   /* 23 Interface function not found */
    KVError_InputFileNotFound,           /* 24 Cannot find input file */
    KVError_OpenOutputFileFailed,        /* 25 Cannot open output file */
    KVError_MemoryLeak,                  /* 26 Memory leak */
    KVError_MemoryOverwrite,             /* 27 Memory overwrite */
    KVError_GPF,                         /* 28 Exception during oop filtering */
    KVError_OopCore,                     /* 29 Core dump in child process */
    KVError_KVoopLogFailed,              /* 30 Creation of oop error log failed */
    KVError_OverNestedFileLimit,         /* 31 File exceeds nested file limit */
    KVError_PSTAccessFailed,             /* 32 Access failed on PST files */
    KVError_PasswordRequired,            /* 33 Password required to access file */
    KVError_InvalidArgs,                 /* 34 Input argument/structure is invalid */
    KVError_ReaderUsageDenied,           /* 35 Reader requires a valid license */
    KVError_OopBadConfig,                /* 36 Config buffer data was incomplete */
    KVError_OopBrokenPipe,               /* 37 Read/write to/from pipe failed */
    KVError_OopPipeOEF,                  /* 38 Pipe was closed prior to read/write */
    KVError_IPCTimeOut,                  /* 39 Pipe/socket timed out on poll/select */
    KVError_InvalidOopDriverSignature,   /* 40 Client sent request to OOP server
                                               but context driver does not exist
                                               on the server */
    KVError_InvalidOopServiceSignature,  /* 41 Client sent request to OOP service
                                               that does not exist */
    KVError_ZeroFile,                    /* 42 Input file is empty or zero bytes */
    KVError_CompressionNotSupported,     /* 43 File or subfile is compressed with
                                               unsupported method */
    KVError_Last                         /* 44 */
}
KVErrorCodeEx;
Enumerators
KVError_OpenStreamFailure =
KVERR_ArchiveFatalError +1
Failed to open a stream during out-of-process filtering. This
is an extended error for the code KVERR_General. This is
used by KeyView Filter.
KVError_InterfaceFunctionNotFound
An interface function was not found during out-of-process
filtering. This is an extended error for the code
KVERR_General. This is used by KeyView Filter.
KVError_InputFileNotFound
Could not find the input file during out-of-process filtering.
This is an extended error for the code KVERR_General.
This is used by KeyView Filter.
KVError_OpenOutputFileFailed
Could not open the output file during out-of-process
filtering. This is an extended error for the code
KVERR_General. This is used by KeyView Filter.
KVError_MemoryLeak
Memory leak occurred during out-of-process filtering. This
is an extended error for the code KVERR_General. This is
used by KeyView Filter.
KVError_MemoryOverwrite
Memory overwrite occurred during out-of-process filtering.
This is an extended error for the code KVERR_General.
This is used by KeyView Filter.
KVError_GPF
Exception occurred during out-of-process filtering. This is
an extended error for the code KVERR_General. This is
used by KeyView Filter.
KVError_OopCore
Memory dump was generated in a child process during
out-of-process filtering. This is an extended error for the
code KVERR_General. This is used by KeyView Filter.
KVError_KVoopLogFailed
Creation of out-of-process error log failed. This is an
extended error for the code KVERR_General. This is used
by KeyView Filter.
KVError_OverNestedFileLimit
The container file has more than the allowable number of
child documents. One or more child documents were not
converted. Currently, this is not used.
KVError_PSTAccessFailed
The PST file could not be converted. This error may be
returned when a call to fpOpenFile() returns NULL for
one of the following reasons:
 Microsoft Outlook client is not installed
 Microsoft Outlook client is installed, but is not the default
email client
 Microsoft Outlook client is installed, but is not configured
correctly
 PST file is corrupt
 PST file is read-only (PST files must allow read and
write access)
 MAPI call fails
KVError_PasswordRequired
To open the file, credentials must be provided. This error
may be returned when a call to fpOpenFile() returns
NULL.
KVError_InvalidArgs
The input argument or structure is invalid. This is generated
by the File Extraction APIs.
KVError_ReaderUsageDenied
The current license key does not enable the document
reader required to convert the file. This error may be
returned when a call to fpOpenFile() returns NULL.
Some document readers are considered advanced
features and are licensed separately from the KeyView
SDK (for example, the PST and MBX readers). Contact
your Autonomy sales representative to get an updated
license key.
KVError_OopBadConfig
Information in the kvxconfig.ini file is incomplete and
cannot be used to create the XML file. This is used by KeyView
Filter.
KVError_OopBrokenPipe
Data was not transferred between the parent and child
processes during out-of-process filtering because either the
parent or child failed. This is used by KeyView Filter.
KVError_OopPipeOEF
Data was not transferred between the parent and child
processes during out-of-process filtering because the
parent process was shut down. This is used by KeyView
Filter.
KVError_IPCTimeOut
Either the parent or child process is waiting for a reply or
request during out-of-process filtering. This is used by
KeyView Filter.
KVError_InvalidOopDriverSignature
A client sent a request to an out-of-process server, but the
context driver does not exist on the server. This is used by
KeyView Filter.
KVError_InvalidOopServiceSignature
A client sent a request to a File Extraction service that does
not exist.
If this error is generated on the call to fpClose(), it can be
ignored. This is used by KeyView Filter.
KVError_ZeroFile
The input file is empty or zero bytes.
KVError_CompressionNotSupported
The file or subfile is compressed with an unsupported
compression method.
Discussion
 As error reporting is enhanced in future releases, new error messages may be
added to this enumerated type. When using this type, your code should ensure
binary compatibility with future releases. See “Programming Guidelines” on
page 270.

If an extended error code is called for a format to which the error does not
apply, the code KVError_Last is returned.
KVXMLStyleSheetType
This enumerated type defines the options for processing style sheet information. It
is defined in kvxml.h.
Definition
typedef enum tag_KVXMLStyleSheetType
{
STYLESHEET_DISABLED = 0,
XML_CSS,
XML_XSL,
}
KVXMLStyleSheetType;
Enumerators
STYLESHEET_DISABLED
Disables Cascading Style Sheet (CSS) formatting.
XML_CSS
Enables cascading style sheet (CSS) formatting and
generates an external file or uses an existing external file
which is referenced in a <?xml-stylesheet...?>
processing instruction.
XML_XSL
Enables Extensible Stylesheet Language (XSL) formatting
and uses an external XSL file which is referenced in a
<?xml-stylesheet...?> processing instruction.
KVXMLAnchorType
This enumerated type defines the anchor types for the output stream. It is defined
in kvxml.h.
Definition
typedef enum tag_KVXMLAnchorType
{
VectorPictureAnchor = 0,
RasterPictureAnchor,
H1Anchor,
H2Anchor,
H3Anchor,
H4Anchor,
H5Anchor,
H6Anchor,
XAnchor,
AnimatedGIFAnchor,
CSSAnchor,
XSLAnchor,
GeneralAnchor,
DBAnchor,
JPEGAnchor
}
KVXMLAnchorType;
Enumerators
VectorPictureAnchor
Anchor for embedded vector graphics.
RasterPictureAnchor
Anchor for embedded raster graphics.
H1Anchor
Anchor for heading level H1 blocks.
H2Anchor
Anchor for heading level H2 blocks.
H3Anchor
Anchor for heading level H3 blocks.
H4Anchor
Anchor for heading level H4 blocks.
H5Anchor
Anchor for heading level H5 blocks.
H6Anchor
Anchor for heading level H6 blocks.
XAnchor
Anchor for an external file.
AnimatedGIFAnchor
Anchor for embedded animated GIF graphics.
CSSAnchor
Anchor for external CSS file.
XSLAnchor
Anchor for external XSL file.
GeneralAnchor
Reserved for future use.
DBAnchor
Used internally.
JPEGAnchor
Anchor for embedded JPEG graphic.
KVXMLGraphicType
This enumerated type defines graphic formats to which embedded graphics and
presentations are converted. It is defined in kvxml.h.
Definition
typedef enum tag_KVXMLGraphicType
{
KVGFX_GIF,
KVGFX_JPEG,
KVGFX_PNG,
KVGFX_CGM,
KVGFX_WMF,
KVGFX_JAVA
}
KVXMLGraphicType;
Enumerators
KVGFX_GIF
Specifies GIF (Graphics Interchange Format) as the graphic type.
KVGFX_JPEG
Specifies JPEG (Joint Photographic Experts Group) as the graphic
type.
KVGFX_PNG
Specifies PNG (Portable Network Graphics) as the graphic type.
KVGFX_CGM
Specifies CGM (Computer Graphics Metafile) as the graphic type.
KVGFX_WMF
Specifies WMF (Windows Metafile) as the graphic type.
KVGFX_JAVA
Deprecated.
Also see “Display Vector Graphics on UNIX and Linux” on page 109.
KVHeadingCreateOptions
This enumerated type defines whether Export generates blocks and block chunks
(see “Definition of Terms” on page 36) based only on the heading styles defined in
a source document (if they are available), or based on both the source
document’s heading styles and headings that are created automatically by Export.
Headings that are created automatically by Export are based on the text attributes
of the source document as defined by KVXMLHeadingInfo (see
“KVXMLHeadingInfo” on page 248). It is defined in kvxml.h.
Definition
typedef enum tag_KVHeadingCreateOptions
{
KVHC_DocHeadingsOnly,
KVHC_CreateHeadingsAlways
}
KVHeadingCreateOptions;
Enumerators
KVHC_DocHeadingsOnly
This instructs Export to rely exclusively on
heading styles defined in the source document.
However, if the source document does not contain
heading styles, Export generates blocks on its
own using the criteria defined by the structure
KVXMLHeadingInfo.
KVHC_CreateHeadingsAlways
This instructs Export to use the heading styles in
the source document when available, and to also
automatically create table of contents entries
based on the criteria defined by the structure
KVXMLHeadingInfo.
KVXMLEmptyParaType
This enumerated type defines the options for paragraphs that do not contain
content. It is defined in kvxml.h.
Definition
typedef enum tag_KVXMLEmptyParaType
{
    KVEPT_SUPPRESS,   /* No markup generated */
    KVEPT_EMPTY,      /* Use <p/> */
    KVEPT_VERBOSE     /* Use <p id="...">&nbsp;</p> */
}
KVXMLEmptyParaType;
Enumerators
KVEPT_SUPPRESS
Paragraphs without content are ignored. They do not
contribute white space and do not affect the ID number
of subsequent paragraphs. This is the default value.
KVEPT_EMPTY
Paragraphs without content are represented by an
“empty” paragraph element <p/>. These contribute
minimal white space, but do not affect the ID number of
subsequent paragraphs.
KVEPT_VERBOSE
Paragraphs without content contain a fully-defined start
tag <p id="..."> with all non-default attributes, a
&nbsp; character entity, and end tag </p>. These
contribute additional white space and affect the ID
number of subsequent paragraphs.
KVXMLHardPageBreakType
This enumerated type defines the options for hard page breaks. It is defined in
kvxml.h.
Definition
typedef enum tag_KVXMLHardPageBreakType
{
    KVHPBT_SUPPRESS,  /* No markup generated */
    KVHPBT_EMPTY,     /* Use <Page/> */
    KVHPBT_EMPTYID,   /* Use <Page id="n"/> */
    KVHPBT_ID         /* Use <Page id="n"> ... </Page> */
}
KVXMLHardPageBreakType;
Enumerators
KVHPBT_SUPPRESS
No markup is generated for hard page breaks. This is the
default value.
KVHPBT_EMPTY
An empty page element, <Page/>, without ID attributes is
generated for hard page breaks.
KVHPBT_EMPTYID
An empty page element, <Page id="n"/>, with ID attributes
is generated for hard page breaks. The ID is incremented for
each subsequent hard page break.
KVHPBT_ID
A “non-empty” “Page” element is generated for hard page
breaks. The page tags enclose the contents immediately after
the <WP> tag. When subsequent hard page breaks are
encountered, the previous “Page” element is closed with a
</Page> tag, and a <Page id="..."> opening tag is added.
The final “Page” element is closed immediately before the
closing </WP> tag.
KVMetadataType
This enumerated type defines the data type of metadata that can be extracted
from a sub file in a mail message or mail store. If a metadata field has a
corresponding KeyView type in KVMetadataType, the metadata is converted to
the KVMetadataElem structure, and the structure member isDataValid is 1.
See “KVMetadataElem” on page 171. It is defined in kvtypes.h.
Definition
typedef enum
{
    KVMetadata_Unknown   = 0,
    KVMetadata_Bool      = 1,
    KVMetadata_Binary    = 2,
    KVMetadata_Int4      = 3,
    KVMetadata_UInt4     = 4,
    KVMetadata_Int8      = 5,
    KVMetadata_UInt8     = 6,
    KVMetadata_String    = 7,
    KVMetadata_Unicode   = 8,
    KVMetadata_DateTime  = 9,
    KVMetadata_Float     = 10,
    KVMetadata_Double    = 11,
    KVMetadata_Last
}
KVMetadataType;
Enumerators
KVMetadata_Unknown
The value in the property is of an unknown type.
KVMetadata_Bool
The value in the property is a boolean. The corresponding
MAPI type is PT_BOOLEAN.
KVMetadata_Binary
The value in the property is a byte array. The
corresponding MAPI type is PT_BINARY.
KVMetadata_Int4
The value in the property is a signed 4-byte integer. The
corresponding MAPI types are PT_I2, PT_SHORT,
PT_I4, and PT_LONG.
KVMetadata_UInt4
The value in the property is an unsigned 4-byte integer.
This type is not currently supported.
KVMetadata_Int8
The value in the property is a signed 8-byte integer. This
type is not currently supported.
KVMetadata_UInt8
The value in the property is an unsigned 8-byte integer.
This type is not currently supported.
KVMetadata_String
The value in the property is a string. The corresponding
MAPI type is PT_STRING8.
KVMetadata_Unicode
The value in the property is a Unicode string. The
corresponding MAPI type is PT_UNICODE.
KVMetadata_DateTime
The value in the property is a date and time. The
corresponding MAPI type is PT_SYSTIME.
KVMetadata_Float
The value in the property is a 4-byte float. The
corresponding MAPI type is PT_FLOAT.
KVMetadata_Double
The value in the property is an 8-byte double. The
corresponding MAPI type is PT_DOUBLE.
Discussion
New types may be added to this enumerated type. When using this type, your
code should ensure binary compatibility with future releases. See “Programming
Guidelines” on page 270.
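One way to stay compatible with types added in later releases is to route every unrecognized value through a default branch. The sketch below repeats the enumeration from this section as a stand-in so that it compiles without kvtypes.h; the SampleHandling categories and the HowToHandle() function are hypothetical and only illustrate the dispatch pattern.

```c
/* Stand-in copy of the definition above (assumption: real code includes
   kvtypes.h). */
typedef enum
{
    KVMetadata_Unknown   = 0,
    KVMetadata_Bool      = 1,
    KVMetadata_Binary    = 2,
    KVMetadata_Int4      = 3,
    KVMetadata_UInt4     = 4,
    KVMetadata_Int8      = 5,
    KVMetadata_UInt8     = 6,
    KVMetadata_String    = 7,
    KVMetadata_Unicode   = 8,
    KVMetadata_DateTime  = 9,
    KVMetadata_Float     = 10,
    KVMetadata_Double    = 11,
    KVMetadata_Last
}
KVMetadataType;

/* Hypothetical categories an application might use. */
typedef enum
{
    SAMPLE_SKIP,
    SAMPLE_NUMBER,
    SAMPLE_TEXT,
    SAMPLE_BYTES,
    SAMPLE_TIME
}
SampleHandling;

/* Maps a metadata type to a handling category. The default branch covers
   KVMetadata_Unknown, the types documented as not currently supported,
   and any value a newer release may add. */
SampleHandling HowToHandle(int type)
{
    switch (type)
    {
    case KVMetadata_Bool:
    case KVMetadata_Int4:
    case KVMetadata_Float:
    case KVMetadata_Double:
        return SAMPLE_NUMBER;
    case KVMetadata_String:
    case KVMetadata_Unicode:
        return SAMPLE_TEXT;
    case KVMetadata_Binary:
        return SAMPLE_BYTES;
    case KVMetadata_DateTime:
        return SAMPLE_TIME;
    default:
        return SAMPLE_SKIP;
    }
}
```

Taking the type as an int rather than as KVMetadataType keeps the switch well defined even when the value lies outside the range known at compile time.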
KVMetaNameType
This enumerated type defines the type of metadata fields extracted from a sub file
in a mail message or mail store. See “KVMetaName” on page 172. It is defined in
kvxtract.h.
Definition
typedef enum
{
KVMetaNameType_Integer = 0,
KVMetaNameType_String = 1
}
KVMetaNameType;
Enumerators
KVMetaNameType_Integer
The metadata field is an integer.
KVMetaNameType_String
The metadata field is a string.
KVSumInfoType
This enumerated type defines the data type of the metadata field extracted from a
document. See “Extract Metadata” on page 96. It is defined in kvtypes.h.
Definition
typedef enum tag_KVSumInfoType
{
    KV_String     = 0x1,
    KV_Int4       = 0x2,
    KV_DateTime   = 0x3,
    KV_ClipBoard  = 0x4,
    KV_Bool       = 0x5,
    KV_Unicode    = 0x6,
    KV_IEEE8      = 0x7,
    KV_Other      = 0x8
}
KVSumInfoType;
Enumerators
KV_String
The value in the metadata field is a string.
KV_Int4
The value in the metadata field is an integer.
KV_DateTime
The value in the metadata field is a date and time. This type is a
64-bit value representing the number of 100-nanosecond intervals
since January 1, 1601 (Windows FILETIME EPOCH). You may
need to convert this value into another format.
KV_ClipBoard
Currently not supported.
KV_Bool
The value in the metadata field is a boolean.
KV_Unicode
The value in the metadata field is a Unicode string.
KV_IEEE8
The value in the metadata field is an IEEE 8-byte integer.
KV_Other
The value in the metadata field is user-defined.
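For KV_DateTime, a common conversion is from the 1601-based value to seconds since the Unix epoch (January 1, 1970). The following sketch assumes the 64-bit value has already been read into an int64_t; how the value is delivered by the API is not shown here, and FileTimeToUnixSeconds() is a hypothetical helper. The offset of 11644473600 seconds is the interval between the two epochs.

```c
#include <stdint.h>

/* 100-nanosecond intervals per second. */
#define SAMPLE_TICKS_PER_SECOND      10000000LL
/* Seconds between January 1, 1601 and January 1, 1970. */
#define SAMPLE_SECONDS_1601_TO_1970  11644473600LL

/* Hypothetical helper: converts a KV_DateTime value (100-nanosecond
   intervals since January 1, 1601) to whole seconds since the Unix epoch.
   Fractions of a second are discarded. */
int64_t FileTimeToUnixSeconds(int64_t fileTime)
{
    return fileTime / SAMPLE_TICKS_PER_SECOND - SAMPLE_SECONDS_1601_TO_1970;
}
```

The result can be cast to time_t and passed to standard functions such as gmtime() on platforms where time_t counts seconds since 1970.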
KVSumType
This enumerated type defines the metadata fields that can be extracted from a
document.
 Types 0 to 34 and type 42 are office summary fields.
 Types 35 to 40 are computer-aided design (CAD) metadata fields.
 Type 41, KV_OrigAppVersion, is shared by office software and CAD.
Types 43 or greater are reserved for any non-standard metadata field defined in a
document. See “Extract Metadata” on page 96. It is defined in kvtypes.h.
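The numeric ranges above can be expressed as a small classification function. This is a sketch only: the SampleSumCategory names and ClassifySumType() are hypothetical, and the boundaries are taken directly from the ranges listed in this section.

```c
/* Hypothetical categories for the ranges described above. */
typedef enum
{
    SAMPLE_SUM_INVALID,
    SAMPLE_SUM_OFFICE,
    SAMPLE_SUM_CAD,
    SAMPLE_SUM_SHARED,
    SAMPLE_SUM_NONSTANDARD
}
SampleSumCategory;

/* Classifies a KVSumType value by its numeric range. */
SampleSumCategory ClassifySumType(int type)
{
    if (type < 0)
        return SAMPLE_SUM_INVALID;
    if (type <= 34 || type == 42)
        return SAMPLE_SUM_OFFICE;        /* office summary fields */
    if (type <= 40)
        return SAMPLE_SUM_CAD;           /* CAD metadata fields */
    if (type == 41)
        return SAMPLE_SUM_SHARED;        /* KV_OrigAppVersion */
    return SAMPLE_SUM_NONSTANDARD;       /* 43 or greater */
}
```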
Definition
typedef enum tag_KVSumType
{
    KV_CodePage            = 0,
    KV_Title               = 1,
    KV_Subject             = 2,
    KV_Author              = 3,
    KV_Keywords            = 4,
    KV_Comments            = 5,
    KV_Template            = 6,
    KV_LastAuthor          = 7,
    KV_RevNumber           = 8,
    KV_EditTime            = 9,
    KV_LastPrinted         = 10,
    KV_Create_DTM          = 11,
    KV_LastSave_DTM        = 12,
    KV_PageCount           = 13,
    KV_WordCount           = 14,
    KV_CharCount           = 15,
    KV_ThumbNail           = 16,
    KV_AppName             = 17,
    KV_Security            = 18,
    KV_Category            = 19,
    KV_PresentationTarget  = 20,
    KV_Bytes               = 21,
    KV_Lines               = 22,
    KV_Paragraphs          = 23,
    KV_Slides              = 24,
    KV_Notes               = 25,
    KV_HiddenSlides        = 26,
    KV_MMClips             = 27,
    KV_ScaleCrop           = 28,
    KV_HeadingPairs        = 29,
    KV_TitlesofParts       = 30,
    KV_Manager             = 31,
    KV_Company             = 32,
    KV_LinksUpToDate       = 33,
    KV_HyperlinkBase       = 34,
    KV_Layouts             = 35,
    KV_Objects             = 36,
    KV_FileVersion         = 37,
    KV_LastFileVersion     = 38,
    KV_OrigFileVersion     = 39,
    KV_OrigFileType        = 40,
    KV_OrigAppVersion      = 41,
    KV_ContentStatus       = 42,
    KV_UserDefined         = 43
}
KVSumType;
Enumerators
KV_CodePage
Code page of the document.
KV_Title
Contents of the “Title” property field taken from the source document.
KV_Subject
Contents of the “Subject” property field taken from the source document.
KV_Author
Contents of the “Author” property field taken from the source document.
KV_Keywords
Contents of the “Keywords” property field taken from the source document.
KV_Comments
Contents of the “Comments” property field taken from the source document.
KV_Template
Contents of the “Template” property field taken from the source document.
KV_LastAuthor
Contents of the “Last saved by” property field taken from the source
document.
KV_RevNumber
Contents of the “Revision number” property field taken from the source
document.
KV_EditTime
Contents of the “Total editing time” property field taken from the source
document.
KV_LastPrinted
Contents of the “Printed” property field taken from the source document.
KV_Create_DTM
Contents of the “Created” property field taken from the source document.
KV_LastSave_DTM
Contents of the “Modified” property field taken from the source document.
KV_PageCount
Contents of the “Pages” property field taken from the source document. The
field provides the number of pages in the document.
KV_WordCount
Contents of the “Words” property field taken from the source document. The
field provides the number of words in the document.
KV_CharCount
Contents of the “Characters” property field taken from the source document.
The field provides the number of characters in the document.
KV_ThumbNail
Thumbnail image of a document.
KV_AppName
Contents of the “Type” property field taken from the source document. This
field identifies the application used to read the document.
KV_Security
Contents of the “Attributes” property field taken from the source document.
KV_Category
Contents of the “Category” property field taken from the source document.
KV_PresentationTarget
Target format for presentations (35mm, printer, video, and so forth).
KV_Bytes
Contents of the “Size” property field taken from the source document. The
field provides the size of the file in bytes.
KV_Lines
Contents of the “Lines” property field taken from the source document. The
field provides the number of lines in the document.
KV_Paragraphs
Contents of the “Paragraphs” property field taken from the source
document. The field provides the number of paragraphs in the document.
KV_Slides
Contents of the “Slides” property field taken from a presentation document.
The field provides the number of slides in the document.
KV_Notes
Contents of the “Notes” property field taken from a presentation document.
The field provides the number of notes in the document.
KV_HiddenSlides
Contents of the “Hidden slides” property field taken from a presentation
document. The field provides the number of hidden slides in the document.
KV_MMClips
Contents of the “Multimedia clips” property field taken from a presentation
document. The field provides the number of multimedia clips in the
document.
KV_ScaleCrop
Boolean specifies whether thumbnails are cropped or scaled.
KV_HeadingPairs
Internally used property indicating the grouping of different document parts
and the number of items in each group.
KV_TitlesofParts
Contents of the “Document Contents” property field taken from the source
document. The field contains a list of the parts of the file, such as the names
of macro sheets in Microsoft Excel or the headings in Word.
KV_Manager
Contents of the “Manager” property field taken from the source document.
KV_Company
Contents of the “Company” property field taken from the source document.
KV_LinksUpToDate
Boolean specifies whether links in the document are resolved and current.
KV_HyperlinkBase
The base address used for all relative links in the file.
KV_Layouts
The number of layouts in the AutoCAD drawing.
KV_Objects
The approximate number of objects in the AutoCAD drawing.
KV_FileVersion
The AutoCAD version (for example, R13, R14) of the drawing.
KV_LastFileVersion
The AutoCAD version (for example, R13, R14) that the AutoCAD drawing
was last saved as.
KV_OrigFileVersion
The AutoCAD version (for example, R13, R14) of the original source file.
KV_OrigFileType
The AutoCAD file type (for example, DWG, DXF or DWB) of the original
source file.
KV_OrigAppVersion
The AutoCAD version (for example, R13, R14) of the application that
created the original source file.
KV_ContentStatus
The status of the content, for example Draft, Reviewed, or Final.
KV_UserDefined
Contents of the first entry in the array of non-standard metadata. This could
be user-defined metadata, or metadata unique to a file type.
LPDF_DIRECTION
This enumerated type defines the paragraph direction of extracted paragraphs
from a PDF file when logical order is enabled. See “Convert PDF Files to a Logical
Reading Order” on page 112. It is defined in kvtypes.h.
Definition
typedef enum{
LPDF_RAW = 0,
LPDF_LTR,
LPDF_RTL,
LPDF_AUTO
} LPDF_DIRECTION ;
Enumerators
LPDF_RAW
Unstructured paragraph flow. This is the default behavior.
LPDF_LTR
Logical reading order and left-to-right paragraph direction.
LPDF_RTL
Logical reading order and right-to-left paragraph direction.
LPDF_AUTO
Logical reading order. The PDF reader determines the paragraph
direction for each PDF page, and then sets the direction accordingly.
This is the default when logical order is enabled.
Appendixes
This section lists supported formats, supported character sets
and redistributed files, and provides information on format
detection. It contains the following appendixes:

Supported Formats

Files Required for Redistribution

Export Tokens

Character Sets

File Format Detection

File Formats and Extensions

Extract and Format Lotus Notes Sub Files

Password Protected Files
APPENDIX A
Supported Formats
This section lists information about the file formats that can be detected and
processed (either filtered, converted, or displayed) by the KeyView suite of
products. The KeyView suite includes KeyView Filter SDK, KeyView Export SDK,
and KeyView Viewing SDK.

Supported Formats
 Archive Formats
 Binary Format
 Computer-Aided Design Formats
 Database Formats
 Desktop Publishing
 Display Formats
 Graphic Formats
 Mail Formats
 Multimedia Formats
 Presentation Formats
 Spreadsheet Formats
 Text and Markup Formats
 Word Processing Formats

Supported Formats (Detected)
Supported Formats
The tables in this section provide the following information:

The file formats supported by the Filter API, Export API, Viewing API, and File
Extraction API. The supported versions and the format’s extension are also
listed.
The formats listed in this section can also be detected by the KeyView format
detection module (kwad). The section “Supported Formats (Detected)” on
page 313 lists formats that can be detected, but cannot be filtered, converted,
or displayed.

The file formats for which KeyView can detect and extract the character set
and metadata information (properties such as title, author, and subject).
Even though a file format may be able to provide character set information,
some documents may not contain character set information. Therefore, the
document reader would not be able to determine the character set of the
document. In this case, either the operating system code page or the
character set specified in the API is used.

The document reader used to filter each format.
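The character set fallback described above can be sketched as follows. This is illustrative only: the function and parameter names are hypothetical, character sets are represented as plain strings rather than the SDK's own identifiers, and the precedence between the character set specified in the API and the operating system code page is an assumption, because the text does not state which is tried first.

```c
#include <stddef.h>

/* Hypothetical helper (not an SDK function): pick the character set to use.
   detected      - character set reported by the document reader, or NULL
   api_specified - character set supplied by the application, or NULL
   os_code_page  - operating system code page, used as the last resort */
const char *choose_charset(const char *detected,
                           const char *api_specified,
                           const char *os_code_page)
{
    if (detected != NULL && detected[0] != '\0')
        return detected;            /* the document supplied a character set */
    if (api_specified != NULL && api_specified[0] != '\0')
        return api_specified;       /* assumed to take precedence over the OS */
    return os_code_page;
}
```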
Table 17 Key to Support Tables
Symbol | Description
Y | Format is supported. Metadata can be extracted for this format. Character set can be determined for this format.
N | Format is not supported. Metadata cannot be extracted for this format. Character set cannot be determined for this format.
P | Partial metadata is extracted from this format. Some non-standard fields are not extracted.
T | Only text is extracted from this format. Formatting information is not extracted.
M | Only metadata (title, subject, author, and so on) is extracted from this format. Text and formatting information are not extracted.
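An application that keeps its own copy of the support tables might decode the symbols along these lines. This is a sketch under stated assumptions: the SupportLevel type and parse_support_symbol function are hypothetical names invented for illustration, and the parser looks only at the first character so that a cell carrying a footnote marker (for example, Y2) is still read as Y.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type mirroring the symbols in the key above. */
typedef enum {
    SUPPORT_NO,             /* N */
    SUPPORT_YES,            /* Y */
    SUPPORT_PARTIAL,        /* P: partial metadata */
    SUPPORT_TEXT_ONLY,      /* T: text only, no formatting */
    SUPPORT_METADATA_ONLY,  /* M: metadata only */
    SUPPORT_NOT_APPLICABLE, /* n/a */
    SUPPORT_UNKNOWN         /* anything else */
} SupportLevel;

/* Hypothetical helper: decode one table cell such as "Y", "Y2", or "n/a". */
SupportLevel parse_support_symbol(const char *cell)
{
    if (cell == NULL || cell[0] == '\0')
        return SUPPORT_UNKNOWN;
    if (strcmp(cell, "n/a") == 0)
        return SUPPORT_NOT_APPLICABLE;
    switch (cell[0]) {
    case 'Y': return SUPPORT_YES;
    case 'N': return SUPPORT_NO;
    case 'P': return SUPPORT_PARTIAL;
    case 'T': return SUPPORT_TEXT_ONLY;
    case 'M': return SUPPORT_METADATA_ONLY;
    default:  return SUPPORT_UNKNOWN;
    }
}
```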
Archive Formats
Table 18 Supported Archive Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
7-Zip | 4.57 | z7zsr | 7Z | N | N | Y | Y | N | n/a | N
AD1 | n/a | ad1sr | AD1 | N | N | Y | Y | N | n/a | N
BinHex | n/a | kvhqxsr | HQX | N | N | Y | Y | N | n/a | N
Bzip2 | n/a | bzip2sr | BZ2 | N | N | Y | Y | N | n/a | N
Expert Witness Compression Format (EnCase) | 6 | encasesr | E01, L01 | N | N | Y | Y | N | n/a | N
Expert Witness Compression Format (EnCase) | 7 | encase2sr | Lx01 | N | N | Y | Y | N | n/a | N
GZIP | 2 | kvgzsr | GZ | N | N | N | Y | N | n/a | N
GZIP | 2 | kvgz | GZ | N | N | Y | N | N | n/a | N
ISO | n/a | isosr | ISO | N | N | Y | Y | N | n/a | N
Java Archive | n/a | unzip | JAR | N | N | Y | Y | N | n/a | N
Legato EMailXtender Archive | n/a | emxsr | EMX | N | N | Y | Y | N | n/a | N
MacBinary | n/a | macbinsr | BIN | N | N | Y | Y | N | n/a | N
Mac Disk Copy Disk Image | n/a | dmgsr | DMG | N | N | Y | Y | N | n/a | N
Microsoft Backup File | n/a | bkfsr | BKF | N | N | Y | Y | N | n/a | N
Microsoft Cabinet format | 1.3 | cabsr | CAB | N | N | Y | Y | N | n/a | N
Microsoft Compiled HTML Help | 3 | chmsr | CHM | N | N | Y | Y | N | n/a | N
Microsoft Compressed Folder | n/a | lzhsr | LZH, LHA | N | N | N | Y | N | n/a | N
PKZIP | through 9.0 | unzip | ZIP | N | N | Y | Y | N | n/a | N
RAR archive | 2.0 through 3.5 | rarsr | RAR | N | N | N | Y | N | n/a | N
Tape Archive | n/a | tarsr | TAR | N | N | Y | Y | N | n/a | N
UNIX Compress | n/a | kvzeesr | Z | N | N | N | Y | N | n/a | N
UNIX Compress | n/a | kvzee | Z | N | N | Y | N | N | n/a | N
UUEncoding | all versions | uudsr | UUE | N | N | Y | Y | N | n/a | N
Windows Scrap File | n/a | olesr | SHS | N | N | N | Y | N | n/a | N
WinZip | through 10 | unzip | ZIP | N | N | Y | Y | N | n/a | N
Binary Format
Table 19 Supported Binary Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
Executable | n/a | exesr | EXE | N | N | Y | N | N | n/a | N
Link Library | n/a | exesr | DLL | N | N | Y | N | N | n/a | N
Computer-Aided Design Formats
Table 20 Supported CAD Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
AutoCAD Drawing | R13, R14, R15/2000, 2004, 2007, 2010, 2013 | kpODArdr, kpDWGrdr [1] | DWG | Y | Y [2] | Y [2] | N | Y | Y | N
AutoCAD Drawing Exchange | R13, R14, R15/2000, 2004, 2007, 2010, 2013 | kpODArdr, kpDXFrdr [1] | DXF | Y | Y [3] | Y [1] | N | Y | Y | N
CATIA formats | 5 | kpCATrdr | CAT [4] | Y | N | N | N | Y | N | N
Microsoft Visio | 4, 5, 2000, 2002, 2003, 2007, 2010 [5] | vsdsr | VSD | Y | Y | Y | Y [6] | Y | Y | N
Microsoft Visio | 4, 5, 2000, 2002, 2003, 2007, 2010 [5] | kpVSDrdr | VSD, VSS, VST | N | Y | Y | N | Y | Y | N
Microsoft Visio | 2013 | ActiveX components | VSDM, VSSM, VSTM, VSDX, VSSX, VSTX | N | N | Y [7] | N | Y | N | N
1. On Windows platforms, kpODArdr is used for all versions up to 2007 and graphic rendering is supported; for later versions, only text extraction
is supported through the kpDWGrdr or kpDXFrdr reader.
2. On non-Windows platforms, graphic rendering is supported through the kpDWGrdr reader for versions R13, R14, R15, and R18 (2004); for other
versions, only text extraction is supported.
3. On non-Windows platforms, graphic rendering is supported through the kpDXFrdr reader for versions R13, R14, R15, and R18 (2004); for other
versions, only text extraction is supported.
4. All CAT file extensions, for example CATDrawing, CATProduct, CATPart, and so on.
5. Viewing and Export use the graphic reader, kpVSDrdr, for Microsoft Visio 2003, 2007, and 2010, and vsdsr for all earlier versions; image fidelity
in Viewing and Export is therefore only supported for versions 2003 and above. Filter uses vsdsr for all versions.
6. Extraction of embedded OLE objects is supported for Filter on Windows platforms only.
7. Visio 2013 is supported in Viewing only, with the support of ActiveX components from the Microsoft Visio 2013 Viewer. Image fidelity is supported
but other features, such as highlighting, are not.
Database Formats
Table 21 Supported Database Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
dBase Database | III+, IV | dbfsr | DBF | Y | Y | Y | N | N | N | N
Microsoft Access | 95, 97, 2000, 2002, 2003, 2007, 2010, 2013 | mdbsr | MDB, ACCDB | Y | T | T | N | N | Y [1] | N
Microsoft Project | 2000, 2002, 2003, 2007, 2010, 2013 | mppsr | MPP | Y | Y | Y | Y | Y | Y | N
1. Charset is not supported for Microsoft Access 95 or 97.
Desktop Publishing
Table 22 Supported Desktop Publishing Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
Microsoft Publisher | 98 to 2013 | mspubsr | PUB | Y | T | T | Y | Y | Y | N
Display Formats
Table 23 Supported Display Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
Adobe PDF | 1.1 to 1.7 | pdfsr | PDF | Y | Y | N | Y [1] | Y | Y | N
Adobe PDF | 1.1 to 1.7 | kppdfrdr | PDF | N | Y | Y | N | N | N | N
Adobe PDF | 1.1 to 1.7 | kppdf2rdr [2] | PDF | N | Y | Y | N | N | N | N
1. Includes support for extraction of subfiles from PDF Portfolio documents.
2. kppdf2rdr is an alternate graphic-based reader that produces high-fidelity output but does not support other features such as highlighting
or text searching.
Graphic Formats
Table 24 Supported Graphic Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
Computer Graphics Metafile | n/a | kpcgmrdr [1] | CGM | Y | Y | Y | N | N | N | N
CorelDRAW [2] | through 9.0, 10, 11, 12, X3 | kpcdrrdr | CDR | N | Y | Y | N | N | N | N
DCX Fax System | n/a | kpdcxrdr | DCX | N | Y | Y | N | N | N | N
Digital Imaging & Communications in Medicine (DICOM) | n/a | dcmsr | DCM | M | N | N | N | Y | N | N
Encapsulated PostScript (raster) | TIFF header | kpepsrdr | EPS | N | Y | Y | N | N | N | N
Enhanced Metafile | n/a | kpemfrdr | EMF | Y | Y | Y | N | Y | N | N
GIF | 87, 89 | kpgifrdr | GIF | N | Y | Y | N | N | N | N
GIF | 87, 89 | gifsr | GIF | M | M | N | N | Y | N | N
JBIG2 | n/a | kpJBIG2rdr | JBIG2 | N | Y | Y | N | N | N | N
JPEG | n/a | kpjpgrdr | JPEG | N | Y | Y | N | N | N | N
JPEG | n/a | jpgsr | JPEG | M | M | N | N | Y | N | N
JPEG 2000 | n/a | kpjp2000rdr | JP2, JPF, J2K, JPWL, JPX, PGX | N | Y | Y | N | N | N | N
JPEG 2000 | n/a | jp2000sr | JP2, JPF, J2K, JPWL, JPX, PGX | M | M | N | N | Y | N | N
Lotus AMIDraw Graphics | n/a | kpsdwrdr | SDW | N | Y | Y | N | N | N | N
Lotus Pic | n/a | kppicrdr | PIC | Y | Y | Y | N | N | N | N
Macintosh Raster | 2 | kppctrdr | PIC, PCT | N | Y | Y | N | N | N | N
MacPaint | n/a | kpmacrdr | PNTG | N | Y | Y | N | N | N | N
Microsoft Office Drawing | n/a | kpmsordr | MSO | N | Y | Y | N | N | N | N
Omni Graffle | n/a | kpGFLrdr | GRAFFLE | Y | N | N | N | Y | Y | N
PC PaintBrush | 3 | kppcxrdr | PCX | N | Y | Y | N | N | N | N
Portable Network Graphics | n/a | kppngrdr | PNG | N | Y | Y | N | N | N | N
Portable Network Graphics | n/a | pngsr | PNG | M | M | N | N | Y | N | N
SGI RGB Image | n/a | kpsgirdr | RGB | N | Y | Y | N | N | N | N
Sun Raster Image | n/a | kpsunrdr | RS | N | Y | Y | N | N | N | N
Tagged Image File | through 6.0 [3] | tifsr | TIFF | M | M | N | N | Y | N | N
Tagged Image File | through 6.0 [3] | kptifrdr | TIFF | N | Y | Y | N | N | N | N
Truevision Targa | 2 | kptrardr | TGA | N | Y | Y | N | N | N | N
Windows Animated Cursor | n/a | kpanirdr | ANI | N | Y | Y | N | N | N | N
Windows Bitmap | n/a | kpbmprdr | BMP | N | Y | Y | N | N | N | N
Windows Bitmap | n/a | bmpsr | BMP | M | M | N | N | Y | N | N
Windows Icon Cursor | n/a | kpicordr | ICO | N | Y | Y | N | N | N | N
Windows Metafile | 3 | kpwmfrdr | WMF | Y | Y | Y | N | N | N | N
WordPerfect Graphics 1 | 1 | kpwpgrdr | WPG | N | Y | Y | N | N | N | N
WordPerfect Graphics 2 | 2, 7 | kpwg2rdr | WPG | N | Y | Y | N | N | N | N
1. Files with non-partitioned data are supported.
2. CDR/CDR with TIFF header.
3. The following compression types are supported: no compression, CCITT Group 3 1-Dimensional Modified Huffman, CCITT Group 3 T4 1-Dimensional, CCITT Group 4 T6, LZW, JPEG (only Gray, RGB and CMYK color space are supported), and PackBits.
Mail Formats
Table 25 Supported Mail Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
Documentum EMCMF | n/a | msgsr | EMCMF | N | N | Y | Y | Y | Y | N
Domino XML Language [1] | n/a | dxlsr | DXL | N | N | Y | Y | Y | N | N
GroupWise FileSurf | n/a | gwfssr | GWFS | N | N | Y | Y | Y | N | N
Legato Extender | n/a | onmsr | ONM | N | N | Y | Y | Y | N | N
Lotus Notes database | 4, 5, 6.0, 6.5, 7.0, 8.0 | nsfsr | NSF | N | N | Y | Y | Y | N | N
Mailbox [2] | Thunderbird 1.0, Eudora 6.2 | mbxsr [3] | MBX | N | N | T | Y | Y | Y | N
Microsoft Entourage Database | 2004 | entsr | various | N | N | Y | Y | Y | Y | N
Microsoft Outlook | 97, 2000, 2002, 2003, 2007, 2010, 2013 | msgsr [3] | MSG, OFT | Y | T | T | Y | Y | Y [4] | N
Microsoft Outlook DBX | 5.0, 6.0 | dbxsr | DBX | N | N | Y | Y | Y | Y | N
Microsoft Outlook Express | Windows 6, Macintosh 5 | emlsr [3] | EML | Y | T | T | Y | Y | Y | N
Microsoft Outlook Express | Windows 6, Macintosh 5 | mbxsr [3] | EML | N | N | T | Y | Y | Y | N
Microsoft Outlook iCalendar | 1.0, 2.0 | icssr | ICS, VCS | N | N | Y | Y | Y | Y | N
Microsoft Outlook for Macintosh | 2011 | olmsr | OLM | N | N | Y | Y | N | Y | N
Microsoft Outlook Offline Storage File | 97, 2000, 2002, 2003, 2007, 2010, 2013 | pffsr | OST | N | N | Y | Y | Y | Y | N
Microsoft Outlook Personal Folder | 97, 2000, 2002, 2003, 2007, 2010, 2013 | pstsr [3] [5] | PST | N | N | Y | Y | Y | N | N
Microsoft Outlook Personal Folder | 97, 2000, 2002, 2003, 2007, 2010, 2013 | pstnsr | PST | N | N | Y | Y | Y | Y | N
Microsoft Outlook vCard Contact | 2.1, 3.0, 4.0 | vcfsr | VCF | Y | Y | T | N | Y | N | N
Text Mail (MIME) | n/a | emlsr [3] | various | Y | T | T | Y | Y | Y | N
Text Mail (MIME) | n/a | mbxsr [3] | various | Y | T | T | Y | Y | Y | N
Transport Neutral Encapsulation Format | n/a | tnefsr | various | N | N | Y | Y | Y | Y | N
1. Only supports non-encrypted embedded files.
2. KeyView supports MBX files created by Eudora Email and Mozilla Thunderbird. MBX files created by other common mail applications are typically filtered, converted, and displayed.
3. This reader supports both clear signed and encrypted S/MIME. KeyView supports S/MIME for PST, EML, MBX, and MSG files.
4. Returns “Unicode” character set for version 2003 and up, and “Unknown” character set for previous versions.
5. Uses Microsoft Messaging Application Programming Interface (MAPI).
Multimedia Formats
Viewing SDK plays some multimedia files using the Windows Media Control Interface (MCI). MCI is a set
of Windows APIs that communicate with multimedia devices.
Table 26 Supported Multimedia Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
Advanced Systems Format | 1.2 | asfsr | ASF, WMA, WMV | N | N | N | N | Y | N | N
Audio Interchange File Format | n/a | MCI | AIFF | N | N | Y | N | N | N | N
Audio Interchange File Format | n/a | aiffsr | AIFF | M | N | N | N | Y | N | N
Microsoft Wave Sound | n/a | MCI | WAV | N | N | Y | N | N | N | N
Microsoft Wave Sound | n/a | riffsr | WAV | M | N | N | N | Y | N | N
MIDI | n/a | MCI | MID | N | N | Y | N | N | N | N
MPEG-1 Audio layer 3 | ID3 v1 and v2 | MCI | MP3 | N | N | Y | N | N | N | N
MPEG-1 Audio layer 3 | ID3 v1 and v2 | mp3sr | MP3 | M | M | Y | N | Y | N | N
MPEG-1 Video | 2, 3 | MCI | MPG | N | N | Y | N | N | N | N
MPEG-2 Audio | n/a | MCI | MPEGA | N | N | Y | N | N | N | N
MPEG-4 Audio | n/a | mpeg4sr | MP4, 3GP | M | N | N | N | Y | N | N
NeXT/Sun Audio | n/a | MCI | AU | N | N | Y | N | N | N | N
QuickTime Movie | 2, 3, 4 | MCI | QT, MOV | N | N | Y | N | N | N | N
Windows Video | 2.1 | MCI | AVI | N | N | Y | N | N | N | N
NOTE Depending on the default multimedia player installed on your computer, the View API may
not be able to play some supported multimedia formats. To play multimedia files, the View API uses
the Windows Media Control Interface (MCI) to communicate with the multimedia player installed on
your computer. If the player does not play a multimedia file that is supported by the Viewing SDK, the
View API will not be able to play the file.
If you cannot play a supported multimedia file using the View API, install a different multimedia player
or compressor/decompressor (codec) component.
Presentation Formats
Table 27 Supported Presentation Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
Apple iWork Keynote | 2, 3, ’08, ’09 | kpIWPGrdr | GZ | Y | Y | Y | N | Y | Y | N
Applix Presents | 4.0, 4.2, 4.3, 4.4 | kpagrdr | AG | Y | Y | Y | N | N | N | N
Corel Presentations | 6, 7, 8, 9, 10, 11, 12, X3 | kpshwrdr | SHW | Y | Y | Y | N | N | N | N
Extensible Forms Description Language | n/a | kpXFDLrdr | XFD, XFDL | Y | Y | Y | N | Y | Y | N
Lotus Freelance Graphics | 96, 97, 98, R9, 9.8 | kpprzrdr | PRZ | Y | Y | Y | N | N | N | N
Lotus Freelance Graphics 2 | 2 | kpprerdr | PRE | Y | Y | Y | N | N | N | N
Macromedia Flash | through 8.0 | swfsr | SWF | Y | Y | Y | N | N | Y [1] | N
Microsoft OneNote | 2007, 2010, 2013 | kpONErdr | ONE, ONETOC2 | Y | Y | Y | Y | N | Y | N
Microsoft PowerPoint Macintosh | 98 | kpp40rdr | PPT | Y | Y | Y | N | N | N | N
Microsoft PowerPoint Macintosh | 2001, v.X, 2004 | kpp97rdr | PPT, PPS, POT | Y | Y | Y | N | P | Y | N
Microsoft PowerPoint PC | 4 | kpp40rdr | PPT | Y | Y | Y | N | P | N | N
Microsoft PowerPoint Windows | 95 | kpp95rdr | PPT | Y | Y | Y | N | P | Y | N
Microsoft PowerPoint Windows | 97, 2000, 2002, 2003 | kpp97rdr | PPT, PPS, POT | Y | Y | Y | Y | P | Y | Y [2]
Microsoft PowerPoint Windows XML | 2007, 2010, 2013 | kpppxrdr | PPTX, PPTM, POTX, POTM, PPSX, PPSM, PPAM | Y | Y | Y | Y | Y | Y | Y
OASIS Open Document Format | 1, 2 [3] | kpodfrdr | SXD, SXI, ODG, ODP | Y | Y | Y | Y [4] | Y | Y | N
OpenOffice Impress | 1, 1.1 | sosr | SXI, SXP, ODP | Y | T | T | N | Y | Y | N
StarOffice Impress | 6, 7 | sosr | SXI, SXP, ODP | Y | T | T | N | Y | Y | N
1. The character set cannot be determined for versions 5.x and lower.
2. Slide footers are supported for Microsoft PowerPoint 97 and 2003.
3. Generated by OpenOffice Impress 2.0, StarOffice 8 Impress, and IBM Lotus Symphony Presentation 3.0.
4. Supported using the embedded objects reader olesr.
Spreadsheet Formats
Table 28 Supported Spreadsheet Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
Apple iWork Numbers | ’08, ’09 | iwsssr | GZ | Y | Y | Y | N | Y | Y | N
Applix Spreadsheets | 4.2, 4.3, 4.4 | assr | AS | Y | Y | Y | N | N | Y | N
Comma Separated Values | n/a | csvsr | CSV | Y | Y | Y | N | N | N | N
Corel Quattro Pro | 5, 6, 7, 8 | qpssr | WB2, WB3 | Y | Y | Y | N | P | Y | N
Corel Quattro Pro | X4 | qpwsr | QPW | Y | N | Y | N | P | Y | N
Data Interchange Format | n/a | difsr | DIF | Y | Y | Y | N | N | N | N
Lotus 1-2-3 | 96, 97, R9, 9.8 | l123sr | 123 | Y | Y | Y | N | P | Y | N
Lotus 1-2-3 | 2, 3, 4, 5 | wkssr | WK4 | Y | Y | Y | N | N | Y | N
Lotus 1-2-3 Charts | 2, 3, 4, 5 | kpchtrdr | 123 | N | Y | Y | N | N | N | N
Microsoft Excel Charts | 2, 3, 4, 5, 6, 7 | kpchtrdr | XLS | N | Y | Y | N | N | N | N
Microsoft Excel Macintosh | 98, 2001, v.X, 2004 | xlssr | XLS | Y | Y | Y | Y [1] | Y | Y | N
Microsoft Excel Windows | 2.2 through 2003 | xlssr | XLS, XLW, XLT, XLA | Y | Y | Y | Y [2] | Y | Y | Y
Microsoft Excel Windows XML | 2007, 2010, 2013 | xlsxsr | XLSX, XLTX, XLSM, XLTM, XLAM | Y | Y | Y | Y | Y | Y | Y
Microsoft Excel Binary Format | 2007, 2010, 2013 | xlsbsr | XLSB | Y | Y | Y | N | N | N | N
Microsoft Works Spreadsheet | 2, 3, 4 | mwssr | S30, S40 | Y | Y | Y | N | N | Y | N
OASIS Open Document Format | 1, 2 [3] | odfsssr | ODS, SXC, STC | Y | Y | Y | Y [1] | Y | Y | N
OpenOffice Calc | 1, 1.1 | sosr | SXC, ODS, OTS | Y | T | T | N | Y | Y | N
StarOffice Calc | 6, 7 | sosr | SXC, ODS | Y | T | T | N | Y | Y | N
1. Supported using the embedded objects reader olesr.
2. Supported for versions 97 and higher using the embedded objects reader olesr.
3. Generated by OpenOffice Calc 2.0, StarOffice 8 Calc, and IBM Lotus Symphony Spreadsheet 3.0.
Text and Markup Formats
Table 29 Supported Text and Markup Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
ANSI | n/a | afsr | TXT | Y | Y | Y | N | N | N | N
ASCII | n/a | afsr | TXT | Y | Y | Y | N | N | N | N
HTML | 3, 4 | htmsr | HTM | Y | Y | Y | N | P | Y | N
Microsoft Excel Windows XML | 2003 | xmlsr | XML | Y | T | T | N | Y | Y | N
Microsoft Word Windows XML | 2003 | xmlsr | XML | Y | T | T | N | Y | Y | N
Microsoft Visio XML | 2003 | xmlsr | VDX, VTX | Y | T | T | N | Y | Y | N
MIME HTML | n/a | mhtsr | MHT | Y | Y | Y | N | Y | Y | N
Rich Text Format | 1 through 1.7 | rtfsr | RTF | Y | Y | Y | N | P | Y | Y
Unicode HTML | n/a | unihtmsr | HTM | Y | Y | Y | N | Y | Y | N
Unicode Text | 3, 4 | unisr | TXT | Y | Y | Y | N | N | Y | N
XHTML | 1.0 | htmsr | HTM | Y | Y | Y | N | Y | Y | N
XML (generic) | 1.0 | xmlsr | XML | Y | T | T | N | Y | Y | N
Word Processing Formats
Table 30 Supported Word Processing Formats
Format | Version | Reader | Extension | Filter | Export | View | Extract | Metadata | Charset | Header/Footer
Adobe FrameMaker Interchange Format | 5, 5.5, 6, 7 | mifsr | MIF | Y | Y | Y | N | N | Y | N
Apple iChat Log | 1, AV 2, AV 2.1, AV 3 | ichatsr | ICHAT | Y | Y | Y | N | N | N | N
Apple iWork Pages | ’08, ’09 | iwwpsr | GZ | Y | Y | Y | N | Y | Y | N
Applix Words | 3.11, 4, 4.1, 4.2, 4.3, 4.4 | awsr | AW | Y | Y | Y | N | N | Y | Y
Corel WordPerfect Linux | 6.0, 8.1 | wp6sr | WPS | Y | Y | Y | N | P | Y | N
Corel WordPerfect Macintosh | 1.02, 2, 2.1, 2.2, 3, 3.1 | wpmsr | WPM | Y | Y | Y | N | N | Y | N
Corel WordPerfect Windows | 5, 5.1 | wosr | WO | Y | Y | Y | N | P | Y | Y
Corel WordPerfect Windows | 6, 7, 8, 9, 10, 11, 12, X3 | wp6sr | WPD | Y | Y | Y | N | P | Y | Y
DisplayWrite | 4 | dw4sr | IP | Y | Y | Y | N | N | Y | N
Folio Flat File | 3.1 | foliosr | FFF | Y | Y | Y | N | Y | Y | Y
Founder Chinese E-paper Basic | 3.2.1 | cebsr [1] | CEB | Y | N | N | N | N | N | N
Fujitsu Oasys | 7 | oa2sr | OA2 | Y | Y | Y | N | P | N | N
Haansoft Hangul | 97 | hwpsr | HWP | Y | N | N | N | N | Y | N
Haansoft Hangul | 2002, 2005, 2007, 2010 | hwposr | HWP | Y | T | T | Y | Y | Y | N
Health level7 | 2.0 | hl7sr | HL7 | Y | Y | Y | N | Y | Y | N
IBM DCA/RFT (Revisable Form Text) | SC23-07581 | dcasr | DC | Y | Y | Y | N | N | Y | N
JustSystems Ichitaro | 8 through 2013 | jtdsr | JTD | Y | Y | Y | N | P | N | Y
Lotus AMI Pro | 2, 3 | lasr | SAM | Y | Y | Y | N | P | Y | Y
Lotus AMI Professional Write Plus | 2.1 | lasr | AMI | Y | Y | Y | N | N | N | Y
Lotus Word Pro | 96, 97, R9 | lwpsr | LWP | Y | Y | Y | N | P | N | Y
Lotus SmartMaster | 96, 97 | lwpsr | MWP | Y | Y | Y | N | N | N | N
Microsoft Word Macintosh | 4, 5, 6, 98 | mbsr | DOC | Y | Y | Y | N | Y | N | Y
Microsoft Word Macintosh | 2001, v.X, 2004 | mw8sr | DOC, DOT | Y | Y | Y | Y [2] | Y | Y | N
Microsoft Word PC | 4, 5, 5.5, 6 | mwsr | DOC | Y | Y | Y | N | N | N | Y
Microsoft Word Windows | 1.0 and 2.0 | misr | DOC | Y | Y | Y | N | N | N | Y
Microsoft Word Windows | 6, 7, 8, 95 | mw6sr | DOC | Y | Y | Y | N | Y | Y | Y
Microsoft Word Windows | 97, 2000, 2002, 2003 | mw8sr | DOC, DOT | Y | Y | Y | Y [2] | Y | Y | Y
Microsoft Word Windows XML | 2007, 2010, 2013 | mwxsr | DOCM, DOCX, DOTX, DOTM | Y | Y | Y | Y | Y | Y | Y
Microsoft Works | 1, 2, 3, 4 | mswsr | WPS | Y | Y | Y | N | N | N | Y
Microsoft Works | 6, 2000 | msw6sr | WPS | Y | Y | Y | N | N | N | Y
Microsoft Windows Write | 1, 2, 3 | mwsr | WRI | Y | Y | Y | N | N | Y | N
OASIS Open Document Format | 1, 2 [3] | odfwpsr | ODT, SXW, STW | Y | Y | Y | Y [2] | Y | Y | Y
Omni Outliner | v3, OPML, OOutline | oo3sr | OO3, OPML, OOUTLINE | Y | Y | Y | N | N | Y | N
OpenOffice Writer | 1, 1.1 | sosr | SXW, ODT | Y | T | T | N | Y | Y | N
Open Publication Structure eBook | 2.0, 3.0 | epubsr | EPUB | Y | Y | Y | N | Y | Y | N
StarOffice Writer | 6, 7 | sosr | SXW, ODT | Y | T | T | N | Y | Y | N
Skype Log | 3 | skypesr | DBB | Y | Y | Y | N | N | N | N
WordPad | through 2003 | rtfsr | RTF | Y | Y | Y | N | P | Y | N
XML Paper Specification | n/a | xpssr | XPS | Y | T | T | N | N | N | N
XyWrite | 4.12 | xywsr | XY4 | Y | Y | Y | N | N | N | N
Yahoo! Instant Messenger | n/a | yimsr [4] | DAT | Y | Y | Y | N | N | N | N
1. This reader is only supported on Windows 32-bit platforms.
2. Supported using the embedded objects reader olesr.
3. Generated by OpenOffice Writer 2.0, StarOffice 8 Writer, and IBM Lotus Symphony Documents 3.0.
4. To successfully use this reader, you must set the KV_YAHOO_ID environment variable to the Yahoo user ID. You can optionally set the
KV_OTHER_YAHOO_ID environment variable to the other Yahoo user ID. If you do not set it, “Other” is used by default. If you enter incorrect
values for the environment variables, erroneous data is generated.
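The environment variables named in note 4 can also be set from within the calling application before a Yahoo! Instant Messenger log is converted. The following is a minimal sketch, assuming that the reader picks up the variables from the process environment at conversion time. The helper names set_env_var and configure_yahoo_reader_env are hypothetical; only the variable names KV_YAHOO_ID and KV_OTHER_YAHOO_ID come from this guide. The sketch uses setenv on POSIX systems and _putenv_s on Windows.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L   /* for setenv() under strict C modes */
#endif
#include <stdlib.h>

/* Portable wrapper around the platform call. Returns 0 on success. */
static int set_env_var(const char *name, const char *value)
{
#ifdef _WIN32
    return _putenv_s(name, value) == 0 ? 0 : -1;
#else
    return setenv(name, value, 1) == 0 ? 0 : -1;
#endif
}

/* Hypothetical helper (not an SDK function).
   user_id  - Yahoo user ID; required
   other_id - the other Yahoo user ID; may be NULL, in which case the
              variable is left unset and the reader's default applies */
int configure_yahoo_reader_env(const char *user_id, const char *other_id)
{
    if (user_id == NULL || user_id[0] == '\0')
        return -1;
    if (set_env_var("KV_YAHOO_ID", user_id) != 0)
        return -1;
    if (other_id != NULL && other_id[0] != '\0')
        return set_env_var("KV_OTHER_YAHOO_ID", other_id);
    return 0;
}
```

Because incorrect values lead to erroneous output, an application would normally take both IDs from a trusted source, such as its own account configuration, rather than guessing them.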
Supported Formats (Detected)
The file formats listed in this section can be detected by the KeyView format
detection module (kwad), but cannot be filtered, converted, or displayed. The
detection module determines a file’s format and reports the information to the
developer’s application.
The formats listed in “Supported Formats” on page 294 can be detected as well as
filtered, exported, and viewed.
Ability Office (SS, DB, GR, WP, COM)
AC3 audio
ACT
Adobe FrameMaker
Adobe FrameMaker Markup Language
AES Multiplus Comm
Aldus Freehand (Macintosh)
Aldus PageMaker (DOS)
Aldus PageMaker (Macintosh)
Amiga IFF-8SVX sound
Amiga MOD sound
Apple Binary Property List
Apple Double
Apple Photoshop Document
Apple Single
Apple XML Property List
Appleworks
Applix Alis
Applix Asterix
Applix Graphics
ARC/PAK Archive
ARJ Archive
ASCII-armored PGP encoded
ASCII-armored PGP Public Keyring
ASCII-armored PGP signed
AutoDesk Animator FLIC Animation
AutoDesk Animator Pro FLIC Animation
AutoDesk WHIP
AutoShade Rendering
BlackBerry Activation File
CADAM Drawing
CADAM Drawing Overlay
CCITT Group 3 1-Dimensional (G31D)
COMET TOP Word
Compactor/Compact Pro Archive
Convergent Tech DEF Comm.
Corel Draw CMX
cpio Archive (UNIX/VAX/SUN)
CPT Communication
Creative Voice (VOC) sound
Curses Screen Image (UNIX/VAX/SUN)
Data Point VISTAWORD
DCX Fax
DEC WPS PLUS
DECdx
Desktop Color Separation (DCS)
Device Independent file (DVI)
Digital Imaging and Communications in Medicine (DICOM)
DG CEOwrite
DG Common Data Stream (CDS)
DIF Spreadsheet
Digital Document Interchange Format (DDIF)
Disk Doubler Compression
EBCDIC Text
ENABLE
ENABLE Spreadsheet (SSF)
eFax
Envoy (EVY)
Executable UNIX/VAX/SUN
FileMaker (Macintosh)
Framework
Framework II
Freehand 11
FTP Session Data
GEM Bit Image
Ghost Disk Image
Google SketchUp
Graphics Environment Manager (GEM VDI)
Harvard Graphics
Hewlett-Packard
Honey Bull DSA101
HP Graphics Language (HP-GL)
HP Graphics Language (Plotter)
HP PCL and PJL Languages
IBM 1403 Line Printer
IBM DCA-FFT
IBM DCF Script
Informix SmartWare II
Informix SmartWare II Communication File
Informix SmartWare II Database
Informix SmartWare Spreadsheet
Interleaf
Java Class file
JPEG File Interchange Format (JFIF)
KW ODA G31D (G31)
KW ODA G4 (G4)
KW ODA Internal G32D (G32)
KW ODA Internal Raw Bitmap (RBM)
Lasergraphics Language
Link Library UNIX/VAX/SUN
Lotus Notes Bitmap
Lotus Notes CDF
Lotus Screen Cam
Lyrix
Macromedia Director
MacWrite
MacWrite II
MASS-11
MATLAB MAT Format
Micrografx Designer
Microsoft Access 2007
Microsoft Access 2007 Template
Microsoft Common Object File Format (COFF)
Microsoft Compiled HTML Help
Microsoft Device Independent Bitmap
Microsoft Document Imaging (MDI)
Microsoft Excel 2007 Macro-Enabled Spreadsheet Template
Microsoft Excel 2007 Spreadsheet Template
Microsoft Exchange Server Database File
Microsoft Object File Library
Microsoft Office Drawing
Microsoft Office Groove
Microsoft Outlook Restricted Permission Message File
Microsoft Windows Cursor (CUR) Graphics
Microsoft Windows Group File
Microsoft Windows Help File
Microsoft Windows Icon (ICO)
Microsoft Windows NT Event Log
Microsoft Windows OLE 2 Encapsulation
Microsoft Windows Vista Event Log
Microsoft Word (UNIX)
Microsoft Works (Macintosh)
Microsoft Works Communication (Macintosh)
Microsoft Works Communication (Windows)
Microsoft Works Database (Macintosh)
Microsoft Works Database (PC)
Microsoft Works Database (Windows)
Microsoft Works Spreadsheet (Macintosh)
Microstation
Milestone Document
MORE Database Outliner (Macintosh)
MPEG-PS container with CDXA stream
MS DOS Batch File format
MS DOS Device Driver
MultiMate 4.0
Multiplan Spreadsheet
Navy DIF
NBI Async Archive Format
NBI Net Archive Format
Netscape Bookmark file
Nero Encrypted File
NeWS font file (SUN)
NIOS TOP
Nota Bene
NURSTOR Drawing
Object Module UNIX/VAX/SUN
ODA/ODIF
ODA/ODIF (FOD 26)
Office Writer
OLE DIB object
OLIDIF
Open PGP (new format packets)
OS/2 PM Metafile Graphics
PaperPort image file
Paradox (PC) Database
PC COM executable (detected in file mode only)
PC Library Module
PC Object Module
PC True Type Font
PCD Image
PeachCalc Spreadsheet
Persuasion Presentation
PEX Binary Archive (SUN)
PGP Compressed Data
PGP Encrypted Data
PGP Public Keyring
PGP Secret Keyring
PGP Signature Certificate
PGP Signed and Encrypted Data
PGP Signed Data
Philips Script
Plan Perfect
Portable Bitmap Utilities (PBM)
Portable Greymap Utilities (PGM)
Portable Pixmap Utilities (PPM)
PostScript File
PostScript Type 1 Font File
PRIMEWORD
Program Information File
Q & A for DOS
Q & A for Windows
Quadratron Q-One (V1.93J)
Quadratron Q-One (V2.0)
Quark Express (Macintosh)
QuickDraw 3D Metafile (3DMF)
Real Audio
RealLegal E-Transcript
RealMedia Streaming Media
Reflex Database
RIFF Device Independent Bitmap
RIFF MIDI
RIFF Multimedia Movie
SAMNA Word IV
Samsung Electronics JungUm Global format
SEG-Y Seismic Data format
Serialized Object Format (SOF) Encapsulation
SGML
Simple Vector Format (SVF)
SMTP document
SolidWorks
Stuff It Archive (Macintosh)
SUN vfont definition
Supercalc Spreadsheet
SYLK Spreadsheet
Symphony Spreadsheet
Targon Word (V 2.0)
Ultracalc Spreadsheet
Uniplex (V6.01)
Uniplex Ucalc Spreadsheet
UNIX SHAR Encapsulation
Usenet format
VRML
VRML 2.0
Volkswriter
Wang Office GDL Header Encapsulation
WANG PC
Wang WITA
WANG WPS Comm.
Web ARChive (WARC)
Windows C++ Object Storage
Windows Journal
Windows Micrografx Draw (DRW)
Windows Palette
Windows scrap file (SHS)
Word Connection
WordERA (V 1.0)
WordMARC word processor
WordPerfect General File
WordStar
WordStar 2000
WordStar 6.0
WriteNow
Writing Assistant word processor
X Bitmap (XBM)
X Image
X Pixmap (XPM)
Xerox 860 Comm.
Xerox DocuWorks
Xerox Writer word processor
Yahoo! Messenger chat log
APPENDIX B
Files Required for Redistribution
This section lists the Export files that may be redistributed in your applications
under the licensing agreement. These files are in the directory install\OS\bin,
where install is the pathname of the Export installation directory and OS is the
name of the operating system. This section contains the following topics:

Core Files

Support Files

Document Readers and Writers

Document Type Definition Files
NOTE On Windows systems, the libraries are .dll files. On
UNIX systems, the libraries are .so, .a, or .sl files.
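As a rough illustration of the directory layout and library extensions described above, the following sketch builds the expected path of one redistributable library. It is not an SDK facility: the function name kv_library_path and the EXAMPLE_ macros are hypothetical, the installation directory and operating system name are supplied by the caller, and on UNIX the sketch assumes the .so extension even though some platforms use .a or .sl.

```c
#include <stdio.h>
#include <stddef.h>

#ifdef _WIN32
#define EXAMPLE_LIB_EXT  ".dll"
#define EXAMPLE_PATH_SEP "\\"
#else
#define EXAMPLE_LIB_EXT  ".so"   /* assumption: some UNIX platforms use .a or .sl */
#define EXAMPLE_PATH_SEP "/"
#endif

/* Hypothetical helper: writes <install>/<os>/bin/<base><ext> into buf,
   using the platform's path separator.
   Returns 0 on success, or -1 if the buffer is too small. */
int kv_library_path(char *buf, size_t size,
                    const char *install_dir,
                    const char *os_name,
                    const char *base_name)
{
    int n = snprintf(buf, size, "%s%s%s%sbin%s%s%s",
                     install_dir, EXAMPLE_PATH_SEP,
                     os_name, EXAMPLE_PATH_SEP,
                     EXAMPLE_PATH_SEP,
                     base_name, EXAMPLE_LIB_EXT);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A packaging script could call this helper once for each file in the lists that follow, passing the base name without the .* suffix (for example, kvxml), to check that every required library is present before the application is shipped.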
Core Files
The following core files may be redistributed with your application:
File
Description
formats_e.ini
Initialization file. For more information on this file, see “Determine
Format Support” on page 348.
htmlexport.*
Required by the Java API.
xmlcnv.*
XML converter for the document token stream.
kpifcnvt.*
Graphic conversion routines.
kpifutil.*
Graphic utility routines.
kvxtract.*
File Extraction interface.
kvxml.*
XML Export C API.
kvexport.*
Export C API. Interface to the HTML and XML Export C APIs.
kvolefio.*
Embedded OLE object writer.
kvutil.*
Internal KeyView utility functions.
kvxpgsa.*
Interface between presentations or graphic readers and the
Export API.
kvxsssa.*
Interface between spreadsheet readers and the Export API.
kvxwpsa.*
Interface between word processing readers and the Export API.
kwad.*
File auto-recognition module.
regsvr32.exe
A Microsoft Windows program used to register in-process COM
objects.
txtcnv.*
Converter for document token stream.
xmlexport.*
Required by the Java API.
Support Files
The following support files may be redistributed with your application:
File
Description
bentofio.*
Required by l123sr.* and kpprzrdr.*.
cbmap.map
Character mappings for Adobe Portable Document Format
(PDF).
chartbls.ux
Character mapping tables.
chmdll.*
Required by chmsr.
kp3dwrld.*
Required for 3D charts.
kpchtrdr.*
Required for all spreadsheets (chart support).
kpjavwrt.*
Java utility routines.
kpjpeg.*
JPEG file interchange format shared routines.
kppng.*
Portable Network Graphics (PNG) utilities.
kvxconfig.ini
Contains element extraction settings for source XML files.
kvgraph.*
Required for all spreadsheets (chart support).
kvpie.*
Required for all spreadsheets (chart support).
kvradar.*
Required for all spreadsheets (chart support).
kv.lic
Contains license information for KeyView products. This file is
opened and validated when a KeyView API is used.
kvraster.class
Java program used to convert vector graphics on UNIX and
Linux.
kvVector.class
Java applet used to convert vector graphics on UNIX and Linux.
kvvector.jar
Java applet used to convert vector graphics on UNIX and Linux.
This must reside in the output directory.
mscomctl.ocx
Microsoft Common Control (for example, labels, dialog boxes).
Required for Visual Basic programs and COM objects.
msvbvm60.*
Microsoft Visual Basic Runtime library V6.0.
MSVCP60.*
Microsoft Visual C++ Runtime Library V6.0.
msvcrt.*
Microsoft Visual C Runtime library.
oleaut32.*
Microsoft OLE Automation Controls.
olepro32.*
Microsoft OLE property support library.
servant.exe
Executable required for out-of-process conversions.
wpmap.*
Extended character mapping for WordPerfect and Corel
Presentation.
xmlsh.*
Contains a library of content handlers for each XML file type.
Required by the Expat XML parser.
Document Readers and Writers
The following readers and writers may be redistributed with your application:
File
Description
ad1sr.*
AD1 Evidence file reader
afsr.*
ASCII reader
assr.*
Applix spreadsheet reader
awsr.*
Applix Words reader
bkfsr.*
Microsoft Backup File reader
bzip2sr.*
Bzip2 reader
cabsr.*
Microsoft Cabinet format reader
cebsr.*
Founder Chinese E-paper Basic reader
chmsr.*
Microsoft Compiled HTML Help reader
csvsr.*
Comma Separated Values reader
dbfsr.*
dBase Database reader
dbxsr.*
Microsoft Outlook Express DBX reader
dcasr.*
Document Content Architecture/Revisable Form Text (DCA/RFT)
reader
difsr.*
Data Interchange Format reader
dmgsr.*
Mac Disk Copy Disk Image File reader
dw4sr.*
DisplayWrite 4 reader
dxlsr.*
Domino XML Language reader
emlsr.*
Microsoft Outlook Express (EML) reader. This is used to convert
EML files when the MBX reader is not licensed.
emxsr.*
Legato EMailXtender archive (EMX) reader
encasesr.*
Expert Witness Compression Format (EnCase) v6 reader
encase2sr.*
Expert Witness Compression Format (EnCase) v7 reader
entsr.*
Microsoft Entourage Database Format reader
epubsr.*
Open Publication Structure eBook reader
foliosr.*
Folio Flat File reader
gwfssr.*
GroupWise FileSurf reader
hl7sr.*
Health Level 7 (HL7) reader (metadata only)
htmsr.*
HTML and XHTML reader
hwposr.*
Hangul 2002, 2005, 2007 reader
ichatsr.*
Apple iChat Log reader
icssr.*
Microsoft Outlook iCalendar reader
isosr.*
ISO-9660 CD Disc Image Format reader
iwsssr.*
Apple iWork Numbers reader
iwwpsr.*
Apple iWork Pages reader
jp2000sr.*
JPEG 2000 metadata reader
jtdsr.*
JustSystems Ichitaro reader
kpagrdr.*
Applix Presents reader
kpanirdr.*
Animated cursor reader
kpbmprdr.*
Windows Bitmap reader
kpbmpwrt.*
Windows Bitmap writer
kpcdrrdr.*
Corel DRAW reader
kpcgmrdr.*
Computer Graphics Metafile reader
kpcgmwrt.*
Computer Graphics Metafile writer
kpdcxrdr.*
DCX (fax) reader
kpDWGrdr.*
AutoCAD Drawing format reader
kpDXFrdr.*
AutoCAD Drawing Exchange format reader
kpemfrdr.*
Enhanced Metafile reader
kpepsrdr.*
Encapsulated PostScript (EPS) reader
kpgifrdr.*
Graphic Interchange Format (GIF) reader
kpicordr.*
Windows Icon reader
kpiwpgrdr.*
Apple iWork Keynote reader
kpjbig2rdr.*
JBIG2 reader
kpjp2000rdr.*
JPEG 2000 reader
kpjpgrdr.*
JPEG file interchange format reader
kpjpgwrt.*
JPEG file interchange format writer
kpnbmprdr.*
Lotus Notes Bitmap reader (for embedded images in DXL files)
kpmacrdr.*
MacPaint reader
kpmsordr.*
Microsoft Office Drawing Objects (Office 97, 2000, and XP) reader
kpodfrdr.*
Oasis Open Document Format presentation (ODP) reader
kpODArdr.*
AutoCAD reader (Windows only)
kpONErdr.*
Microsoft OneNote reader
kppdfrdr.*
Adobe Portable Document File (PDF) graphic-based reader
kppdf2rdr.*
High-fidelity Adobe Portable Document File (PDF) graphic-based
reader
kpp40rdr.*
Microsoft PowerPoint PC 4.0 and PowerPoint Mac reader
kpp95rdr.*
Microsoft PowerPoint 95 reader
kpp97rdr.*
Microsoft PowerPoint 97 and higher reader
kppctrdr.*
Macintosh Quick Draw Picture (PICT) reader
kppcxrdr.*
PC Paintbrush (PCX) reader
kppicrdr.*
Pictor PC Paint format (PIC) reader
kppngrdr.*
Portable Network Graphics (PNG) reader
kppngwrt.*
Portable Network Graphics (PNG) writer
kpppxrdr.*
Microsoft PowerPoint XML reader 2007
kpprerdr.*
Lotus Freelance Graphics for Windows V2.0 reader
kpprzrdr.*
Lotus Freelance Graphics 96/97/98 reader
kpsdwrdr.*
Lotus Ami Pro Graphics reader
kpsgirdr.*
SGI RGB reader
kpshwrdr.*
Corel Presentations reader
kpsunrdr.*
Sun Raster reader
kptgardr.*
Truevision Targa reader
kptifrdr.*
Tagged Image File Format (TIFF) reader
kpvsdrdr.dll
Microsoft Visio reader
kpwg2rdr.*
WordPerfect Graphics 2 reader
kpwmfrdr.*
Windows Metafile reader
kpwmfwrt.*
Windows Metafile writer
kpwpgrdr.*
WordPerfect Graphics 1 reader
kpxfdlrdr.*
Extensible Forms Description Language reader
kvgzsr.*
GZIP reader
kvhqxsr.*
BinHex reader
kvzeesr.*
UNIX Compress reader
l123sr.*
Lotus 123 v96/97/98 reader
lasr.*
Lotus AMI Pro reader
ltbenn30.dll
Lotus Word Pro support (supported on Windows x86 platform only)
ltscsn10.dll
Lotus Word Pro support (supported on Windows x86 platform only)
lwpapin.dll
Lotus Word Pro support (supported on Windows x86 platform only)
lwppann.dll
Lotus Word Pro support (supported on Windows x86 platform only)
lwpsr.dll
Lotus Word Pro reader (supported on Windows x86 platform only)
macbinsr.*
MacBinary reader
mbsr.*
Microsoft Word Macintosh reader
mbxsr.*
Mailbox (MBX) and Microsoft Outlook Express (EML) reader (see note 1)
mdbsr.*
Microsoft Access reader.
mifsr.*
Adobe Maker Interchange Format reader
misr.*
Microsoft Word 2 reader
mp3sr.*
MP3 reader for metadata extraction
mppsr.*
Microsoft Project reader
msgsr.*
Microsoft Outlook (MSG) reader
mspubsr.*
Microsoft Publisher reader
msw6sr.*
Microsoft Works 6 and 2000 reader
mswsr.*
Microsoft Works V1 and 2 reader
mw6sr.*
Microsoft Word 95 reader
mw8sr.*
Microsoft Word 97, 2000, and XP reader
mwsr.*
Microsoft Word for DOS and Microsoft Write reader
mwssr.*
Microsoft Works Spreadsheet reader
mwxsr.*
Microsoft Word 2007 XML reader
nsfsr.*
Lotus Notes Database reader (see note 1)
oa2sr.*
Fujitsu Oasys reader
odfsssr.*
Oasis Open Document Format spreadsheets (ODS) reader
odfwpsr.*
Oasis Open Document Format word processing (ODT) reader
olesr.*
Embedded OLE object reader.
olmsr.*
Microsoft Outlook for Macintosh reader
oo3sr.*
Omni Outliner reader
pdfsr.*
Adobe Portable Document File (PDF) reader
pffsr.*
Microsoft Outlook Offline Storage File reader
pstsr.dll
Microsoft Outlook Personal Folders file MAPI-based reader
(supported on Windows platform only) (see note 1)
pstnsr.*
Microsoft Outlook Personal Folders file native reader (see note 1)
qpssr.*
Quattro Pro spreadsheet reader
rarsr.*
RAR Archive reader
rtfsr.*
Microsoft Rich Text Format reader
skypesr.*
Skype log file reader
sosr.*
StarOffice/OpenOffice reader
swfsr.*
Macromedia Flash reader
tarsr.*
Tape archive reader
tnefsr.*
Transport Neutral Encapsulation Format (TNEF) reader
unihtmsr.*
Unicode HTML reader
unisr.*
Unicode reader
unzip.*
Zip file reader
uudsr.*
UUEncoding reader
vsdsr.*
Microsoft Visio reader
vcfsr.*
Microsoft Outlook vCard Contact reader
wkssr.*
Lotus 123 v2.0 through 5.0 reader
wosr.*
WordPerfect 5.x reader
wp6sr.*
WordPerfect 6.0 through 10.0 reader
wpmsr.*
WordPerfect for Macintosh reader
xlsbsr.*
Microsoft Office 2007 Excel Binary Format reader
xlssr.*
Microsoft Excel reader
xlsxsr.*
Microsoft Excel 2007 XML reader
xmlsr.*
Generic XML reader
xpssr.*
XML Paper Specification reader
xywsr.*
XYWrite reader
yimsr.*
Yahoo! Instant Messenger reader
z7zsr.*
7-Zip reader
1. This reader is an advanced feature and is sold and licensed separately from KeyView Export
SDK.
Document Type Definition Files
The following files related to the verity.dtd may be redistributed with your
application:
File
Description
Verity.dtd
The document type definition file that defines the structure of
an XML document. XML document validity is based on the
Verity.dtd. The Verity.dtd is required and must be in
the same directory as the output XML file.
HTMLlat1x.ent
The file defining Latin characters. This file is referenced in the
verity.dtd. This file is required and must be in the same
directory as the Verity.dtd.
HTMLspecialx.ent
The file defining special characters. This file is referenced in
the verity.dtd. This file is required and must be in the
same directory as the Verity.dtd.
HTMLsymbolx.ent
The file defining symbols. This file is referenced in the
verity.dtd. This file is required and must be in the same
directory as the Verity.dtd.
wp.xsl
The default style sheet for word processing documents. This
file is optional and must be in the same directory as the output
XML file.
pg.xsl
The default style sheet for presentation graphics. This file is
optional and must be in the same directory as the output XML
file.
ss.xsl
The default style sheet for spreadsheets. This file is optional
and must be in the same directory as the output XML file.
APPENDIX C
Export Tokens
This section contains an alphabetized list of the Export tokens.
Tokens are special strings inserted into the KVXMLTemplate structure,
XmlTemplateInfo class, and template files. They are placeholders for markup
that appears in the XML output. For example, the $CHARSET token marks the
place in the XML output where the name of the source document’s character set
is inserted. It would be used in the tag <charset=$CHARSET>.
See the template files for examples of how to use tokens.
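The placeholder idea can be illustrated with a minimal sketch in C. This is not how Export SDK expands tokens internally: expand_token() is a hypothetical helper, and the buffer handling and return codes are assumptions made for the illustration. It copies a template string and replaces every occurrence of one token with a value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Export SDK API.
 * Copies tmpl into out, replacing each occurrence of token with value.
 * Returns 0 on success, or -1 if token is empty or out is too small. */
int expand_token(const char *tmpl, const char *token, const char *value,
                 char *out, size_t outsz)
{
    size_t tlen = strlen(token);
    size_t vlen = strlen(value);
    size_t used = 0;

    if (tlen == 0 || outsz == 0)
        return -1;

    while (*tmpl != '\0') {
        if (strncmp(tmpl, token, tlen) == 0) {
            /* Token found: append the value instead of the token text. */
            if (used + vlen >= outsz)
                return -1;
            memcpy(out + used, value, vlen);
            used += vlen;
            tmpl += tlen;
        } else {
            /* Ordinary character: copy it through unchanged. */
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *tmpl++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

For example, calling expand_token("<charset=$CHARSET>", "$CHARSET", "UTF-8", out, sizeof out) leaves <charset=UTF-8> in out. The matching is a plain prefix comparison, so an expander that handles several tokens would need to test longer tokens (such as $TOCB) before shorter tokens that share the same prefix (such as $TOC).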
Table 31 Export Tokens
Token
Description
$ANCHOR
Inserts an anchor for a heading level (h2-h6) for the current block.
$BASE
Inserts the base URL for the XML file. Use in the tag
<base href=xx>.
$CHARSET
Inserts the character set of the source document, if that information is
ascertainable. The section “Supported Formats” on page 294 lists the
file formats for which character set information can be determined.
$CONTENT
Inserts the content of the metadata field specified by the $NAME token.
This token is used in conjunction with the $SUMMARY, $USERSUMMARY,
and $NAME tokens to insert source document metadata into the XML
output. An example of this token’s use is:
pszUserSummary=<MetaData name="$NAME"
content="$CONTENT">
The section “Supported Formats” on page 294 lists file formats that
support metadata.
$ENDNOTE
Inserts endnotes from the current page of the document at this point in
the output stream. Currently only implemented for Microsoft Word
documents.
$FOOTER
Inserts the footer from the current page of the document at this point in
the output stream.
$FOOTNOTE
Inserts footnotes from the current page of the document at this point in
the output stream. Currently only implemented for Microsoft Word
documents.
$FOOTNOTEALL
Inserts all footnotes from the current document at this point in the output
stream. Currently only implemented for Microsoft Word documents.
$HEADER
Inserts the header from the current page of the document at this point in the
output stream.
$MAINURL
Inserts the URL to the file containing the start of the generated XML,
that is, the main output stream.
$NAME
Inserts the name of a metadata field. This token is used in conjunction
with the $SUMMARY, $USERSUMMARY, and $CONTENT tokens to insert
source document metadata into the XML output. An example of this
token’s use is:
pszUserSummary=<MetaData name="$NAME"
content="$CONTENT">
The section “Supported Formats” on page 294 lists file formats that
support metadata.
$NEXT
Inserts the anchor to the next block. If this is the last block, a link to the
first block is inserted.
$PREV
Inserts the anchor to the previous block. If the current block is the first
block, a link to the last block is inserted.
$STYLESHEET
Inserts the path to the style sheet.
$SUMMARY
Inserts the data from standard metadata fields using the markup
provided in the pszUserSummary member of the structure
KVXMLTemplate. Standard fields are enumerated from 0 to 33 in
KVSumType in kvtypes.h. See the tokens $USERSUMMARY, $NAME,
and $CONTENT.
The section “Supported Formats” on page 294 lists file formats that
support metadata.
$SUMMARYNN
Inserts the data from a specified metadata field. NN is a number from 0
through 33 enumerated in the KVSumType structure in kvtypes.h. An
example of this token’s use is:
pszMainTop=$SUMMARY01
The section “Supported Formats” on page 294 lists file formats that
support metadata.
$SPLITBLOCKNUMBER
Inserts the page number for each block generated as a result of
bHardPageMakesNewBlock or lcbBlockSize.
$TOC
Inserts the table of contents at this point in the current output stream.
This token is typically embedded in pszMainTop.
$TOCB
Inserts the table of contents at this point for the current block.
$TOCBE
Inserts the beginning entry for the table of contents at this point in the
current output stream.
$TOCE
Inserts a table of contents entry at this point in the current output
stream.
$TOCTE
Inserts a text entry without XML markup at this point in the current
output stream.
$TOCPE
Inserts a partial table of contents entry at this point in the current output
stream. XML tags are removed; however, character entities are
retained. This allows angle brackets to appear in the table of contents
entries (for example, <text>). Without this token, <text> would be
interpreted as a non-valid XML tag and would be ignored by the
browser.
$TOPANCHOR
Inserts the anchor for the top heading level (h1) for the current block.
$USERCB
Triggers the callback function UserCB() and identifies the callback
used in the function.
$USERSUMMARY
Inserts the data from every valid non-standard metadata field using the
markup provided in the pszUserSummary member of the structure
KVXMLTemplate. Non-standard metadata are any fields not listed from
0 to 33 in KVSumType, such as user-defined fields (for example, custom
property fields in Word documents), or fields that are unique to a
particular file type (for example, “Artist” or “Genre” fields in MP3 files).
See the tokens $SUMMARY, $NAME, and $CONTENT.
The section “Supported Formats” on page 294 lists file formats that
support metadata.
$XANCHOR
Inserts the anchor to an extra file into the XML output. The contents of
the extra file are defined by pszXFile, and the block generated by this
token is defined by pszXStartBlock and pszXEndBlock.
APPENDIX D
Character Sets
This section provides information on the handling of character sets in the KeyView
suite of products, which includes KeyView Filter SDK, KeyView Export SDK, and
KeyView Viewing SDK. It contains the following topics:
• Multi-Byte and Bi-Directional Support
• Coded Character Sets
Multi-Byte and Bi-Directional Support
The KeyView SDKs can process files that contain multi-byte characters. A
multi-byte character encoding can represent a single character with a sequence
of two or more consecutive bytes. KeyView can also process files that contain
bi-directional text, which combines text that is read from left to right (such as
Latin-based text) with text that is read from right to left (such as Hebrew and Arabic).
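As a small illustration of what multi-byte means in practice, the following sketch counts the characters in a UTF-8 string, where one character can occupy several bytes. It is not KeyView code: utf8_char_count() is a hypothetical helper, and it assumes the input is valid, NUL-terminated UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the KeyView API.
 * Counts characters (code points) in a NUL-terminated UTF-8 string.
 * In UTF-8, every byte of the form 10xxxxxx continues the previous
 * character, so only the other bytes start a new character.
 * Assumes the input is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

A string of three ASCII letters yields 3, while the six bytes E6 97 A5 E6 9C AC (two Japanese characters) yield 2, which is why byte length and character count must not be confused when handling multi-byte text.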
Table 32 indicates which character encodings are supported by KeyView for each
format.
Table 32 Multi-byte and bi-directional support
Format | Single-byte | Multi-byte | Bi-directional
Archive
7-Zip (7Z) | n/a | n/a | n/a
AD1 Evidence file | n/a | n/a | n/a
BinHex (HQX) | n/a | n/a | n/a
Bzip2 (BZ2) | n/a | n/a | n/a
EnCase – Expert Witness Compression Format (E01) | n/a | n/a | n/a
GZIP (GZ) | n/a | n/a | n/a
ISO (ISO) | n/a | n/a | n/a
Java Archive (JAR) | n/a | n/a | n/a
Legato EMailXtender Archive (EMX) | n/a | n/a | n/a
MacBinary (BIN) | n/a | n/a | n/a
Mac Disk Copy Disk Image (DMG) | n/a | n/a | n/a
Microsoft Backup File (BKF) | n/a | n/a | n/a
Microsoft Cabinet format (CAB) | n/a | n/a | n/a
Microsoft Compiled HTML Help (CHM) | n/a | n/a | n/a
Microsoft Compressed Folder (LZH) | n/a | n/a | n/a
PKZip (ZIP) | n/a | n/a | n/a
Microsoft Outlook DBX (DBX) | Y | Y | Y
Microsoft Outlook Offline Storage File (OST) | Y | Y | Y
RAR Archive (RAR) | n/a | n/a | n/a
Tape Archive (TAR) | n/a | n/a | n/a
UNIX Compress (Z) | n/a | n/a | n/a
UUEncoding (UUE) | n/a | n/a | n/a
Windows Scrap File (SHS) | n/a | n/a | n/a
WinZip (ZIP) | n/a | n/a | n/a
Binary
Executable (EXE) | n/a | n/a | n/a
Link Library (DLL) | n/a | n/a | n/a
Computer-aided Design
AutoCAD Drawing (DWG) | Y | Y | Y
AutoCAD Drawing Exchange (DXF) | Y | Y | Y
CATIA formats (CAT) | Y | N | N
Microsoft Visio (VSD) | Y | Y | Y
Database
dBase Database | Y | N | N
Microsoft Access (MDB) | Y | Y | N
Microsoft Project (MPP) | Y | Y | N
Desktop Publishing
Microsoft Publisher | N | Y | N
Display
Adobe Portable Document Format (PDF) | Y | Y (note 1) | Y
Graphics
Computer Graphics Metafile (CGM) | Y | N | N
Corel DRAW (CDR) | n/a | n/a | n/a
DCX Fax System (DCX) | Y | N | N
DICOM – Digital Imaging and Communications in Medicine (DCM) | n/a | n/a | n/a
Encapsulated PostScript (EPS) | Y | N | N
Enhanced Metafile (EMF) | Y | Y | N
Graphic Interchange Format (GIF) | n/a | n/a | n/a
JBIG2 | n/a | n/a | n/a
JPEG | n/a | n/a | n/a
JPEG 2000 | n/a | n/a | n/a
Lotus AMIDraw Graphics (SDW) | n/a | n/a | n/a
Lotus Pic (PIC) | n/a | n/a | n/a
Macintosh Raster (PICT/PCT) | n/a | n/a | n/a
MacPaint (PNTG) | n/a | n/a | n/a
Microsoft Office Drawing (MSO) | n/a | n/a | n/a
Omni Graffle (GRAFFLE) | Y | N | N
PC PaintBrush (PCX) | n/a | n/a | n/a
Portable Network Graphics (PNG) | n/a | n/a | n/a
SGI RGB Image (RGB) | n/a | n/a | n/a
Sun Raster Image (RS) | n/a | n/a | n/a
Tagged Image File (TIFF) | Y | N | N
Truevision Targa (TGA) | n/a | n/a | n/a
Windows Animated Cursor (ANI) | n/a | n/a | n/a
Windows Bitmap (BMP) | n/a | n/a | n/a
Windows Icon Cursor (ICO) | n/a | n/a | n/a
Windows Metafile (WMF) | Y | Y | N
WordPerfect Graphics 1 (WPG) | Y | N | N
WordPerfect Graphics 2 (WPG) | Y | N | N
Mail
Documentum EMCMF Format | Y | Y | Y
Domino XML Language (DXL) | Y | Y | N
GroupWise FileSurf | Y | N | N
Legato Extender (ONM) | Y | Y | N
Lotus Notes database (NSF) | Y | Y | Y
Mailbox (MBX) | Y | Y | Y
Microsoft Entourage Database | Y | Y | Y
Microsoft Outlook (MSG) | Y | Y | Y
Microsoft Outlook Express (EML) | Y | Y | Y
Microsoft Outlook iCalendar | Y | Y | Y
Microsoft Outlook for Macintosh | Y | Y | Y
Microsoft Outlook Offline Storage File | Y | Y | Y
Microsoft Outlook Personal File Folders (PST) | Y | Y | Y
Text Mail (MIME) | Y | Y | Y
Transport Neutral Encapsulation Format | Y | Y | Y
Microsoft Outlook vCard Contact | | |
Multimedia
Advanced Systems Format (ASF) | n/a | n/a | n/a
Audio Interchange File Format (AIFF) | n/a | n/a | n/a
Microsoft Wave Sound (WAV) | n/a | n/a | n/a
MIDI (MID) | n/a | n/a | n/a
MPEG 1 Audio Layer 3 (MP3) | n/a | n/a | n/a
MPEG 1 Video (MPG) | n/a | n/a | n/a
MPEG 2 Audio (MPEGA) | n/a | n/a | n/a
MPEG 4 Audio (MP4) | n/a | n/a | n/a
NeXT/Sun Audio (AU) | n/a | n/a | n/a
QuickTime Movie (QT/MOV) | n/a | n/a | n/a
Windows Video (AVI) | n/a | n/a | n/a
Presentations
Apple iWork Keynote (GZ) | Y | Y | N
Applix Presents (AG) | character set 1252 only | N | N
Corel Presentations (SHW) | character set 1252 only | N | N
Extensible Forms Description Language (XFD) | Y | Y | N
Lotus Freelance Graphics 2 (PRE) | character set 850 only | N | N
Lotus Freelance Graphics (PRZ) | Y | Japanese, Simplified Chinese, Traditional Chinese, Thai only | N
Macromedia Flash (SWF) | Y | Y | N
Microsoft OneNote | Y | Y | N
Microsoft PowerPoint PC (PPT) | character set 1252 only | Traditional Chinese only | N
Microsoft PowerPoint Windows (PPT) | Y | Japanese, Simplified Chinese, Traditional Chinese, Korean only | Hebrew only
Microsoft PowerPoint Macintosh (PPT) | Y | N | N
Microsoft PowerPoint Windows XML 2007 and 2010 (PPTX) | Y | Y | Y
OASIS Open Document (ODP) | Y | Y | N
OpenOffice Impress (ODP) | Y | Y | N
StarOffice Impress (ODP) | Y | Y | N
Spreadsheets
Apple iWork Numbers (GZ) | Y | Y | N
Applix Spreadsheets (AS) | character set 1252 only | N | N
Comma Separated Values (CSV) | character set 1252 only | N | N
Corel Quattro Pro (QPW/WB3) | Y | N | N
Data Interchange Format (DIF) | Y | Y | Y (note 2)
Lotus 1-2-3 (123) | Y | Y | Y
Lotus 1-2-3 (WK4) | Y | Y | N
Lotus 123 Charts (123) | Y | Y | N
Microsoft Excel Charts (XLS) | Y | Y | N
Microsoft Excel Macintosh (XLS) | Y | N | N
Microsoft Excel Windows (XLS) | Y | Y | Y (note 2)
Microsoft Excel Windows XML 2007 (XLSX) | Y | Y | N
Microsoft Office Excel Binary Format (XLSB) | Y | Y | N
Microsoft Works Spreadsheet (S30/S40) | Y | N | N
OASIS Open Document (ODS) | Y | Y | N
OpenOffice Calc (ODS) | Y | Y | N
StarOffice Calc (ODS) | Y | Y | N
Text and Markup
ANSI (TXT) | Y | Y | Y (note 2)
ASCII (TXT) | Y | Y | Y (note 2)
HTML (HTM) | Y | Y | Y (notes 2, 3)
Microsoft Excel Windows XML 2003 | Y | Y | Y
Microsoft Word for Windows XML 2003 | Y | Y | Y
Microsoft Visio XML 2003 | Y | Y | Y
Rich Text Format (RTF) | Y | Y | Y (note 3)
Unicode HTML | Y | Y | Y (notes 2, 3)
Unicode Text (TXT) | Y | Y | Y (note 2)
XHTML | Y | Y | Y (note 3)
XML | Y | Y | Y
Word Processing
Adobe Maker Interchange Format (MIF) | character set 1252 only | N | N
Apple iChat Log (ICHAT) | Y | Y | N
Apple iWork Pages (GZ) | Y | Y | N
Applix Words (AW) | character set 1252 only | N | N
DisplayWrite (IP) | character set 500, 1026 only | N | N
Folio Flat File (FFF) | character set 1252 only | N | N
Founder Chinese E-paper Basic (CEB) | Y | Y | N
Fujitsu Oasys (OA2) | Y | Y | N
Hangul (HWP) | Y | Y | N
Health Level 7 (HL7) | Y | Y | Y
IBM DCA/RFT (DC) | character sets 500, 1026 only | N | N
JustSystems Ichitaro (JTD) | Y | Y | N
Lotus AMI Pro (SAM) | Y | Simplified Chinese, Traditional Chinese, Japanese, Thai only | Y
Lotus AMI Professional Write Plus (AMI) | Y | Simplified Chinese, Traditional Chinese, Japanese, Thai only | N
Lotus Word Pro (LWP) | Y | Y | Y (note 3)
Lotus SmartMaster (MWP) | Y | Y | N
Microsoft Word PC (DOC) | character set 1252 only | N | N
Microsoft Word Windows V1-2 (DOC) | Y | N | N
Microsoft Word Windows V6, 7, 8, 95 (DOC) | Y | Y | Hebrew only (note 3)
Microsoft Word Windows V97 through 2003 (DOC) | Y | Y | Y (note 3)
Microsoft Word Windows XML 2007 and 2010 (DOCX) | Y | Y | Y (note 3)
Microsoft Word Macintosh (DOC) | Y | N | Y (note 3)
Microsoft Works (WPS) | Y | Japanese only | N
Microsoft Write (WRI) | Y | Japanese only | N
OASIS Open Document (ODT) | Y | Y | N
Omni Outliner (OO3) | Y | Y | N
OpenOffice Writer (ODT) | Y | Y | N
Open Publication Structure eBook (EPUB) | Y | Y | Y
StarOffice Writer (ODT) | Y | Y | N
Skype Log (DBB) | Y | Y (null-terminated charsets) | N
WordPad (RTF) | Y | Y | Y
WordPerfect Linux (WPS) | Y | N | N
WordPerfect Macintosh (WPS) | Y | N | N
WordPerfect Windows (WO) | Y | N | N
XML Paper Specification (XPS) | Y | Y | N
XYWrite Windows (XY4) | character set 1252 only | N | N
Yahoo! Instant Messenger (DAT) | Y | Y (null-terminated charsets) | N
1. Multi-byte PDFs are supported, provided the PDF document is created using either Character ID-keyed (CID) fonts,
predefined CJK CMap files, or ToUnicode font encodings, and does not contain embedded fonts. See the Adobe
website and the Adobe Acrobat documentation for more information. Any multi-byte characters that are not supported
are displayed using the replacement character. By default, the replacement character is a question mark (?).
To determine the type of font encodings that are used in a PDF, open the PDF in Adobe Acrobat, and select File |
Document Info | Fonts. If the Encoding column lists Custom or Embedded encodings, you may encounter problems converting the PDF.
2. Text direction in the output file may not be correct.
3. In Export SDK, a bi-directional right-to-left (RTL) tag is extracted from this format and included in the direction element
(<dir=RTL>) of the output.
Coded Character Sets
Table 33 lists the coded character sets and indicates which of them can be
specified as the target character set. The coded character sets are enumerated
in kvtypes.h and defined in the Export class.
Table 33 Coded Character Sets
Coded Character Set | Description | Can be set as target charset?
KVCS_UNKNOWN | Unknown character set | N
KVCS_SJIS | Japanese (uses multi-byte encoding), cp932 | Y
KVCS_GB | Simplified Chinese (China, Singapore, Malaysia) cp936 | Y
KVCS_BIG5 | Traditional Chinese (Taiwan, Hong Kong, Macau) cp950 | Y
KVCS_KSC | Korean, cp949 | Y
KVCS_1250 | Windows Latin 2 (Central Europe) | Y
KVCS_1251 | Windows Cyrillic (Slavic) | Y
KVCS_1252 | Windows Latin 1 (ANSI) | Y
KVCS_1253 | Windows Greek | Y
KVCS_1254 | Windows Latin 5 (Turkish) | Y
KVCS_1255 | Windows Hebrew | Y
KVCS_1256 | Windows Arabic | Y
KVCS_1257 | Windows Baltic Rim | Y
KVCS_1258 | Windows Vietnamese | Y
KVCS_8859_1 | ISO 8859-1 Latin 1 (Western Europe, Latin America) | Y
KVCS_8859_2 | ISO 8859-2 Latin 2 (Central Eastern Europe) | Y
KVCS_8859_3 | ISO 8859-3 Latin 3 (S.E. Europe) | Y
KVCS_8859_4 | ISO 8859-4 Latin 4 (Scandinavia/Baltic) | Y
KVCS_8859_5 | ISO 8859-5 Latin/Cyrillic | Y
KVCS_8859_6 | ISO 8859-6 Latin/Arabic | Y
KVCS_8859_7 | ISO 8859-7 Latin/Greek | Y
KVCS_8859_8 | ISO 8859-8 Latin/Hebrew | Y
KVCS_8859_9 | ISO 8859-9 Latin/Turkish | Y
KVCS_8859_14 | ISO 8859-14 | Y
KVCS_8859_15 | ISO 8859-15 | Y
KVCS_437 | DOS Latin US | Y
KVCS_737 | DOS Greek | Y
KVCS_775 | DOS Baltic Rim | Y
KVCS_850 | DOS Latin 1 | Y
KVCS_851 | DOS Greek | Y
KVCS_852 | DOS Latin 2 | Y
KVCS_855 | DOS Cyrillic | Y
KVCS_857 | DOS Turkish | Y
KVCS_860 | DOS Portuguese | Y
KVCS_861 | DOS Icelandic | Y
KVCS_862 | DOS Hebrew | Y
KVCS_863 | DOS Canadian French | Y
KVCS_864 | DOS Arabic | Y
KVCS_865 | DOS Nordic | Y
KVCS_866 | DOS Cyrillic Russian | Y
KVCS_869 | DOS Greek 2 | Y
KVCS_874 | Thai | Y
KVCS_PDFMACDOC | PDF MAC DOC | N
KVCS_PDFWINDOC | PDF WIN DOC | N
KVCS_STDENC | Adobe Standard Encoding | N
KVCS_PDFDOC | Adobe standard PDF character set | N
KVCS_037 | EBCDIC code page 037 | Y
KVCS_1026 | EBCDIC code page 1026 | Y
KVCS_500 | EBCDIC code page 500 | Y
KVCS_875 | EBCDIC code page 875 | Y
KVCS_LMBCS | Lotus multibyte character set Group 1 and Group 2 | N
KVCS_UNICODE | Unicode, UCS-2 | N
KVCS_UTF16 | 16-bit Unicode transformation format | N
KVCS_UTF8 | 8-bit Unicode transformation format | Y
KVCS_UTF7 | 7-bit Unicode transformation format | Y
KVCS_2022_JP | ISO 2022-JP, Japanese mail and news safe encoding (JIS-7) | N
KVCS_2022_CN | ISO 2022-CN, Chinese mail and news safe encoding | N
KVCS_2022_KR | ISO 2022-KR, Korean mail and news safe encoding | N
KVCS_WP6X | Word Perfect 6.x and higher character mapping | N
KVCS_10000 | Western European (Macintosh) | Y
KVCS_KSC5601 | Unified Hangul | Y
KVCS_GB2312 | Simplified Chinese (China, Singapore, Hong Kong) | Y
KVCS_GB12345 | Traditional Chinese (China) - analogue of GB2312 | Y
KVCS_CNS11643 | Traditional Chinese - Taiwan. Supplement to Big5 | Y
KVCS_JIS0201 | Japanese - contains ASCII character set (JIS-Roman) | N
KVCS_JIS0212 | Japanese. Supplement to JIS0208. | Y
KVCS_EUC_JP | Japanese Extended UNIX Code | Y
KVCS_EUC_GB | Simplified Chinese Extended UNIX Code | Y
KVCS_EUC_BIG5 | Traditional Chinese Extended UNIX Code | N
KVCS_EUC_KSC | Korean Extended UNIX Code | N
KVCS_424 | EBCDIC Hebrew | N
KVCS_856 | PC Hebrew (old) | N
KVCS_1006 | IBM AIX Pakistan (Urdu) | N
KVCS_KOI8R | Cyrillic (Russian) | Y
KVCS_PDF_JAPAN1 | Adobe-Japan1-2 character collection | N
KVCS_PDF_KOREA1 | Adobe-Korea1-0 character collection | N
KVCS_PDF_GB1 | Adobe-GB1-3 character collection | N
KVCS_PDF_CNS1 | Adobe-CNS1-2 character collection | N
KVCS_2022_JP_8 | ISO 2022-JP, Japanese mail and news safe encoding (JIS8) | N
KVCS_720 | Arabic DOS-720 | Y
KVCS_VISCII | Vietnamese VISCII | Y
KVCS_8859_10 | ISO 8859-10 (Latin 6 Nordic) | Y (note 1)
KVCS_8859_13 | ISO 8859-13 (Latin 7 Baltic) | Y (note 1)
KVCS_57002 | ISCII Devanagari (x-iscii-de) | Y (note 1)
KVCS_57003 | ISCII Bengali (x-iscii-be) | Y (note 1)
KVCS_57004 | ISCII Tamil (x-iscii-ta) | Y (note 1)
KVCS_57005 | ISCII Telugu (x-iscii-te) | Y (note 1)
KVCS_57006 | ISCII Assamese (x-iscii-as) | Y (note 1)
KVCS_57007 | ISCII Oriya (x-iscii-or) | Y (note 1)
KVCS_57008 | ISCII Kannada (x-iscii-ka) | Y (note 1)
KVCS_57009 | ISCII Malayalam (x-iscii-ma) | Y (note 1)
KVCS_57010 | ISCII Gujarathi (x-iscii-gu) | Y (note 1)
KVCS_57011 | ISCII Panjabi (x-iscii-pa) | Y (note 1)
KVCS_GB18030b2 | Reserved for internal use | n/a
KVCS_GB18030 | GB18030 (Chinese 4-byte character set) | Y
KVCS_8859_11 | ISO 8859-11 (Thai) | Y
KVCS_8859_16 | ISO 8859-16 (Latin-10 South-Eastern Europe) | Y
KVCS_ARABICMAC | Arabic Mac (x-mac-arabic) | Y
KVCS_KOI8U | Cyrillic (KOI8U Ukrainian) | Y
KVCS_HZGB2312 | The 7-bit representation of GB 2312 / RFC 1842 | n/a
1. Character set cannot be forced as output in Export SDK and Viewing SDK because the character set is not supported
by the major browsers.
APPENDIX E
File Format Detection
This section describes how file formats are detected in the KeyView Export SDK.
It contains the following topics:
• Introduction
• Extract Format Information
• Determine Format Support
• Translate Format Information
• Determine a Document Reader
• Category Values in formats_e.ini
Introduction
The KeyView format detection module (kwad) detects a file’s format, and reports
the information to the API, which in turn reports the information to the developer’s
application. If the detected format is supported by the KeyView SDK, the detection
module also loads the appropriate structured access layer and document reader
for further processing.
For a list of supported formats, see “Supported Formats” on page 293.
Extract Format Information
You can extract format information from a document using the
fpGetStreamInfo() function. If required, this format information can then be
reported to the developer’s application. The fpGetStreamInfo() function
extracts format information, such as file class, format and version, and populates
the ADDOCINFO structure. This structure is defined in the header file adinfo.h.
For information on how to translate the extracted format information, see
“Translate Format Information” on page 350.
Determine Format Support
After the file format is extracted, the detection module uses the
formats_e.ini file to determine whether KeyView supports the format,
and which structured access layer and reader to load.
The formats_e.ini file is in the directory install\OS\bin, where install
is the pathname of the Export installation directory and OS is the name of the
operating system. It contains the following information:
• Coded format information. To translate this information, see “Translate Format
Information” on page 350.
• Reader associated with each format. See “Determine a Document Reader” on
page 352.
• Configuration parameters for out-of-process conversions.
• Locale settings for internal use.
Below are some entries from the formats_e.ini file:
123=mw
152=xyw
178=wp6
189=mw6
2=af
200=pdf
205=mb
210=htm
251=htm
NOTE The formats_e.ini file applies to all formats except graphics.
Detection of graphics formats is handled by an internal module named
KeyView Picture Interchange Format (KPIF).
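The code=value entries shown above lend themselves to a simple lookup. The following is a minimal sketch in C, not KeyView's own parser: lookup_reader() is a hypothetical helper that scans text laid out like the sample entries and returns the value stored for one format code. It assumes one entry per line and skips any line that does not start with a number followed by an equals sign.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the KeyView API.
 * Scans "code=value" lines in ini_text and copies the value for
 * format_code into value (capacity valsz, including the NUL).
 * Returns 1 if the code was found and the value fits, otherwise 0. */
int lookup_reader(const char *ini_text, long format_code,
                  char *value, size_t valsz)
{
    const char *line = ini_text;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char *end;
        long code = strtol(line, &end, 10);
        size_t keylen = (size_t)(end - line);

        /* A usable entry has digits, then '=', all inside this line. */
        if (keylen > 0 && keylen < len && *end == '=' && code == format_code) {
            size_t vlen = len - keylen - 1;

            if (vlen > 0 && end[vlen] == '\r')   /* tolerate CRLF line ends */
                vlen--;
            if (vlen >= valsz)
                return 0;
            memcpy(value, end + 1, vlen);
            value[vlen] = '\0';
            return 1;
        }

        /* Move to the start of the next line. */
        line += len;
        if (*line == '\n')
            line++;
    }
    return 0;
}
```

With the sample entries above, looking up code 200 yields pdf and code 2 yields af. Comparing the parsed number rather than the text prefix keeps 2=af from being confused with 200=pdf.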
Refine Detection of Text Files
During text detection, KeyView analyses the first 1kB and last 1kB of data in a
document, and if less than 10% of that data consists of non-ASCII characters,
KeyView detects the document as a text file.
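The rule described above can be sketched in C as follows. This illustrates the stated heuristic only and is not KeyView's implementation. The function name looks_like_text() is hypothetical, a non-ASCII character is assumed to be any byte above 0x7F, and for files smaller than two blocks the overlapping bytes are assumed to be counted once.

```c
#include <stddef.h>

/* Count bytes above 0x7F (assumed here to mean "non-ASCII"). */
static size_t count_non_ascii(const unsigned char *buf, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if (buf[i] > 0x7F)
            count++;
    }
    return count;
}

/* Hypothetical illustration, not part of the KeyView API.
 * Samples the first and last block_size bytes of data and returns 1
 * if fewer than max_percent percent of the sampled bytes are
 * non-ASCII, otherwise 0. The defaults described in this guide
 * correspond to block_size = 1024 and max_percent = 10. */
int looks_like_text(const unsigned char *data, size_t size,
                    size_t block_size, unsigned max_percent)
{
    size_t head = size < block_size ? size : block_size;
    size_t rest = size - head;
    size_t tail = rest < block_size ? rest : block_size;  /* no overlap */
    size_t sampled = head + tail;
    size_t non_ascii;

    if (sampled == 0)
        return 0;  /* empty input: behavior here is an assumption */

    non_ascii = count_non_ascii(data, head)
              + count_non_ascii(data + size - tail, tail);

    /* Integer form of: non_ascii / sampled < max_percent / 100 */
    return non_ascii * 100 < (size_t)max_percent * sampled;
}
```

For a 4096-byte buffer of plain ASCII the function returns 1. If 300 of the first 1024 bytes are changed to 0xE9, about 14.6% of the 2048 sampled bytes are non-ASCII and the function returns 0.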
However, depending on the type of documents you are working with, the default
settings may not provide the desired level of accuracy. Configuration flags allow
you to change the amount of data to read at the end of a file, the percentage of
non-ASCII characters permitted in a text file, and whether to use or ignore the file
extension to determine the document format.
Change the Amount of File Data to Read
During file detection, KeyView reads characters from the beginning and end of a
file—by default, it reads the first and last 1024 bytes of data. Large text files may
contain many irrelevant characters at the end of a file, so KeyView may not
accurately detect the file format. You can set a configuration flag to increase the
amount of data to read from the end of a file during detection.
To change the amount of data to read during detection
 In the formats_e.ini file, set the following flag in the detection_flags
section:
[detection_flags]
non_ascii_chars_end_block_size=kB
where kB is the number of kilobytes to read from the end of the file, from 0 to
10. The default value is 1.
NOTE The file size must be greater than the value
specified in the flag. If the flag value is greater than the
file size, KeyView does not use the flag.
Change the Percentage of Allowed Non-ASCII Characters
By default, if less than 10% of the analyzed data in a document consists of
non-ASCII characters, it is detected as a text file. Depending on the type of files
you are working with, changing the default percentage may increase detection
accuracy.
To change the percentage of non-ASCII characters allowed in text files
 In the formats_e.ini file, set the following flag in the detection_flags
section:
[detection_flags]
non_ascii_chars_in_text=N
where N is the percentage of non-ASCII characters to allow in text files. Files
that contain a lower percentage of non-ASCII characters than N are detected
as text files. The default value is 10.
Use the File Extension for Detection
Sometimes KeyView detects certain file formats, such as CSV, as ASCII because
of the content of the documents. In such cases, you can configure KeyView to use
the file extension to determine the document format. Using the file extension can
improve detection of formats such as CSV, but might not detect text files
successfully if they have incorrect file extensions.
To use the file extension for ASCII files during detection
 In the formats_e.ini file, set the following flag in the detection_flags
section:
[detection_flags]
use_extension_for_ascii=1
The default is 0 (do not use the file extension).
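As an illustration of how the three flags in this section fit together, the following C sketch reads them from INI text and applies the documented defaults when a flag is absent. The structure and function names are hypothetical, and the parser is deliberately minimal: it assumes no whitespace around keys, values, or section names.

```c
#include <stdlib.h>
#include <string.h>

/* Detection flags with the defaults stated in this section. */
struct detection_flags {
    int end_block_kb;   /* non_ascii_chars_end_block_size, default 1 */
    int non_ascii_pct;  /* non_ascii_chars_in_text, default 10 */
    int use_extension;  /* use_extension_for_ascii, default 0 */
};

/*
 * Hypothetical reader: scans INI text line by line and picks up the
 * three keys only when they appear under [detection_flags].
 */
static struct detection_flags read_detection_flags(const char *ini_text)
{
    struct detection_flags f = { 1, 10, 0 };
    int in_section = 0;
    const char *p = ini_text;

    while (*p != '\0') {
        char line[256];
        size_t len = strcspn(p, "\r\n");
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;

        memcpy(line, p, n);
        line[n] = '\0';
        p += len;
        p += strspn(p, "\r\n");   /* skip the line ending */

        if (line[0] == '[') {
            in_section = (strcmp(line, "[detection_flags]") == 0);
        } else if (in_section) {
            char *eq = strchr(line, '=');
            if (eq == NULL) {
                continue;
            }
            *eq = '\0';
            int value = atoi(eq + 1);
            if (strcmp(line, "non_ascii_chars_end_block_size") == 0) {
                f.end_block_kb = value;
            } else if (strcmp(line, "non_ascii_chars_in_text") == 0) {
                f.non_ascii_pct = value;
            } else if (strcmp(line, "use_extension_for_ascii") == 0) {
                f.use_extension = value;
            }
        }
    }
    return f;
}
```

Keys that appear under any other section are ignored, so a stray use_extension_for_ascii=1 outside [detection_flags] leaves the default of 0 in place.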
Translate Format Information
Format information can include file attributes in the following categories:

Major Format

File Class

Minor Format

Major Version

Minor Version
Not all categories are required. Many formats only include major format and file
class, or major format only.
The format information has the following structure:
MajorFormat.FileClass.MinorFormat.MajorVersion.MinorVersion
For example:
81.2.0.9.0
Each number in the format information represents a file attribute. The entry
81.2.0.9.0 represents a Lotus 1-2-3 Spreadsheet file version 9.0, where
81 = Lotus 1-2-3 Spreadsheet (major format)
2 = Spreadsheet (file class)
0 = not defined (minor format)
9 = 9 (major version)
0 = 0 (minor version)
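A dotted format string of this kind can be split into its categories with sscanf. The following C sketch uses a hypothetical structure and function name; categories that are missing from the string are left at 0.

```c
#include <stdio.h>

/* One field per category in MajorFormat.FileClass.MinorFormat... */
struct format_info {
    int major_format;
    int file_class;
    int minor_format;
    int major_version;
    int minor_version;
};

/*
 * Hypothetical parser. Returns the number of categories present (1 to 5),
 * or 0 if the string does not start with a number. Trailing categories
 * that are not present are left at 0.
 */
static int parse_format_info(const char *s, struct format_info *out)
{
    struct format_info f = { 0, 0, 0, 0, 0 };
    int n = sscanf(s, "%d.%d.%d.%d.%d",
                   &f.major_format, &f.file_class, &f.minor_format,
                   &f.major_version, &f.minor_version);

    if (n < 1) {
        return 0;   /* covers both a mismatch (0) and empty input (EOF) */
    }
    *out = f;
    return n;
}
```

Parsing 81.2.0.9.0 fills all five fields, while a shorter entry such as 210.1 fills only the major format and file class.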
The example above applies to the formats_e.ini file. When you extract format
information by using the fpGetStreamInfo() function, the same format
information is represented as 294.2.0.9.
NOTE The format values returned by fpGetStreamInfo() differ from
those in formats_e.ini because the former defines a unique ID for each
major format, while the latter uses a major version, minor version and minor
format to distinguish between formats.
Distinguish Between Formats
The ADDOCINFO structure provides a unique ID for each major format. For
example, a call to fpGetStreamInfo() would return 351.1.0 for a Microsoft
Word 2003 XML format. The major format 351 is unique to this format.
Unlike ADDOCINFO, the formats_e.ini file distinguishes between formats
using the major version number. For example, in formats_e.ini, a Microsoft
Word 2003 XML format is defined as 285.1.0.100.0. The major format 285
and file class 1 are the same values used for generic XML. The major version 100
distinguishes the format as Microsoft Word 2003 XML.
The major version is used in formats_e.ini to specify the following formats:

The Microsoft Office 2003 XML format has the same major format and file
class as generic XML (285.1). It is distinguished from generic XML using the
following major versions:
 Word: 100
 Excel: 101
 Visio: 110

The XHTML format has the same major format and file class as HTML
(210.1). It is distinguished from HTML using the major version 100.
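The following C sketch shows how the major version can separate formats that share a major format and file class in formats_e.ini. The function name is hypothetical, and treating every other major version as the generic format is an assumption.

```c
#include <stddef.h>

/*
 * Hypothetical helper based on the values listed above. Returns a
 * description, or NULL for entries that this sketch does not cover.
 */
static const char *describe_entry(int major_format, int file_class,
                                  int major_version)
{
    if (major_format == 285 && file_class == 1) {
        switch (major_version) {
        case 100: return "Microsoft Word 2003 XML";
        case 101: return "Microsoft Excel 2003 XML";
        case 110: return "Microsoft Visio 2003 XML";
        default:  return "Generic XML";   /* assumption */
        }
    }
    if (major_format == 210 && file_class == 1) {
        return major_version == 100 ? "XHTML" : "HTML";
    }
    return NULL;
}
```

For the entry 285.1.0.100.0, the call describe_entry(285, 1, 100) identifies Microsoft Word 2003 XML.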
Determine a Document Reader
The format detection module uses the formats_e.ini file to determine whether
a format is supported and which reader should be used to parse a format. The
entries in the formats_e.ini file list each format’s coded value and an
abbreviation for the format’s reader. For example:
81.2.0.9.0=l123
The reader abbreviation is a truncated version of the reader’s library name.
Adding “sr” to the end of an abbreviation creates the name of the reader. The
example entry above specifies that a Lotus 1-2-3 Spreadsheet file version 9.0 is
parsed by the Lotus 1-2-3 reader, l123sr.
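The naming rule can be sketched in C as follows. The helper name is hypothetical, and the sketch assumes that the entry has no trailing whitespace or line ending after the abbreviation.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: splits an entry such as "81.2.0.9.0=l123" at '='
 * and appends "sr" to the abbreviation. Returns 0 on success, or -1 if
 * the entry has no abbreviation or the name does not fit in the buffer.
 */
static int reader_name_from_entry(const char *entry, char *name,
                                  size_t name_size)
{
    const char *eq = strchr(entry, '=');
    if (eq == NULL || eq[1] == '\0') {
        return -1;
    }
    int written = snprintf(name, name_size, "%ssr", eq + 1);
    if (written < 0 || (size_t)written >= name_size) {
        return -1;
    }
    return 0;
}
```

Given the example entry above, the helper produces the reader name l123sr.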
“Files Required for Redistribution” on page 319 lists the document readers
provided with KeyView.
Category Values in formats_e.ini
This section lists the possible category values for format information in the
formats_e.ini file. The corresponding values for the format information
extracted from a call to fpGetStreamInfo() are listed in the header file
adinfo.h.

Major Formats

File Classes

Minor Formats
Table 34 Major Formats
Number | Format | File Class
1 | AES Multiplus Comm Format | Word processor
2 | ASCII File word processor/MS DOS Batch File format | Word processor
3 | Applix Asterix | Word processor
4 | Microsoft Windows Bitmap image (BMP) | Raster image
5 | Convergent Tech DEF Comm. format | Word processor
6 | Corel Draw (CDR) | Vector graphic
7 | Keyword COM.FILE (KSIF) |
8 | Computer Graphics Metafile (CGM) | Vector graphic
9 | Word Connection | Word processor
10 | COMET TOP Word | Word processor
11 | DG CEOwrite | Word processor
12 | Honeywell Bull DSA101 | Word processor
13 | IBM DCA-RFT | Word processor
14 | DDIF | Word processor
15 | Dummy File (Internal) |
16 | DG Common Data Stream (CDS) |
17 | Dummy Print File (Internal) |
18 | Windows Micrografx Draw (DRW) | Vector graphic
19 | Data Point VISTAWORD | Word processor
20 | DECdx | Word processor
21 | Enable | Word processor
22 | Encapsulated PostScript (EPS) | Raster image
23 | DOS/Windows Executable (EXE, DLL) | Executable
24 | CCITT Group 3 1-Dimensional (G31D) | Raster image
25 | Graphics Interchange format (GIF) | Raster image
26 | Hewlett-Packard | Word processor
27 | IBM 1403 Line Printer | Word processor
28 | IBM DCF Script | Word processor
29 | IBM DCA-FFT | Word processor
30 | Interleaf | Word processor
31 | GEM Bit Image | Raster image
32 | IBM Display Write 4 | Word processor
33 | Raster Graphics | Raster image
34 | Keywords PICL | Word processor
35 | Lotus AMI Pro | Word processor
36 | MORE Database Outliner (Mac) | Outline/planning
37 | Lyrix | Word processor
38 | MASS-11 | Word processor
39 | MacPaint | Raster image
40 | Microsoft Word Mac | Word processor
41 | Informix SmartWare II Communication File | Communications
42 | Microsoft Word for Windows | Word processor
43 | MultiMate 4.0 | Word processor
44 | Multiplan Spreadsheet | Spreadsheet
45 | Microsoft Rich Text Format (RTF) | Word processor
46 | Microsoft Word 5.0 (PC) | Word processor
47 | NBI Async Archive Format | Word processor
48 | Navy DIF | Word processor
49 | NBI Net Archive Format | Word processor
50 | NIOS TOP | Word processor
51 | FileMaker (Mac) | Database
52 | ODA/ODIF | Word processor
53 | OLIDIF | Word processor
54 | Keyword OSM |
55 | Office Writer | Word processor
56 | PC Paint Brush Graphics (PCX) | Raster image
57 | CPT Communication Format | Word processor
58 | Lotus PIC | Vector graphic
59 | Macintosh Quick Draw Picture Format (PICT) | Raster image
60 | Philips Script | Word processor
61 | PostScript File | Vector graphic
62 | PRIMEWORD | Word processor
63 | Quadratron Q-One (V1.93J) | Word processor
64 | Quadratron Q-One (V2.0) | Word processor
65 | SAMNA Word IV | Word processor
66 | Lotus AMI Pro Draw (SDW) | Raster image
67 | SYLK Spreadsheet | Spreadsheet
68 | Informix SmartWare II | Word processor
69 | Symphony Spreadsheet | Spreadsheet
70 | Truevision Targa | Raster image
71 | Tagged Image File (TIFF) | Raster image
72 | Targon Word (V 2.0) | Word processor
73 | Uniplex Ucalc Spreadsheet | Spreadsheet
74 | Uniplex (V6.01) | Word processor
75 | Microsoft Word (UNIX) | Word processor
76 | WANG PC | Word processor
77 | WordERA (V 1.0) | Word processor
78 | WANG WPS Comm. format | Word processor
79 | WordPerfect Mac | Word processor
80 | WordPerfect 5.2 | Word processor
81 | Lotus 1-2-3 Spreadsheet | Spreadsheet
82 | WordMARC word processor | Word processor
83 | Microsoft Windows Metafile (WMF) Graphics | Raster image
84 | Informix SmartWare II Database | Database
85 | WordPerfect Graphics V1.0 (WPG) | Raster image
86 | WordPerfect | Word processor
87 | WordStar | Word processor
88 | Wang WITA | Word processor
89 | Xerox 860 Comm. format | Word processor
90 | Microsoft Excel Spreadsheet | Spreadsheet
91 | Xerox Writer word processor | Word processor
92 | DIF Spreadsheet | Spreadsheet
93 | ENABLE Spreadsheet | Spreadsheet
94 | Supercalc Spreadsheet | Spreadsheet
95 | Ultracalc Spreadsheet | Spreadsheet
96 | Informix SmartWare Spreadsheet | Spreadsheet
97 | Serialized Object Format (SOF) Encapsulation format | Encapsulation
98 | Microsoft PowerPoint (PC) | Presentation
99 | Microsoft PowerPoint (Mac) | Presentation
100 | Aldus PageMaker (Mac) | Desktop publishing
101 | Aldus PageMaker (DOS) | Desktop publishing
103 | Microsoft Works (Mac) | Word processor
104 | Microsoft Works Database (Mac) | Database
105 | Microsoft Works Spreadsheet (Mac) | Spreadsheet
106 | Microsoft Works Communication (Mac) | Communications
107 | Microsoft Works (PC) | Word processor
108 | Microsoft Works Database (PC) | Database
109 | Microsoft Works Spreadsheet (PC) | Spreadsheet
111 | PC Library Module | Library module
112 | MacWrite | Word processor
113 | MacWrite II | Word processor
114 | Aldus Freehand Mac | Vector graphic
115 | Disk Doubler Compression format | Encapsulation
116 | HP Graphics Language (HP-GL) | Vector graphic
117 | Adobe Maker Interchange Format (MIF) | Desktop publishing
118 | JPEG File Interchange Format (JFIF) | Raster image
119 | Reflex Database | Database
120 | Framework II | Mixed format
121 | Paradox (PC) Database | Database
123 | Microsoft Windows Write | Word processor
124 | Quattro Pro Spreadsheet (DOS) | Spreadsheet
126 | Persuasion Presentation | Presentation
127 | Corel Presentation | Presentation
128 | Microsoft Windows Icon Format (ICO) Graphics | Raster image
129 | Microsoft Project | Time scheduling
131 | Harvard Graphics | Desktop publishing
132 | Zip Archive Format | Encapsulation
133 | Microsoft Windows Cursor (CUR) Graphics | Raster image
134 | Quark Express (Mac) | Desktop publishing
135 | ARC/PAK Archive format | Encapsulation
136 | Adobe FrameMaker | Desktop publishing
137 | Microsoft Publisher | Desktop publishing
138 | Plan Perfect | Time scheduling
139 | WordPerfect General File Format | Miscellaneous
140 | Lotus Freelance | Presentation
141 | Microsoft Wave Sound File | Sound
142 | MIDI Sound File | Sound
143 | AutoCAD DXF Graphics | Vector graphic
144 | dBase Database | Database
145 | OS/2 PM Metafile Graphics | Vector graphic
146 | Lasergraphics Language | Vector graphic
147 | AutoShade Rendering File Format | Vector graphic
148 | Graphics Environment Manager (GEM VDI) | Vector graphic
149 | Microsoft Windows Help File | Miscellaneous
150 | Volkswriter | Word processor
151 | Ability Office (SS, DB, GR, WP, COM) |
152 | XyWrite/Nota Bene | Word processor
153 | Comma Separated Values (CSV) | Spreadsheet
154 | Writing Assistant word processor | Word processor
155 | WordStar 2000 | Word processor
156 | WordStar 6.0 | Word processor
157 | HP Printer Control Language (PCL) | Vector graphic
158 | (UNIX/VAX/SUN) Executable | Executable
159 | (UNIX/VAX/SUN) Object Module | Object module
160 | (UNIX/VAX/SUN) Link Library | Library module
161 | NeXT SUN Audio Data | Sound
162 | NeWS font file (SUN) | Font
163 | cpio Archive Format (UNIX/VAX/SUN) | Encapsulation
164 | PEX Binary Archive (SUN) | Encapsulation
165 | SUN vfont definition | Font
166 | Curses Screen Image (UNIX/VAX/SUN) | Raster image
167 | UU Encoded Encryption File | Encapsulation
168 | WriteNow | Word processor
169 | PC Object Module | Object module
170 | Microsoft Windows Group File | Miscellaneous
171 | PC True Type Font | Font
172 | Program Information File | Miscellaneous
173 | PC COM executable file | Executable
174 | Adobe FrameMaker Markup Language | Desktop publishing
175 | Stuff It Archive (Mac) | Encapsulation
176 | PeachCalc Spreadsheet | Spreadsheet
177 | Wang Office GDL Header Encapsulation | Encapsulation
178 | WordPerfect 6.0 | Word processor
179 | Q & A for DOS | Word processor
180 | Q & A for Windows | Word processor
181 | DEC WPS PLUS | Word processor
182 | DCX Fax format | Fax
183 | Microsoft Windows OLE 2 Encapsulation | Encapsulation
184 | Quattro Pro for Windows | Spreadsheet
185 | Keyword Viewer Markup Format |
186 | EBCDIC Text | Word processor
187 | DCS | Word processor
188 | Microsoft Excel Spreadsheet 95, 2000 | Spreadsheet
189 | Microsoft Word for Windows 95 | Word processor
190 | UNIX SHAR Encapsulation | Encapsulation
191 | Lotus Notes Bitmap | Raster image
192 | UNIX Compress Encapsulation | Encapsulation
193 | Lotus Notes CDF | Word processor
194 | UNIX TAR Encapsulation | Encapsulation
195 | WordPerfect Graphics V2.0 (WPG2) | Raster image; Vector graphic
196 | ODA/ODIF (FOD 26) | Word processor
197 | ALIS | Word processor
198 | GZ Compress Encapsulation | Encapsulation
199 | Envoy (EVY) | Word processor
200 | Adobe Portable Document Format (PDF) | Word processor
201 | KW ODA Internal Raw Bitmap (RBM) | Raster image
202 | KW ODA G4 (G4) | Raster image
203 | KW ODA G31D (G31) | Raster image
204 | KW ODA Internal G32D (G32) | Raster image
205 | Microsoft Word for Mac V 4.x/5.x | Word processor
206 | BinHex 4.0 encoded file | Encapsulation
207 | SMTP document | Encapsulation
208 | MIME format - Microsoft Outlook Express (EML)/Mailbox (MBX) | Encapsulation
209 | SGML document | Word processor
210 | HTML document; XHTML (see note 1) | Word processor
211 | ACT Format | Word processor
212 | Microsoft PowerPoint 95 | Presentation
213 | Portable Network Graphics (PNG) | Raster image
214 | Video for Windows | Movie
215 | Windows Animated Cursor | Raster image
216 | Windows C++ Object Storage | Mixed format
217 | Windows Palette | Raster image
218 | RIFF Device Independent Bitmap | Raster image
219 | RIFF MIDI | Sound
220 | RIFF Multimedia Movie | Movie
221 | MPEG Movie | Movie
222 | QuickTime Movie | Movie
223 | Audio Interchange File Format (AIFF) Sound | Sound
224 | Amiga MOD Sound | Sound
225 | Amiga IFF (8SVX) Sound | Sound
226 | Creative Voice (VOC) Sound | Sound
227 | Microsoft Works (Windows) | Word processor
228 | Microsoft Works Spreadsheet (Windows) | Spreadsheet
229 | AutoDesk Animator FLIC Animation | Animation
230 | AutoDesk Animator Pro FLIC Animation | Animation
231 | Microsoft Works Database (Windows) | Database
232 | Microsoft Works Communication (Windows) | Communications
233 | Compactor / Compact Pro Archive | Encapsulation
234 | VRML | Vector graphic
235 | QuickDraw 3D Metafile (3DMF) | Vector graphic
236 | PGP Secret Keyring | Encapsulation
237 | PGP Public Keyring | Encapsulation
238 | PGP Encrypted Data | Encapsulation
239 | PGP Signed Data | Encapsulation
240 | PGP Signed and Encrypted Data | Encapsulation
241 | PGP Signature Certificate | Encapsulation
242 | ASCII-armored PGP Public Keyring | Encapsulation
243 | ASCII-armored PGP encoded | Encapsulation
244 | ASCII-armored PGP signed | Encapsulation
245 | OLE DIB object | Raster image
246 | PGP Compressed Data | Encapsulation
247 | SGI Image | Raster image
248 | Lotus Screen Cam | Animation
249 | MPEG Audio | Sound
250 | FTP Session Data | Communications
251 | Netscape Bookmark file | Word processor
252 | Corel Draw CMX | Vector graphic
253 | AutoCAD Drawing (DWG) | Vector graphic
254 | AutoDesk WHIP | Vector graphic
255 | Macromedia Director | Animation
256 | Real Audio | Sound
257 | MS DOS Device Driver | Executable
258 | Micrografx Designer | Vector graphic
259 | Simple Vector format (SVF) | Vector graphic
260 | WordPerfect Office document (WPD) |
261 | Applix Words | Word processor
262 | Applix Graphics | Presentation
263 | Microsoft Access | Database
264 | Usenet format | Word processor
265 | MacBinary | Encapsulation
266 | Apple Single | Encapsulation
267 | Apple Double | Encapsulation
268 | Lotus Word Pro | Word processor
269 | Microsoft Word 97, 2000 | Word processor
270 | Enhanced Window Metafile | Vector graphic
271 | Microsoft Office Drawing | Vector graphic
272 | Microsoft PowerPoint 97, 2000 | Presentation
273 | Extended or Custom XML | Word processor
274 | Device Independent file (DVI) | Vector graphic
275 | Unicode | Word processor
276 | Framework | Mixed format
277 | KPIF Chart Stream |
278 | Applix Spreadsheet | Spreadsheet
279 | Microsoft Device Independent Bitmap | Raster image
280 | KeyView GPF Filter |
281 | Microsoft Project 98, 2000, 2002 | Time scheduling
282 | Folio Flat file | Word processor
283 | HWP (Arae-Ah Hangul) | Word processor
284 | JustSystems Ichitaro | Word processor
285 | Generic XML format; Microsoft Office 2003 XML format (see note 2) | Word processor
286 | Fujitsu Oasys | Word processor
287 | Portable Bitmap Utilities (PBM) | Raster image
288 | Portable Greymap Utilities (PGM) | Raster image
289 | Portable Pixmap Utilities (PPM) | Raster image
290 | X Bitmap (XBM) | Raster image
291 | X Pixmap (XPM) | Raster image
292 | X Image | Raster image
293 | PCD Image | Raster image
294 | Microsoft Visio | Presentation
295 | Microsoft Outlook (MSG) | Encapsulation
296 | XHTML document | Word processor
297 | Microsoft Outlook Personal Folders file (PST) | Encapsulation
298 | WinRAR Compressed Archive format (RAR) | Encapsulation
299 | Lotus Notes Database (NSF); Legato Extender ONM | Encapsulation
300 | Macromedia Flash | Word processor
301 | Microsoft Word 2007 (XML format) | Word processor
302 | Microsoft Excel 2007 (XML format) | Spreadsheet
303 | Microsoft PowerPoint 2007 (XML format) | Presentation
304 | Open PGP (new format packets only) | Encapsulation
305 | Intergraph version 7 DGN | Vector graphic
306 | Microstation version 8 DGN | Vector graphic
307 | Microsoft Word 2007 Macro | Word processor
308 | Microsoft Excel 2007 Macro | Spreadsheet
309 | Microsoft PowerPoint Macro | Presentation
310 | Microsoft Compression folder (LZH) | Encapsulation
311 | Office 2007 Document | Miscellaneous
312 | XML Paper Specification | Word processor
313 | Lotus Domino Extensible Language | Encapsulation
314 | OASIS Open Document (ODT) | Word processor
315 | OASIS Open Document (ODS) | Spreadsheet
316 | OASIS Open Document (ODP) | Presentation
317 | Legato EMailXtender Native Message | Word processor
319 | Transfer Neutral Encapsulation Format (TNEF) | Encapsulation
320 | CADAM Drawing | Vector graphic
321 | CADAM Drawing Overlay | Vector graphic
322 | NURSTOR Drawing | Vector graphic
323 | HP Graphics Language (Plotter) | Vector graphic
324 | Advanced Systems Format | Miscellaneous
325 | Windows Media Audio Format | Sound
326 | Windows Media Video Format | Movie
327 | Legato EMailXtender Archive | Encapsulation
328 | 7-Zip | Encapsulation
329 | Microsoft Office 2007 Excel Binary Format | Spreadsheet
330 | Microsoft Cabinet File | Encapsulation
331 | CATIA formats | Vector graphic
332 | Yahoo! Instant Messenger | Word processor
333 | Founder Chinese E-paper Basic | Word processor
334 | Corel Quattro Pro X4 | Spreadsheet
335 | MIME HTML | Word processor
336 | Microsoft Document Imaging Format | Raster image
337 | Microsoft Office Groove File Format | Word processor
338 | Apple iWorks Pages | Word processor
339 | Apple iWorks Numbers | Spreadsheet
340 | Apple iWorks Keynote | Presentation
341 | Microsoft Backup File | Encapsulation
342 | Microsoft Access 2007 | Database
343 | Microsoft Entourage Database | Encapsulation
344 | Mac Disk Copy Disk Image File | Encapsulation
345 | Appleworks File | Word processor
346 | Omni Outliner (OO3) File | Word processor
347 | Omni Outliner (OPML) File | Word processor
348 | Omni Graffle XML File | Vector graphic
349 | Apple Photoshop Document | Raster image
350 | Apple Binary Property List | Miscellaneous
351 | Apple iChat Format | Word processor
352 | Omni Outliner (OOUTLINE) File | Word processor
353 | Bzip 2 Compressed File | Encapsulation
354 | ISO-9660 CD Disc Image Format | Encapsulation
355 | Xerox DocuWorks | Word processor
356 | RealMedia Streaming Media | Movie
357 | AC3 Audio File Format | Sound
358 | Nero Encrypted File | Encapsulation
359 | SolidWorks | Vector graphic
366 | Extensible Forms Description Language | Presentation
367 | Apple XML Property List | Miscellaneous
368 | OneNote Note Format | Presentation
370 | Digital Imaging and Communications in Medicine (DICOM) | Raster image
371 | Expert Witness Compression Format | Encapsulation
372 | Shell Scrap Object File | Encapsulation
373 | Microsoft Project 2007 | Time scheduling
374 | Microsoft Publisher 98– | Desktop publishing
375 | Skype Log File | Word processor
376 | Lotus Notes Bitmap Format (DXL embedded images) | Raster image
377 | Health level7 message | Word processor
378 | Microsoft Outlook Offline Storage File | Encapsulation
379 | Open Publication Structure eBook | Word processor
380 | Microsoft Outlook Express DBX | Encapsulation
381 | BlackBerry Activation File | Word processor
382 | Disk Image | Encapsulation
383 | Milestone | Raster image
384 | RealLegal E-Transcript File | Word processor
385 | PostScript Type 1 Font | Font
386 | Ghost Disk Image File | Encapsulation
387 | JPEG-2000 JP2 File Format Syntax (ISO/IEC 15444-1) | Raster image
388 | Unicode HTML | Word processor
389 | Microsoft Compiled HTML Help | Encapsulation
390 | Documentum EMCMF | Encapsulation
393 | JBIG2 File | Raster image
395 | AD1 Evidence file | Encapsulation
397 | Group Wise File Surf email | Encapsulation
409 | Microsoft Outlook for Macintosh | Encapsulation
412 | Microsoft Outlook vCard Contact | Word processor
414 | Microsoft Outlook iCalendar | Encapsulation
1. If the major version is 100, the file format is XHTML.
2. The major version determines whether the Microsoft Office XML file is a Word, Excel or Visio
document. The major version for each format is as follows:
Word: 100
Excel: 101
Visio: 110
Table 35 File Classes
Attribute Number | File Class
0 | No file class
01 | Word processor
02 | Spreadsheet
03 | Database
04 | Raster image
05 | Vector graphic
06 | Presentation
07 | Executable
08 | Encapsulation
09 | Sound
10 | Desktop publishing
11 | Outline/planning
12 | Miscellaneous
13 | Mixed format
14 | Font
15 | Time scheduling
16 | Communications
17 | Object module
18 | Library module
19 | Fax
20 | Movie
21 | Animation
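When translating the file class number in a formats_e.ini entry, a lookup array indexed by the attribute number is usually sufficient. The following C sketch copies the names from Table 35; the array and function names are hypothetical.

```c
#include <stddef.h>

/* File class names indexed by attribute number, as listed in Table 35. */
static const char *const file_class_names[] = {
    "No file class", "Word processor", "Spreadsheet", "Database",
    "Raster image", "Vector graphic", "Presentation", "Executable",
    "Encapsulation", "Sound", "Desktop publishing", "Outline/planning",
    "Miscellaneous", "Mixed format", "Font", "Time scheduling",
    "Communications", "Object module", "Library module", "Fax",
    "Movie", "Animation"
};

/* Hypothetical lookup: returns NULL for numbers outside the table. */
static const char *file_class_name(int number)
{
    size_t count = sizeof file_class_names / sizeof file_class_names[0];

    if (number < 0 || (size_t)number >= count) {
        return NULL;
    }
    return file_class_names[number];
}
```

For the entry 81.2.0.9.0, file_class_name(2) returns the Spreadsheet class.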
Table 36 Minor Formats
Attribute Number | Minor Format
00 | Minor format not defined
01 | Standard
02 | Book
03 | Chart
04 | Macro
05 | Text
06 | Binary
07 | PC
08 | Windows
09 | DOS
10 | Macintosh
11 | RGB
12 | TIFF
13 | IFF
14 | Experimental
15 | Format Information
16 | RLE
17 | Symbol
18 | Old
19 | Footnote
20 | Style
21 | Palette
22 | Configuration
23 | Activity
24 | Resource
25 | Calculation
26 | Glossary
27 | Spelling
28 | Thesaurus
29 | Hyphenation
30 | Miscellaneous
31 | UNIX
32 | VAX
33 | Driver
34 | Archive

APPENDIX F
File Formats and Extensions
This section lists the KeyView file format numbers and their associated file
extensions. It contains the following topics:

File Format and Extension Table
File Format and Extension Table
Table 37 lists the KeyView file format codes and the file extensions they are most
commonly associated with.
NOTE Table 37 is not a complete list of file extensions. KeyView returns
format codes based on file content, which cannot always be predicted from
the file extension. Some file extensions may also be associated with
multiple format numbers.
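Because one extension can map to several format numbers, a lookup keyed on the extension should return every match instead of the first one. The following C sketch illustrates this with a few rows copied from Table 37; the structure and function names are hypothetical, and the sample array is not the complete table.

```c
#include <stddef.h>
#include <string.h>

struct ext_entry {
    const char *extension;
    int format_number;
};

/* A few rows from Table 37; this is not the complete table. */
static const struct ext_entry sample_rows[] = {
    { "SAM", 36 },  /* Ami_Pro_Fmt */
    { "SAM", 77 },  /* SAMNA_Word_IV_Fmt */
    { "RTF", 53 },  /* MS_RTF_Fmt */
    { "BMP", 5 },   /* BMP_Fmt */
};

/*
 * Hypothetical lookup: writes up to max matching format numbers into out
 * and returns how many rows matched the extension (exact, upper case).
 */
static size_t formats_for_extension(const char *ext, int *out, size_t max)
{
    size_t rows = sizeof sample_rows / sizeof sample_rows[0];
    size_t found = 0;

    for (size_t i = 0; i < rows; i++) {
        if (strcmp(sample_rows[i].extension, ext) == 0) {
            if (found < max) {
                out[found] = sample_rows[i].format_number;
            }
            found++;
        }
    }
    return found;
}
```

The SAM extension matches two rows, which is why the format number detected from the file content remains the authoritative value.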
Table 37 KeyView file formats and extensions
Format Name | Format Number | Format Description | Associated File Extension
AES_Multiplus_Comm_Fmt | 1 | Multiplus (AES) | PTF
ASCII_Text_Fmt | 2 | Text |
MSDOS_Batch_File_Fmt | 3 | MS-DOS Batch File | BAT
Applix_Alis_Fmt | 4 | APPLIX ASTERIX | AX
BMP_Fmt | 5 | Windows Bitmap | BMP
CT_DEF_Fmt | 6 | Convergent Technologies DEF Comm. Format |
Corel_Draw_Fmt | 7 | Corel Draw | CDR
CGM_ClearText_Fmt | 8 | Computer Graphics Metafile (CGM) | CGM1
CGM_Binary_Fmt | 9 | Computer Graphics Metafile (CGM) | CGM1
CGM_Character_Fmt | 10 | Computer Graphics Metafile (CGM) | CGM1
Word_Connection_Fmt | 11 | Word Connection | CN
COMET_TOP_Word_Fmt | 12 | COMET TOP |
CEOwrite_Fmt | 13 | CEOwrite | CW
DSA101_Fmt | 14 | DSA101 (Honeywell Bull) |
DCA_RFT_Fmt | 15 | DCA-RFT (IBM Revisable Form) | RFT
CDA_DDIF_Fmt | 16 | CDA / DDIF |
DG_CDS_Fmt | 17 | DG Common Data Stream (CDS) | CDS
Micrografx_Draw_Fmt | 18 | Windows Draw (Micrografx) | DRW
Data_Point_VistaWord_Fmt | 19 | Vistaword |
DECdx_Fmt | 20 | DECdx | DX
Enable_WP_Fmt | 21 | Enable Word Processing | WPF
EPSF_Fmt | 22 | Encapsulated PostScript | EPS1
Preview_EPSF_Fmt | 23 | Encapsulated PostScript | EPS1
MS_Executable_Fmt | 24 | MSDOS/Windows Program | EXE
G31D_Fmt | 25 | CCITT G3 1D |
GIF_87a_Fmt | 26 | Graphics Interchange Format (GIF87a) | GIF1
GIF_89a_Fmt | 27 | Graphics Interchange Format (GIF89a) | GIF1
HP_Word_PC_Fmt | 28 | HP Word PC | HW
IBM_1403_LinePrinter_Fmt | 29 | IBM 1403 Line Printer | I4
IBM_DCF_Script_Fmt | 30 | DCF Script | IC
IBM_DCA_FFT_Fmt | 31 | DCA-FFT (IBM Final Form) | IF
Interleaf_Fmt | 32 | Interleaf |
GEM_Image_Fmt | 33 | GEM Bit Image | IMG
IBM_Display_Write_Fmt | 34 | Display Write | IP
Sun_Raster_Fmt | 35 | Sun Raster | RAS
Ami_Pro_Fmt | 36 | Lotus Ami Pro | SAM
Ami_Pro_StyleSheet_Fmt | 37 | Lotus Ami Pro Style Sheet |
MORE_Fmt | 38 | MORE Database MAC |
Lyrix_Fmt | 39 | Lyrix Word Processing |
MASS_11_Fmt | 40 | MASS-11 | M1
MacPaint_Fmt | 41 | MacPaint | PNTG
MS_Word_Mac_Fmt | 42 | Microsoft Word for Macintosh | DOC1
SmartWare_II_Comm_Fmt | 43 | SmartWare II |
MS_Word_Win_Fmt | 44 | Microsoft Word for Windows | DOC1
Multimate_Fmt | 45 | MultiMate | MM1
Multimate_Fnote_Fmt | 46 | MultiMate Footnote File | FNX1
Multimate_Adv_Fmt | 47 | MultiMate Advantage |
Multimate_Adv_Fnote_Fmt | 48 | MultiMate Advantage Footnote File |
Multimate_Adv_II_Fmt | 49 | MultiMate Advantage II | MM1
Multimate_Adv_II_Fnote_Fmt | 50 | MultiMate Advantage II Footnote File | FNX1
Multiplan_PC_Fmt | 51 | Multiplan (PC) |
Multiplan_Mac_Fmt | 52 | Multiplan (Mac) |
MS_RTF_Fmt | 53 | Rich Text Format (RTF) | RTF
MS_Word_PC_Fmt | 54 | Microsoft Word for PC | DOC1
MS_Word_PC_StyleSheet_Fmt | 55 | Microsoft Word for PC Style Sheet | DOC1
MS_Word_PC_Glossary_Fmt | 56 | Microsoft Word for PC Glossary | DOC1
MS_Word_PC_Driver_Fmt | 57 | Microsoft Word for PC Driver | DOC1
MS_Word_PC_Misc_Fmt | 58 | Microsoft Word for PC Miscellaneous File | DOC1
NBI_Async_Archive_Fmt | 59 | NBI Async Archive Format |
Navy_DIF_Fmt | 60 | Navy DIF | ND
NBI_Net_Archive_Fmt | 61 | NBI Net Archive Format | NN
NIOS_TOP_Fmt | 62 | NIOS TOP |
FileMaker_Mac_Fmt | 63 | Filemaker MAC | FP5, FP7
ODA_Q1_11_Fmt | 64 | ODA / ODIF | OD1
ODA_Q1_12_Fmt | 65 | ODA / ODIF | OD1
OLIDIF_Fmt | 66 | OLIDIF (Olivetti) |
Office_Writer_Fmt | 67 | Office Writer | OW
PC_Paintbrush_Fmt | 68 | PC Paintbrush Graphics (PCX) | PCX
CPT_Comm_Fmt | 69 | CPT |
Lotus_PIC_Fmt | 70 | Lotus PIC | PIC
Mac_PICT_Fmt | 71 | QuickDraw Picture | PCT
Philips_Script_Word_Fmt | 72 | Philips Script |
PostScript_Fmt | 73 | PostScript | PS
PRIMEWORD_Fmt | 74 | PRIMEWORD |
Quadratron_Q_One_v1_Fmt | 75 | Q-One V1.93J | Q11, QX1
Quadratron_Q_One_v2_Fmt | 76 | Q-One V2.0 | Q11, QX1
SAMNA_Word_IV_Fmt | 77 | SAMNA Word | SAM
Ami_Pro_Draw_Fmt | 78 | Lotus Ami Pro Draw | SDW
SYLK_Spreadsheet_Fmt | 79 | SYLK |
SmartWare_II_WP_Fmt | 80 | SmartWare II |
Symphony_Fmt | 81 | Symphony | WR1
Targa_Fmt | 82 | Targa | TGA
TIFF_Fmt | 83 | TIFF | TIF, TIFF
Targon_Word_Fmt | 84 | Targon Word | TW
Uniplex_Ucalc_Fmt | 85 | Uniplex Ucalc | SS
Uniplex_WP_Fmt | 86 | Uniplex | UP
MS_Word_UNIX_Fmt | 87 | Microsoft Word UNIX | DOC1
WANG_PC_Fmt | 88 | WANG PC |
WordERA_Fmt | 89 | WordERA |
WANG_WPS_Comm_Fmt | 90 | WANG WPS | WF
WordPerfect_Mac_Fmt | 91 | WordPerfect MAC | WPM, WPD1
WordPerfect_Fmt | 92 | WordPerfect | WO, WPD1
WordPerfect_VAX_Fmt | 93 | WordPerfect VAX | WPD1
WordPerfect_Macro_Fmt | 94 | WordPerfect Macro |
WordPerfect_Dictionary_Fmt | 95 | WordPerfect Spelling Dictionary |
WordPerfect_Thesaurus_Fmt | 96 | WordPerfect Thesaurus |
WordPerfect_Resource_Fmt | 97 | WordPerfect Resource File |
WordPerfect_Driver_Fmt | 98 | WordPerfect Driver |
WordPerfect_Cfg_Fmt | 99 | WordPerfect Configuration File |
WordPerfect_Hyphenation_Fmt | 100 | WordPerfect Hyphenation Dictionary |
WordPerfect_Misc_Fmt | 101 | WordPerfect Miscellaneous File | WPD1
WordMARC_Fmt | 102 | WordMARC | WM, PW
Windows_Metafile_Fmt | 103 | Windows Metafile | WMF1
Windows_Metafile_NoHdr_Fmt | 104 | Windows Metafile (no header) | WMF1
SmartWare_II_DB_Fmt | 105 | SmartWare II |
WordPerfect_Graphics_Fmt | 106 | WordPerfect Graphics | WPG, QPG
WordStar_Fmt | 107 | WordStar | WS
WANG_WITA_Fmt | 108 | WANG WITA | WT
Xerox_860_Comm_Fmt | 109 | Xerox 860 |
Xerox_Writer_Fmt | 110 | Xerox Writer |
DIF_SpreadSheet_Fmt | 111 | Data Interchange Format (DIF) | DIF
Enable_Spreadsheet_Fmt | 112 | Enable Spreadsheet | SSF
SuperCalc_Fmt | 113 | Supercalc | CAL
UltraCalc_Fmt | 114 | UltraCalc |
SmartWare_II_SS_Fmt | 115 | SmartWare II |
SOF_Encapsulation_Fmt | 116 | Serialized Object Format (SOF) | SOF
PowerPoint_Win_Fmt | 117 | PowerPoint PC | PPT1
PowerPoint_Mac_Fmt | 118 | PowerPoint MAC | PPT1
PowerPoint_95_Fmt | 119 | PowerPoint 95 | PPT1
PowerPoint_97_Fmt | 120 | PowerPoint 97 | PPT1
PageMaker_Mac_Fmt | 121 | PageMaker for Macintosh |
PageMaker_Win_Fmt | 122 | PageMaker for Windows |
MS_Works_Mac_WP_Fmt | 123 | Microsoft Works for MAC |
MS_Works_Mac_DB_Fmt | 124 | Microsoft Works for MAC |
MS_Works_Mac_SS_Fmt | 125 | Microsoft Works for MAC |
MS_Works_Mac_Comm_Fmt | 126 | Microsoft Works for MAC |
MS_Works_DOS_WP_Fmt | 127 | Microsoft Works for DOS | WPS1
MS_Works_DOS_DB_Fmt | 128 | Microsoft Works for DOS | WDB1
MS_Works_DOS_SS_Fmt | 129 | Microsoft Works for DOS |
MS_Works_Win_WP_Fmt | 130 | Microsoft Works for Windows | WPS1
MS_Works_Win_DB_Fmt | 131 | Microsoft Works for Windows | WDB1
MS_Works_Win_SS_Fmt | 132 | Microsoft Works for Windows | S30, S40
PC_Library_Fmt | 133 | DOS/Windows Object Library |
MacWrite_Fmt | 134 | MacWrite |
MacWrite_II_Fmt | 135 | MacWrite II |
Freehand_Fmt | 136 | Freehand MAC |
Disk_Doubler_Fmt | 137 | Disk Doubler |
HP_GL_Fmt | 138 | HP Graphics Language | HPGL
FrameMaker_Fmt | 139 | FrameMaker | FM, FRM
FrameMaker_Book_Fmt | 140 | FrameMaker | BOOK
Maker_Markup_Language_Fmt | 141 | Maker Markup Language |
Maker_Interchange_Fmt | 142 | Maker Interchange Format (MIF) | MIF
JPEG_File_Interchange_Fmt | 143 | Interchange Format | JPG, JPEG
Reflex_Fmt | 144 | Reflex |
Framework_Fmt | 145 | Framework |
Framework_II_Fmt | 146 | Framework II | FW3
Paradox_Fmt | 147 | Paradox | DB
MS_Windows_Write_Fmt | 148 | Windows Write | WRI
Quattro_Pro_DOS_Fmt | 149 | Quattro Pro for DOS |
Quattro_Pro_Win_Fmt | 150 | Quattro Pro for Windows | WB2, WB3
Persuasion_Fmt | 151 | Persuasion |
Windows_Icon_Fmt | 152 | Windows Icon Format | ICO
Windows_Cursor_Fmt | 153 | Windows Cursor | CUR
MS_Project_Activity_Fmt | 154 | Microsoft Project | MPP1
MS_Project_Resource_Fmt | 155 | Microsoft Project | MPP1
MS_Project_Calc_Fmt | 156 | Microsoft Project | MPP1
PKZIP_Fmt | 157 | ZIP Archive | ZIP
Quark_Xpress_Fmt | 158 | Quark Xpress MAC |
ARC_PAK_Archive_Fmt | 159 | PAK/ARC Archive | ARC, PAK
MS_Publisher_Fmt | 160 | Microsoft Publisher | PUB1
PlanPerfect_Fmt | 161 | PlanPerfect |
WordPerfect_Auxiliary_Fmt | 162 | WordPerfect auxiliary file | WPW
MS_WAVE_Audio_Fmt | 163 | Microsoft Wave | WAV
MIDI_Audio_Fmt | 164 | MIDI | MID, MIDI
AutoCAD_DXF_Binary_Fmt | 165 | AutoCAD DXF | DXF1
AutoCAD_DXF_Text_Fmt | 166 | AutoCAD DXF | DXF1
dBase_Fmt | 167 | dBase | DBF
OS_2_PM_Metafile_Fmt | 168 | OS/2 PM Metafile | MET
Lasergraphics_Language_Fmt | 169 | Lasergraphics Language |
AutoShade_Rendering_Fmt | 170 | AutoShade Rendering |
GEM_VDI_Fmt | 171 | GEM VDI | VDI
Windows_Help_Fmt | 172 | Windows Help File | HLP
Volkswriter_Fmt | 173 | Volkswriter | VW4
Ability_WP_Fmt | 174 | Ability |
Ability_DB_Fmt | 175 | Ability |
Ability_SS_Fmt | 176 | Ability |
Ability_Comm_Fmt | 177 | Ability |
XML Export SDK C Programming Guide
File Format and Extension Table
Table 37 KeyView file formats and extensions
Format Name
Format
Number
Format Description
Ability_Image_Fmt | 178 | Ability |
XyWrite_Fmt | 179 | XYWrite / Nota Bene | XY4
CSV_Fmt | 180 | CSV (Comma Separated Values) | CSV
IBM_Writing_Assistant_Fmt | 181 | IBM Writing Assistant | IWA
WordStar_2000_Fmt | 182 | WordStar 2000 | WS2
HP_PCL_Fmt | 183 | HP Printer Control Language | PCL
UNIX_Exe_PreSysV_VAX_Fmt | 184 | Unix Executable (PDP-11/pre-System V VAX) |
UNIX_Exe_Basic_16_Fmt | 185 | Unix Executable (Basic-16) |
UNIX_Exe_x86_Fmt | 186 | Unix Executable (x86) |
UNIX_Exe_iAPX_286_Fmt | 187 | Unix Executable (iAPX 286) |
UNIX_Exe_MC68k_Fmt | 188 | Unix Executable (MC680x0) |
UNIX_Exe_3B20_Fmt | 189 | Unix Executable (3B20) |
UNIX_Exe_WE32000_Fmt | 190 | Unix Executable (WE32000) |
UNIX_Exe_VAX_Fmt | 191 | Unix Executable (VAX) |
UNIX_Exe_Bell_5_Fmt | 192 | Unix Executable (Bell 5.0) |
UNIX_Obj_VAX_Demand_Fmt | 193 | Unix Object Module (VAX Demand) |
UNIX_Obj_MS8086_Fmt | 194 | Unix Object Module (old MS 8086) |
UNIX_Obj_Z8000_Fmt | 195 | Unix Object Module (Z8000) |
AU_Audio_Fmt | 196 | NeXT/Sun Audio Data | AU
NeWS_Font_Fmt | 197 | NeWS bitmap font |
cpio_Archive_CRChdr_Fmt | 198 | cpio archive (CRC Header) |
cpio_Archive_CHRhdr_Fmt | 199 | cpio archive (CHR Header) |
PEX_Binary_Archive_Fmt | 200 | SUN PEX Binary Archive |
Sun_vfont_Fmt | 201 | SUN vfont Definition |
Curses_Screen_Fmt | 202 | Curses Screen Image |
UUEncoded_Fmt | 203 | UU encoded | UUE
WriteNow_Fmt | 204 | WriteNow MAC |
PC_Obj_Fmt | 205 | DOS/Windows Object Module |
Windows_Group_Fmt | 206 | Windows Group |
TrueType_Font_Fmt | 207 | TrueType Font | TTF
Windows_PIF_Fmt | 208 | Program Information File (PIF) | PIF
MS_COM_Executable_Fmt | 209 | PC (.COM) | COM
StuffIt_Fmt | 210 | StuffIt (MAC) | HQX
PeachCalc_Fmt | 211 | PeachCalc |
Wang_GDL_Fmt | 212 | WANG Office GDL Header |
Q_A_DOS_Fmt | 213 | Q & A for DOS |
Q_A_Win_Fmt | 214 | Q & A for Windows | JW
WPS_PLUS_Fmt | 215 | WPS-PLUS | WPL
DCX_Fmt | 216 | DCX FAX Format (PCX images) | DCX
OLE_Fmt | 217 | OLE Compound Document | OLE
EBCDIC_Fmt | 218 | EBCDIC Text |
DCS_Fmt | 219 | DCS |
UNIX_SHAR_Fmt | 220 | SHAR | SHAR
Lotus_Notes_BitMap_Fmt | 221 | Lotus Notes Bitmap |
Lotus_Notes_CDF_Fmt | 222 | Lotus Notes CDF | CDF
Compress_Fmt | 223 | Unix Compress | Z
GZ_Compress_Fmt | 224 | GZ Compress | GZ¹
TAR_Fmt | 225 | TAR | TAR
ODIF_FOD26_Fmt | 226 | ODA / ODIF | F26
ODIF_FOD36_Fmt | 227 | ODA / ODIF | F36
ALIS_Fmt | 228 | ALIS |
Envoy_Fmt | 229 | Envoy | EVY
PDF_Fmt | 230 | Portable Document Format | PDF
BinHex_Fmt | 231 | BinHex | HQX
SMTP_Fmt | 232 | SMTP | SMTP
MIME_Fmt | 233 | MIME² | EML, MBX
USENET_Fmt | 234 | USENET |
SGML_Fmt | 235 | SGML | SGML
HTML_Fmt | 236 | HTML | HTM¹, HTML¹
ACT_Fmt | 237 | ACT | ACT
PNG_Fmt | 238 | Portable Network Graphics (PNG) | PNG
MS_Video_Fmt | 239 | Video for Windows (AVI) | AVI
Windows_Animated_Cursor_Fmt | 240 | Windows Animated Cursor | ANI
Windows_CPP_Obj_Storage_Fmt | 241 | Windows C++ Object Storage |
Windows_Palette_Fmt | 242 | Windows Palette | PAL
RIFF_DIB_Fmt | 243 | RIFF Device Independent Bitmap |
RIFF_MIDI_Fmt | 244 | RIFF MIDI | RMI
RIFF_Multimedia_Movie_Fmt | 245 | RIFF Multimedia Movie |
MPEG_Fmt | 246 | MPEG Movie | MPG¹, MPEG
QuickTime_Fmt | 247 | QuickTime Movie, MPEG-4 Audio | MOV, QT, MP4
AIFF_Fmt | 248 | Audio Interchange File Format (AIFF) | AIF, AIFF
Amiga_MOD_Fmt | 249 | Amiga MOD | MOD
Amiga_IFF_8SVX_Fmt | 250 | Amiga IFF (8SVX) Sound | IFF
Creative_Voice_Audio_Fmt | 251 | Creative Voice (VOC) | VOC
AutoDesk_Animator_FLI_Fmt | 252 | AutoDesk Animator FLIC | FLI
AutoDesk_AnimatorPro_FLC_Fmt | 253 | AutoDesk Animator Pro FLIC | FLC
Compactor_Archive_Fmt | 254 | Compactor / Compact Pro |
VRML_Fmt | 255 | VRML | WRL
QuickDraw_3D_Metafile_Fmt | 256 | QuickDraw 3D Metafile |
PGP_Secret_Keyring_Fmt | 257 | PGP Secret Keyring |
PGP_Public_Keyring_Fmt | 258 | PGP Public Keyring |
PGP_Encrypted_Data_Fmt | 259 | PGP Encrypted Data |
PGP_Signed_Data_Fmt | 260 | PGP Signed Data |
PGP_SignedEncrypted_Data_Fmt | 261 | PGP Signed and Encrypted Data |
PGP_Sign_Certificate_Fmt | 262 | PGP Signature Certificate |
PGP_Compressed_Data_Fmt | 263 | PGP Compressed Data |
PGP_ASCII_Public_Keyring_Fmt | 264 | ASCII-armored PGP Public Keyring |
PGP_ASCII_Encoded_Fmt | 265 | ASCII-armored PGP encoded | PGP¹
PGP_ASCII_Signed_Fmt | 266 | ASCII-armored PGP encoded | PGP¹
OLE_DIB_Fmt | 267 | OLE DIB object |
SGI_Image_Fmt | 268 | SGI Image | RGB
Lotus_ScreenCam_Fmt | 269 | Lotus ScreenCam |
MPEG_Audio_Fmt | 270 | MPEG Audio | MPEGA
FTP_Software_Session_Fmt | 271 | FTP Session Data | STE
Netscape_Bookmark_File_Fmt | 272 | Netscape Bookmark File | HTM¹
Corel_Draw_CMX_Fmt | 273 | Corel CMX | CMX
AutoDesk_DWG_Fmt | 274 | AutoDesk Drawing (DWG) | DWG
AutoDesk_WHIP_Fmt | 275 | AutoDesk WHIP | WHP
Macromedia_Director_Fmt | 276 | Macromedia Director | DCR
Real_Audio_Fmt | 277 | Real Audio | RM
MSDOS_Device_Driver_Fmt | 278 | MSDOS Device Driver | SYS
Micrografx_Designer_Fmt | 279 | Micrografx Designer | DSF
SVF_Fmt | 280 | Simple Vector Format (SVF) | SVF
Applix_Words_Fmt | 281 | Applix Words | AW
Applix_Graphics_Fmt | 282 | Applix Graphics | AG
MS_Access_Fmt | 283 | Microsoft Access | MDB¹
MS_Access_95_Fmt | 284 | Microsoft Access 95 | MDB¹
MS_Access_97_Fmt | 285 | Microsoft Access 97 | MDB¹
MacBinary_Fmt | 286 | MacBinary | BIN
Apple_Single_Fmt | 287 | Apple Single |
Apple_Double_Fmt | 288 | Apple Double |
Enhanced_Metafile_Fmt | 289 | Enhanced Metafile | EMF
MS_Office_Drawing_Fmt | 290 | Microsoft Office Drawing |
XML_Fmt | 291 | XML | XML¹
DeVice_Independent_Fmt | 292 | DeVice Independent file (DVI) | DVI
Unicode_Fmt | 293 | Unicode | UNI
Lotus_123_Worksheet_Fmt | 294 | Lotus 1-2-3 | WK1¹
Lotus_123_Format_Fmt | 295 | Lotus 1-2-3 Formatting | FM3
Lotus_123_97_Fmt | 296 | Lotus 1-2-3 97 | WK1¹
Lotus_Word_Pro_96_Fmt | 297 | Lotus Word Pro 96 | LWP¹
Lotus_Word_Pro_97_Fmt | 298 | Lotus Word Pro 97 | LWP¹
Freelance_DOS_Fmt | 299 | Lotus Freelance for DOS |
Freelance_Win_Fmt | 300 | Lotus Freelance for Windows | PRE
Freelance_OS2_Fmt | 301 | Lotus Freelance for OS/2 | PRS
Freelance_96_Fmt | 302 | Lotus Freelance 96 | PRZ¹
Freelance_97_Fmt | 303 | Lotus Freelance 97 | PRZ¹
MS_Word_95_Fmt | 304 | Microsoft Word 95 | DOC¹
MS_Word_97_Fmt | 305 | Microsoft Word 97 | DOC¹
Excel_Fmt | 306 | Microsoft Excel | XLS¹
Excel_Chart_Fmt | 307 | Microsoft Excel | XLS¹
Excel_Macro_Fmt | 308 | Microsoft Excel | XLS¹
Excel_95_Fmt | 309 | Microsoft Excel 95 | XLS¹
Excel_97_Fmt | 310 | Microsoft Excel 97 | XLS¹
Corel_Presentations_Fmt | 311 | Corel Presentations | XFD, XFDL
Harvard_Graphics_Fmt | 312 | Harvard Graphics |
Harvard_Graphics_Chart_Fmt | 313 | Harvard Graphics Chart | CH3, CHT
Harvard_Graphics_Symbol_Fmt | 314 | Harvard Graphics Symbol File | SY3
Harvard_Graphics_Cfg_Fmt | 315 | Harvard Graphics Configuration File |
Harvard_Graphics_Palette_Fmt | 316 | Harvard Graphics Palette |
Lotus_123_R9_Fmt | 317 | Lotus 1-2-3 Release 9 |
Applix_Spreadsheets_Fmt | 318 | Applix Spreadsheets | AS
MS_Pocket_Word_Fmt | 319 | Microsoft Pocket Word | PWD, DOC¹
MS_DIB_Fmt | 320 | MS Windows Device Independent Bitmap |
MS_Word_2000_Fmt | 321 | Microsoft Word 2000 | DOC¹
Excel_2000_Fmt | 322 | Microsoft Excel 2000 | XLS¹
PowerPoint_2000_Fmt | 323 | Microsoft PowerPoint 2000 | PPT
MS_Access_2000_Fmt | 324 | Microsoft Access 2000 | MDB¹, MPP¹
MS_Project_4_Fmt | 325 | Microsoft Project 4 | MPP¹
MS_Project_41_Fmt | 326 | Microsoft Project 4.1 | MPP¹
MS_Project_98_Fmt | 327 | Microsoft Project 98 | MPP¹
Folio_Flat_Fmt | 328 | Folio Flat File | FFF
HWP_Fmt | 329 | HWP (Arae-Ah Hangul) | HWP
ICHITARO_Fmt | 330 | ICHITARO V4-10 |
IS_XML_Fmt | 331 | Extended or Custom XML | XML¹
Oasys_Fmt | 332 | Oasys format | OA2, OA3
PBM_ASC_Fmt | 333 | Portable Bitmap Utilities ASCII Format |
PBM_BIN_Fmt | 334 | Portable Bitmap Utilities Binary Format |
PGM_ASC_Fmt | 335 | Portable Greymap Utilities ASCII Format |
PGM_BIN_Fmt | 336 | Portable Greymap Utilities Binary Format | PGM
PPM_ASC_Fmt | 337 | Portable Pixmap Utilities ASCII Format |
PPM_BIN_Fmt | 338 | Portable Pixmap Utilities Binary Format |
XBM_Fmt | 339 | X Bitmap Format | XBM
XPM_Fmt | 340 | X Pixmap Format | XPM
FPX_Fmt | 341 | FPX Format | FPX
PCD_Fmt | 342 | PCD Format | PCD
MS_Visio_Fmt | 343 | Microsoft Visio | VSD
MS_Project_2000_Fmt | 344 | Microsoft Project 2000 | MPP¹
MS_Outlook_Fmt | 345 | Microsoft Outlook | MSG, OFT
ELF_Relocatable_Fmt | 346 | ELF Relocatable | O
ELF_Executable_Fmt | 347 | ELF Executable |
ELF_Dynamic_Lib_Fmt | 348 | ELF Dynamic Library | SO
MS_Word_XML_Fmt | 349 | Microsoft Word 2003 XML | XML¹
MS_Excel_XML_Fmt | 350 | Microsoft Excel 2003 XML | XML¹
MS_Visio_XML_Fmt | 351 | Microsoft Visio 2003 XML | VDX
SO_Text_XML_Fmt | 352 | StarOffice Text XML | SXW¹, ODT¹
SO_Spreadsheet_XML_Fmt | 353 | StarOffice Spreadsheet XML | SXC¹, ODS¹
SO_Presentation_XML_Fmt | 354 | StarOffice Presentation XML | SXI¹, SXP¹, ODP¹
XHTML_Fmt | 355 | XHTML | XML¹
MS_OutlookPST_Fmt | 356 | Microsoft Outlook PST | PST
RAR_Fmt | 357 | RAR | RAR
Lotus_Notes_NSF_Fmt | 358 | IBM Lotus Notes Database NSF/NTF | NSF
Macromedia_Flash_Fmt | 359 | SWF | SWF
MS_Word_2007_Fmt | 360 | Microsoft Word 2007 XML | DOCX, DOTX
MS_Excel_2007_Fmt | 361 | Microsoft Excel 2007 XML | XLSX, XLTX
MS_PPT_2007_Fmt | 362 | Microsoft PPT 2007 XML | PPTX, POTX, PPSX
OpenPGP_Fmt | 363 | OpenPGP Message Format (with new packet format) | PGP
Intergraph_V7_DGN_Fmt | 364 | Intergraph Standard File Format (ISFF) V7 DGN (non-OLE) | DGN¹
MicroStation_V8_DGN_Fmt | 365 | MicroStation V8 DGN (OLE) | DGN¹
MS_Word_Macro_2007_Fmt | 366 | Microsoft Word Macro 2007 XML | DOCM, DOTM
MS_Excel_Macro_2007_Fmt | 367 | Microsoft Excel Macro 2007 XML | XLSM, XLTM, XLAM
MS_PPT_Macro_2007_Fmt | 368 | Microsoft PPT Macro 2007 XML | PPTM, POTM, PPSM, PPAM
LZH_Fmt | 369 | LHA Archive | LZH, LHA
Office_2007_Fmt | 370 | Office 2007 document | XLSB
MS_XPS_Fmt | 371 | Microsoft XML Paper Specification (XPS) | XPS
Lotus_Domino_DXL_Fmt | 372 | IBM Lotus representation of Domino design elements in XML format | DXL
ODF_Text_Fmt | 373 | ODF Text | ODT¹, SXW¹, STW
ODF_Spreadsheet_Fmt | 374 | ODF Spreadsheet | ODS¹, SXC¹, STC
ODF_Presentation_Fmt | 375 | ODF Presentation | SXD¹, SXI¹, ODG¹, ODP¹
Legato_Extender_ONM_Fmt | 376 | Legato Extender Native Message ONM | ONM
bin_Unknown_Fmt | 377 | n/a |
TNEF_Fmt | 378 | Transport Neutral Encapsulation Format (TNEF) | various
CADAM_Drawing_Fmt | 379 | CADAM Drawing | CDD
CADAM_Drawing_Overlay_Fmt | 380 | CADAM Drawing Overlay | CDO
NURSTOR_Drawing_Fmt | 381 | NURSTOR Drawing | NUR
HP_GLP_Fmt | 382 | HP Graphics Language (Plotter) | HPG
ASF_Fmt | 383 | Advanced Systems Format (ASF) | ASF
WMA_Fmt | 384 | Windows Media Audio Format (WMA) | WMA
WMV_Fmt | 385 | Windows Media Video Format (WMV) | WMV
EMX_Fmt | 386 | Legato EMailXtender Archives Format (EMX) | EMX
Z7Z_Fmt | 387 | 7 Zip Format (7z) | 7Z
MS_Excel_Binary_2007_Fmt | 388 | Microsoft Excel Binary 2007 | XLSB
CAB_Fmt | 389 | Microsoft Cabinet File (CAB) | CAB
CATIA_Fmt | 390 | CATIA Formats (CAT*) | CAT³
YIM_Fmt | 391 | Yahoo Instant Messenger History | DAT¹
ODF_Drawing_Fmt | 392 | ODF Drawing | SXD¹, SXI¹, ODG¹
Founder_CEB_Fmt | 393 | Founder Chinese E-paper Basic (ceb) | CEB
QPW_Fmt | 394 | Quattro Pro 9+ for Windows | QPW
MHT_Fmt | 395 | MHT format² | MHT
MDI_Fmt | 396 | Microsoft Document Imaging Format | MDI
GRV_Fmt | 397 | Microsoft Office Groove Format | GRV
IWWP_Fmt | 398 | Apple iWork Pages format | PAGES, GZ¹
IWSS_Fmt | 399 | Apple iWork Numbers format | NUMBERS, GZ¹
IWPG_Fmt | 400 | Apple iWork Keynote format | KEY, GZ¹
BKF_Fmt | 401 | Windows Backup File | BKF
MS_Access_2007_Fmt | 402 | Microsoft Access 2007 | ACCDB
ENT_Fmt | 403 | Microsoft Entourage Database Format |
DMG_Fmt | 404 | Mac Disk Copy Disk Image File |
CWK_Fmt | 405 | AppleWorks File |
OO3_Fmt | 406 | Omni Outliner File | OO3
OPML_Fmt | 407 | Omni Outliner File | OPML
Omni_Graffle_XML_File | 408 | Omni Graffle XML File | GRAFFLE
PSD_Fmt | 409 | Photoshop Document | PSD
Apple_Binary_PList_Fmt | 410 | Apple Binary Property List format |
Apple_iChat_Fmt | 411 | Apple iChat format |
OOUTLINE_Fmt | 412 | OOutliner File | OOUTLINE
BZIP2_Fmt | 413 | Bzip 2 Compressed File | BZ2
ISO_Fmt | 414 | ISO-9660 CD Disc Image Format | ISO
DocuWorks_Fmt | 415 | DocuWorks Format | XDW
RealMedia_Fmt | 416 | RealMedia Streaming Media | RM, RA
AC3Audio_Fmt | 417 | AC3 Audio File Format | AC3
NEF_Fmt | 418 | Nero Encrypted File | NEF
SolidWorks_Fmt | 419 | SolidWorks Format Files | SLDASM, SLDPRT, SLDDRW
XFDL_Fmt | 420 | Extensible Forms Description Language | XFDL, XFD
Apple_XML_PList_Fmt | 421 | Apple XML Property List format |
OneNote_Fmt | 422 | OneNote Note Format | ONE
Dicom_Fmt | 424 | Digital Imaging and Communications in Medicine | DCM
EnCase_Fmt | 425 | Expert Witness Compression Format (EnCase) | E01, L01, Lx01
Scrap_Fmt | 426 | Shell Scrap Object File | SHS
MS_Project_2007_Fmt | 427 | Microsoft Project 2007 | MPP¹
MS_Publisher_98_Fmt | 428 | Microsoft Publisher 98/2000/2002/2003/2007/ | PUB¹
Skype_Fmt | 429 | Skype Log File | DBB
Hl7_Fmt | 430 | Health level7 message | HL7
MS_OutlookOST_Fmt | 431 | Microsoft Outlook OST | OST
Epub_Fmt | 432 | Electronic Publication | EPUB
MS_OEDBX_Fmt | 433 | Microsoft Outlook Express DBX | DBX
BB_Activ_Fmt | 434 | BlackBerry Activation File | DAT¹
DiskImage_Fmt | 435 | Disk Image |
Milestone_Fmt | 436 | Milestone Document | MLS, ML3, ML4, ML5, ML6, ML7, ML8, ML9
E_Transcript_Fmt | 437 | RealLegal E-Transcript File | PTX
PostScript_Font_Fmt | 438 | PostScript Type 1 Font | PFB
Ghost_DiskImage_Fmt | 439 | Ghost Disk Image File | GHO, GHS
JPEG_2000_JP2_File_Fmt | 440 | JPEG-2000 JP2 File Format Syntax (ISO/IEC 15444-1) | JP2, JPF, J2K, JPWL, JPX, PGX
Unicode_HTML_Fmt | 441 | Unicode HTML | HTM¹, HTML¹
CHM_Fmt | 442 | Microsoft Compiled HTML Help | CHM
EMCMF_Fmt | 443 | Documentum EMCMF format | EMCMF
MS_Access_2007_Tmpl_Fmt | 444 | Microsoft Access 2007 Template | ACCDT
Jungum_Fmt | 445 | Samsung Electronics Jungum Global document | GUL
JBIG2_Fmt | 446 | JBIG2 File Format | JB2, JBIG2
EFax_Fmt | 447 | eFax file | EFX
AD1_Fmt | 448 | AD1 Evidence file | AD1
SketchUp_Fmt | 449 | Google SketchUp | SKP
GWFS_Email_Fmt | 450 | Group Wise File Surf email | GWFS
JNT_Fmt | 451 | Windows Journal format | JNT
Yahoo_yChat_Fmt | 452 | Yahoo! Messenger chat log | YCHAT
PaperPort_MAX_File_Fmt | 453 | PaperPort image file | MAX
ARJ_Fmt | 454 | ARJ (Archive by Robert Jung) file format | ARJ
RPMSG_Fmt | 455 | Microsoft Outlook Restricted Permission Message | RPMSG
MAT_Fmt | 456 | MATLAB file format | MAT, FIG
SGY_Fmt | 457 | SEG-Y Seismic Data format | SGY, SEGY
CDXA_MPEG_PS_Fmt | 458 | MPEG-PS container with CDXA stream | MPG¹
EVT_Fmt | 459 | Microsoft Windows NT Event Log | EVT
EVTX_Fmt | 460 | Microsoft Windows Vista Event Log | EVTX
MS_OutlookOLM_Fmt | 461 | Microsoft Outlook for Macintosh format | OLM
WARC_Fmt | 462 | Web ARChive | WARC
JAVACLASS_Fmt | 463 | Java Class format | CLASS
VCF_Fmt | 464 | Microsoft Outlook vCard file format | VCF
EDB_Fmt | 465 | Microsoft Exchange Server Database file format | EDB
ICS_Fmt | 466 | Microsoft Outlook iCalendar file format | ICS, VCS
MS_Visio_2013_Fmt | 467 | Microsoft Visio 2013 | VSDX, VSTX, VSSX
MS_Visio_2013_Macro_Fmt | 468 | Microsoft Visio 2013 macro | VSDM, VSTM, VSSM
1. This file extension can return more than one format number.
2. MHT, EML, and MBX files might return format 2, 233, or 395, depending on the text contained in the file. In general, files that contain fields such as To, From, Date, or Subject are considered e-mail messages; files that contain
fields such as content-type and mime-version are considered to be MHT files; and files that do not contain any of
those fields are considered to be text files.
3. All CAT file extensions, for example CATDrawing, CATProduct, CATPart, and so on.
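Because one extension can map to several format numbers (notes 1 and 2), application code is usually safer branching on the detected format number than on the file extension. The following is a minimal sketch of that idea for the three numbers named in note 2. The enum and function names are hypothetical and local to this example; they are not identifiers from the KeyView headers, and the sketch does not call any KeyView API.

```c
/* Format numbers quoted from Table 37 and note 2.
 * The SKETCH_* names are hypothetical and exist only in this example. */
enum sketch_format {
    SKETCH_FMT_TEXT = 2,   /* plain text, per note 2 */
    SKETCH_FMT_MIME = 233, /* MIME_Fmt */
    SKETCH_FMT_MHT  = 395  /* MHT_Fmt */
};

/* Handling categories an application might define for itself. */
enum sketch_category {
    SKETCH_CAT_OTHER = 0,
    SKETCH_CAT_TEXT,
    SKETCH_CAT_EMAIL,
    SKETCH_CAT_MHT
};

/* Map a detected format number to a handling category.
 * The file extension is deliberately not consulted. */
enum sketch_category sketch_mail_category(int format_number)
{
    switch (format_number) {
    case SKETCH_FMT_MIME: return SKETCH_CAT_EMAIL;
    case SKETCH_FMT_MHT:  return SKETCH_CAT_MHT;
    case SKETCH_FMT_TEXT: return SKETCH_CAT_TEXT;
    default:              return SKETCH_CAT_OTHER;
    }
}
```

With this approach, an .eml file that is detected as format 2 is handled as text, and an .mht file that is detected as format 233 is handled as e-mail, whatever its extension suggests.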
APPENDIX G
Extract and Format Lotus Notes Sub Files
This section describes how to create XML templates to alter the appearance of
extracted Lotus mail note sub-files so that they maintain the look and feel of the
original notes.

• Overview
• Customize XML Templates
• Template Elements and Attributes
• Date and Time Formats
Overview
KeyView uses the NSF reader, nsfsr, to extract Lotus database files, and places
Lotus mail notes in sub-files. The NSF reader uses a set of default XML templates
to extract the notes and apply formatting, thereby approximating the look and feel
of the original notes.
In some cases, you might need to customize the XML templates, for instance if
your notes contain custom data. In such cases, you can modify the existing XML
templates or create your own.
During extraction, the NSF reader loads all XML files in the NSFtemplates
directory and its subdirectories (except for the NSFtemplates\images
directory, which is reserved for images). During initialization, the KeyView XML
parser verifies the XML templates. If the templates contain any invalid XML,
elements, or attributes, initialization fails and errors are recorded in the
nsfsr.log file.
Customize XML Templates
XML templates are enabled by default. In most cases, the default templates
should be sufficient; however, you can customize them or create your own as
required.
To customize XML templates for Lotus note extraction
1. Modify the template files in the following directory.
install\OS\bin\NSFtemplates
The main.xml file must exist in the NSFtemplates directory. It is the
top-level template file that extracts all sub-files, usually by calling other
templates.
2. Ensure that any modifications or additional XML files conform to the supported
elements and attributes described in “Template Elements and Attributes” on
page 395.
3. Extract the Lotus database file.
Use Demo Templates
For testing purposes, you can extract notes using a set of demo templates.
Because the default templates do not use every XML element, the demo templates
are provided to demonstrate the proper usage of all the XML elements and attributes.
The demo templates are available at:
install\OS\bin\NSFtemplates\demo
To use the demo XML templates
1. In the formats.ini file, set the following parameter.
[nsfsr]
UseDemoTemplate=1
2. In the main.xml file, uncomment the following section.
<ifini name="UseDemoTemplate" text="1">
<call file="demo.xml"/>
<quit/>
</ifini>
Use Old Templates
For testing purposes, you can extract notes using legacy templates, which
produce MHTML output. You can generate similar output by disabling the XML
templates, but using the old templates allows you to see the XML code and
compare it to the standard and demo templates.
To use the old XML templates
1. In the formats.ini file, set the following parameter.
[nsfsr]
UseOldTemplate=1
2. In the main.xml file, uncomment the following section.
<ifini name="UseOldTemplate" text="1">
<call file="default_old.xml"/>
<quit/>
</ifini>
Disable XML Templates
For testing purposes, you can disable XML templates; KeyView then extracts the
notes in MHTML format. You can compare the MHTML output produced directly by
the NSF reader with the MHTML output produced indirectly by the NSF reader
through the XML templates.
To disable XML templates
• In the formats.ini file, set the following parameter.
[nsfsr]
ExtractByTemplate=0
Template Elements and Attributes
This section lists the valid XML elements and attributes that you can use when
creating or modifying templates. Refer to the demo templates for examples.
Conditional Elements
The following table lists the valid conditional elements.
Table 38 Conditional elements
Element
    Description
<keyview>
    KeyView XML template container (“root”) element.
<if*>
    If the condition from the comparison is true, process XML.
    Conditions can be nested up to 25 levels deep.
    Attributes
    • name. (Required) Name of the main item to compare to item or text.
    • item. (Required if no text) Name of the item to compare to the item specified by name.
    • text. (Required if no item) Text to compare to the item specified by name.
<ifex>, <ifnx>
    Respectively, if the name item exists and has a text value, or does not.
    The Notes item might have a value that cannot be converted to text, such as an image.
<ifeq>, <ifne>, <iflt>, <ifle>, <ifgt>, <ifge>
    Respectively, if text ==, !=, <, <=, >, >=.
    Text comparison uses a case-insensitive string compare.
<iftdeq>, <iftdne>, <iftdlt>, <iftdle>, <iftdgt>, <iftdge>
    Respectively, if time/date ==, !=, <, <=, >, >=.
    Time/date comparison converts dates to text in local time using the Notes default,
    TZFMT_NEVER, because Notes also sometimes converts fields to text internally. For
    example:
    text="06/30/2005 02:52:04 PM"
<iftzeq>, <iftzne>
    Respectively, if the time zone equals or does not equal the comparison text, for
    example CDT, EST, and so on.
<ifini>
    If the value of the INI option specified in name equals the text value.
<else>
    If the condition from the last <if> or <switch> was false, process XML.
<switch>
    If name value exists, process XML.
    Attributes
    • name. (Required) Name of the main item to compare in <case> sub-elements.
Control Elements
The following table lists the valid control elements.
Table 39 Control Elements
Element
    Description
<call>
    Call another XML template. You can nest templates up to 10 levels deep.
    Attributes
    • file. (Required) Template file name. Must be unique.
<log>
    Log message to the NSF log file.
    Attributes
    • text. (Required) Text to log.
    • type. (Optional) Type of log message. The following values are valid.
        • ERROR
        • WARN
        • INFO
        • DIAG (default)
        • DEBUG
        • DUMP
<quit>
    Quit processing the template. Exits without error.
    Attributes
    • text. (Optional) Text to log.
    • type. (Optional) Type of log message. See <log>.
<stop>
    Stop processing the template. Exits with an ERROR type of log message.
    Attributes
    • text. (Required) Text to log.
Data Elements
The following table lists the valid data elements.
Table 40 Data elements
Element
    Description
<text>
    Output text.
    Attributes
    • name. (Required if no parent) Name of the item to output.
<rich>
    Output rich text (MHTML). Images are output in the next part or parts of the
    MHTML, after the first <HTML> part.
    Attributes
    • name. (Required if no parent) Name of the item to output.
<body>
    Output the message body in rich text (MHTML). As with <rich>, images are output
    in the next part or parts of the MHTML.
<form>
    Output the message form (usually $Body field) in rich text (MHTML).
    Attributes
    • name. (Required if no parent) Name of the item to output.
<addr>
    Output an address.
    Attributes
    • name. (Required if no parent) Name of the item to output.
    • type. (Optional) Type of address to output. If you use this attribute, you must
      set it to CN (Common Name), which is the only supported type.
<name>
    Output the name of the last name item, or in other words the current main item.
    The item must exist.
<format>
    Set default format for <date> and <date_kv>. This element does not set the <text>
    format. See “Date and Time Formats” on page 401 for a list of all Notes and
    KeyView date and time formats and integer values.
    Attributes
    • format. (Optional. Omit to reset to defaults) Notes and KeyView date/time
      format. You can set the following formats:
        • TD=int. Time Date format (TDFMT_*)
        • TS=int. Time Show format (TSFMT_*)
        • TT=int. Time Time format (TTFMT_*)
        • TZ=int. Time Zone format (TZFMT_*)
        • KV=int. KeyView date and time format.
      where int is an integer value that corresponds to the desired format.
      Separate multiple formats with commas. For example:
      format="TD=0,TS=2,TT=1,TZ=1,KV=55"
<date>
    Output a Notes date.
    Attributes
    • name. (Required if no parent) Name of the item to output.
    • format. (Optional) See <format>. You can set the following values:
        • TD
        • TS
        • TT
        • TZ
<date_kv>
    Output a KeyView date.
    Attributes
    • name. (Required if no parent) Name of the item to output.
    • format. (Optional) See <format>. You can set the following values:
        • TZ
        • KV
<time>
    Output a time range, for example 1 hour, 30 minutes.
    Attributes
    • name. (Required if no parent) Item name of the start date/time.
    • item. (Required) Item name of the end date/time.
<zone>
    Output a Notes time zone mnemonic, for example MST.
    Attributes
    • name. (Required if no parent) Name of date item to output.
<zone_utc>
    Output a time zone as UTC, for example (UTC-06:00).
<logo>
    Output the mail header logo.
    The image link is output; the actual image is output to a different part of the
    MHTML sub-file.
<image>
    Output an image.
    The image link is output; the actual image is output to the MHTML next part, as
    with <rich> and <body>.
<image_uri>
    Output an image URI, in quotes. The actual image is output to a different part of
    the MHTML sub-file.
    Attributes
    • link. (Required if no file) The image link, such as a form or title name. For
      example:
      link="StdNotesLtr0"
    • file. (Required if no link) Image file name. The file must exist in the
      ../../templates/images directory. For example:
      file="boxcheck.gif"
Date and Time Formats
This section lists the supported Notes and KeyView date/time formats for use with
<format>, <date>, and <date_kv>.
Lotus Notes Date and Time Formats
Table 41 lists supported Lotus Notes date and time formats, and the integer
values that specify each one.
Table 41 Lotus Notes date and time formats
Format | Integer Value | Description
TDFMT_FULL | 0 | (Notes default) Year, month, and day.
TDFMT_CPARTIAL | 1 | Month and day, year if not this year.
TDFMT_PARTIAL | 2 | Month and day.
TDFMT_DPARTIAL | 3 | Year and month.
TDFMT_FULL4 | 4 | Four-digit year, month, and day.
TDFMT_CPARTIAL4 | 5 | Month and day, four-digit year if not this year.
TDFMT_DPARTIAL4 | 6 | Four-digit year and month.
TTFMT_FULL | 0 | (Notes default) Hour, minute, and second.
TTFMT_PARTIAL | 1 | Hour and minute.
TTFMT_HOUR | 2 | Hour.
TZFMT_NEVER | 0 | (Notes default) All time zones are converted to current time zone.
TZFMT_SOMETIMES | 1 | Show only when outside the current time zone.
TZFMT_ALWAYS | 2 | Show for all time zones.
TSFMT_DATE | 0 | Date.
TSFMT_TIME | 1 | Time.
TSFMT_DATETIME | 2 | (Notes default) Date and time.
TSFMT_CDATETIME | 4 | Date and time, or time Today or time Yesterday.
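If you generate templates programmatically, the value of the format attribute can be assembled from the integer values in Table 41 and Table 42. The following is a minimal sketch, assuming the comma-separated TD, TS, TT, TZ, KV syntax shown for the <format> element. The function name sketch_build_format_attr is hypothetical and is not part of the KeyView API.

```c
#include <stdio.h>

/* Write a value for the format attribute, for example
 * "TD=0,TS=2,TT=1,TZ=1,KV=55", into buf.
 * td, ts, tt, and tz are integer values from Table 41; kv is an
 * integer value from Table 42.
 * Returns 0 on success, or -1 if formatting fails or buf is too small.
 * Hypothetical helper for illustration only. */
int sketch_build_format_attr(char *buf, size_t size,
                             int td, int ts, int tt, int tz, int kv)
{
    int n = snprintf(buf, size, "TD=%d,TS=%d,TT=%d,TZ=%d,KV=%d",
                     td, ts, tt, tz, kv);
    if (n < 0 || (size_t)n >= size) {
        return -1; /* encoding error or truncated output */
    }
    return 0;
}
```

For example, passing 0 (TDFMT_FULL), 2 (TSFMT_DATETIME), 1 (TTFMT_PARTIAL), 1 (TZFMT_SOMETIMES), and 55 (KVDTF_mod) produces the same string as the example given for the <format> element.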
KeyView Date and Time Formats
Table 42 lists KeyView date and time formats. The KeyView formats use the
following syntax:
Month
    Month = full month name
    Mon = abbreviated month name
    m = month (number)
    mm = two-digit month (leading 0)
Weekday
    Weekday = full weekday name
    Wday = abbreviated weekday name
Year
    yy = two-digit year
    yyyy = four-digit year
Day
    d = day (number)
    dd = two-digit day (leading 0)
Time
    h = 12-hour
    H = 24-hour
    m = minutes
    s = seconds
    P = AM/PM
    p = am/pm
Separators
    _ = space
    c = comma
    s = slash
    a = dash
    o = dot
Table 42 KeyView date and time formats
Format | Output | Integer Value
12-Hour and 24-Hour Time Formats
KVDTF_P | P | 1
KVDTF_P_hmm | P h:mm | 2
KVDTF_hmm_P | h:mm P | 3
KVDTF_P_hhmm | P hh:mm | 4
KVDTF_hhmm_P | hh:mm P | 5
KVDTF_P_hhmmss | P hh:mm:ss | 6
KVDTF_hhmmss_P | hh:mm:ss P | 7
KVDTF_Hmm | H:mm | 8
KVDTF_HHmm | HH:mm | 9
KVDTF_mmss | mm:ss | 10
KVDTF_Hmmss | H:mm:ss | 11
KVDTF_HHmmss | HH:mm:ss | 12
Numerical Date Formats with Slashes
KVDTF_mmsdd | mm/dd | 13
KVDTF_msdsyy | m/d/yy | 14
KVDTF_mmsddsyy | mm/dd/yy | 15
KVDTF_mmsddsyyyy | mm/dd/yyyy | 16
KVDTF_ddsmm | dd/mm | 17
KVDTF_ddsmmsyy | dd/mm/yy | 18
KVDTF_ddsmmsyy_Hmm | dd/mm/yy H:mm | 19
KVDTF_ddsmm_P_hmm | dd/mm P h:mm | 20
KVDTF_ddsmm_hmm_P | dd/mm h:mm P | 21
KVDTF_ddsmm_P_hhmm | dd/mm P hh:mm | 22
KVDTF_ddsmm_hhmm_P | dd/mm hh:mm P | 23
KVDTF_ddsmmsyy_P_hmm | dd/mm/yy P h:mm | 24
KVDTF_ddsmmsyy_hmm_P | dd/mm/yy h:mm P | 25
KVDTF_ddsmmsyy_P_hmmss | dd/mm/yy P h:mm:ss | 26
KVDTF_ddsmmsyy_hmmss_P | dd/mm/yy h:mm:ss P | 27
KVDTF_ddsmmsyy_P_hhmmss | dd/mm/yy P hh:mm:ss | 28
KVDTF_ddsmmsyy_hhmmss_P | dd/mm/yy hh:mm:ss P | 29
KVDTF_yysmmsdd_P_hhmmss | yy/mm/dd P hh:mm:ss | 30
KVDTF_yysmmsdd_hhmmss_P | yy/mm/dd hh:mm:ss P | 31
KVDTF_msdsyy_Hmm | m/d/yy H:mm | 32
KVDTF_mmsddsyy_Hmm | mm/dd/yy H:mm | 33
KVDTF_msdsyy_P_hmm | m/d/yy P h:mm | 34
KVDTF_msdsyy_hmm_P | m/d/yy h:mm P | 35
KVDTF_mmsddsyy_hmm_P | mm/dd/yy h:mm P | 36
KVDTF_mmsdd_P_hhmm | mm/dd P hh:mm | 37
KVDTF_mmsdd_hhmm_P | mm/dd hh:mm P | 38
KVDTF_mmsddsyy_P_hhmmss | mm/dd/yy P hh:mm:ss | 39
KVDTF_mmsddsyy_hhmmss_P | mm/dd/yy hh:mm:ss P | 40
KVDTF_msd | m/d | 41
KVDTF_yysm | yy/m | 42
KVDTF_yysmm | yy/mm | 43
KVDTF_yysmsd | yy/m/d | 44
KVDTF_yysmmsdd | yy/mm/dd | 45
KVDTF_yyyysmmsdd | yyyy/mm/dd | 46
Numerical Date Formats with Dashes
KVDTF_ddammayy | dd-mm-yy | 47
KVDTF_mmadd | mm-dd | 48
KVDTF_mmayy | mm-yy | 49
KVDTF_yyammadd | yy-mm-dd | 50
KVDTF_yyyyammadd | yyyy-mm-dd | 51
KVDTF_yyyyammaddaHHmmss | yyyy-mm-dd-HH:mm:ss | 52
Numerical Date Formats with Dots
KVDTF_yyomod | yy.m.d | 53
KVDTF_yyommodd | yy.mm.dd | 54
KVDTF_mod                     m.d                     55
KVDTF_mmodd                   mm.dd                   56
Numerical/String Date Formats with Dashes, Commas, and Spaces
KVDTF_ddaMon                  dd-Mon                  57
KVDTF_daMonayy                d-Mon-yy                58
KVDTF_ddaMonayy               dd-Mon-yy               59
KVDTF_ddaMonayyyy             dd-Mon-yyyy             60
KVDTF_Mon                     Mon                     61
KVDTF_Monayy                  Mon-yy                  62
KVDTF_Monayyyy                Mon-yyyy                63
KVDTF_Monaddayy               Mon-dd-yy               64
KVDTF_yyammadd_P_hhmmss       yy-mm-dd P hh:mm:ss     65
KVDTF_mmadd_P_hhmm            mm-dd P hh:mm           66
KVDTF_Mon_yy                  Mon yy                  67
KVDTF_Monc_yy                 Mon, yy                 68
KVDTF_Month                   Month                   69
KVDTF_Monthayy                Month-yy                70
KVDTF_Month_yy                Month yy                71
KVDTF_Monthc_yy               Month, yy               72
KVDTF_Monthayyyy              Month-yyyy              73
KVDTF_Month_yyyy              Month yyyy              74
KVDTF_Monthc_yyyy             Month, yyyy             75
KVDTF_Mon_dc_yyyy             Mon d, yyyy             76
KVDTF_d_Monc_yyyy             d Mon, yyyy             77
KVDTF_yyyy_Mon_d              yyyy Mon d              78
KVDTF_Month_dc_yyyy           Month d, yyyy           79
KVDTF_d_Monthc_yyyy           d Month, yyyy           80
KVDTF_yyyy_Month_d            yyyy Month d            81
Weekday Date Formats
KVDTF_Wday                    Wday                    82
KVDTF_Weekday                 Weekday                 83
KVDTF_Wdayc_Mon_dc_yyyy       Wday, Mon d, yyyy       84
KVDTF_Weekdayc_Month_dc_yyyy  Weekday, Month d, yyyy  85
KVDTF_Weekdayc_d_Monthc_yyyy  Weekday, d Month, yyyy  86
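To make the Output column concrete, the following sketch reproduces four of the listed patterns with the standard C strftime() function. It does not call KeyView. The integer values are copied from Table 42, but the DEMO_ names, both functions, and the strftime() strings are assumptions made for this illustration; the patterns were chosen because they map cleanly onto strftime() conversions.

```c
#include <stddef.h>
#include <time.h>

/* Integer values copied from Table 42. The DEMO_ names are local to this
   sketch and are not SDK identifiers. */
enum {
    DEMO_HHmmss            = 12, /* HH:mm:ss            */
    DEMO_mmsddsyyyy        = 16, /* mm/dd/yyyy          */
    DEMO_yyyyammadd        = 51, /* yyyy-mm-dd          */
    DEMO_yyyyammaddaHHmmss = 52  /* yyyy-mm-dd-HH:mm:ss */
};

/* Returns an assumed strftime() equivalent for a Table 42 value,
   or NULL for values this sketch does not cover. */
const char *demo_strftime_pattern(int kvdtf_value)
{
    switch (kvdtf_value) {
    case DEMO_HHmmss:            return "%H:%M:%S";
    case DEMO_mmsddsyyyy:        return "%m/%d/%Y";
    case DEMO_yyyyammadd:        return "%Y-%m-%d";
    case DEMO_yyyyammaddaHHmmss: return "%Y-%m-%d-%H:%M:%S";
    default:                     return NULL;
    }
}

/* Formats *when into buf. Returns the number of characters written,
   or 0 if the value is not covered or the buffer is too small. */
size_t demo_format(int kvdtf_value, const struct tm *when,
                   char *buf, size_t buf_size)
{
    const char *pattern = demo_strftime_pattern(kvdtf_value);
    if (pattern == NULL || when == NULL || buf == NULL || buf_size == 0)
        return 0;
    return strftime(buf, buf_size, pattern, when);
}
```

For example, calling demo_format(52, ...) with a struct tm set to 20 January 2015, 14:05:09 fills the buffer with a string in the yyyy-mm-dd-HH:mm:ss layout.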
APPENDIX H
Password Protected Files
This section lists supported password-protected container and non-container files
and describes how to open them.
• Supported Password Protected File Types
• Open Password Protected Container Files
• Export Password Protected Files
Supported Password Protected File Types
Table 44 lists the password-protected file types that KeyView supports. Table 43
explains the symbols used in Table 44.
Table 43 Key to support table
Symbol  Description
Y       Format is supported.
N       Format is not supported.
S       Support for viewing sub-files.
V       Support for viewing content.
P       Password required.
C       Password and certificate or User ID file required.
Table 44 Supported password-protected file types
File Type                 Version               Filter  Export  Extract  View  Credentials
PST (Windows)             n/a                   N       N       Y        S     P
PST (non-Windows)¹        n/a                   N       N       Y        S     N
ZIP                       n/a                   N       N       Y        S     P
7-Zip                     n/a                   N       N       Y        S     P
RAR                       n/a                   N       N       Y        S     P
SMIME in MSG, EML, MBX    n/a                   N       N       Y        N     C
Lotus Notes NSF           n/a                   N       N       Y        N     C
Adobe PDF                 n/a                   Y       Y       Y        V     P
Microsoft Office          97-2003, 2007, 2010   Y       Y       Y        V     P
1. The native PST reader, pstnsr, does not require credentials to open password-protected PST files that
use Compressible Encryption.
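An application that must decide whether to prompt for a password or for a certificate before opening a file could keep Table 44 as a small lookup table. The following is a minimal sketch of that idea. The row data is copied from Table 44, while the DemoSupportRow type and both function names are hypothetical and are not part of the SDK.

```c
#include <stddef.h>
#include <string.h>

/* One row of Table 44, using the symbols from Table 43.
   The type and field names are made up for this sketch. */
typedef struct {
    const char *file_type;
    char filter;
    char export_;
    char extract;
    char view;
    char credentials;
} DemoSupportRow;

static const DemoSupportRow demo_table44[] = {
    { "PST (Windows)",          'N', 'N', 'Y', 'S', 'P' },
    { "PST (non-Windows)",      'N', 'N', 'Y', 'S', 'N' },
    { "ZIP",                    'N', 'N', 'Y', 'S', 'P' },
    { "7-Zip",                  'N', 'N', 'Y', 'S', 'P' },
    { "RAR",                    'N', 'N', 'Y', 'S', 'P' },
    { "SMIME in MSG, EML, MBX", 'N', 'N', 'Y', 'N', 'C' },
    { "Lotus Notes NSF",        'N', 'N', 'Y', 'N', 'C' },
    { "Adobe PDF",              'Y', 'Y', 'Y', 'V', 'P' },
    { "Microsoft Office",       'Y', 'Y', 'Y', 'V', 'P' }
};

/* Returns the matching row, or NULL if the file type is not listed. */
const DemoSupportRow *demo_find_row(const char *file_type)
{
    size_t i;
    if (file_type == NULL)
        return NULL;
    for (i = 0; i < sizeof demo_table44 / sizeof demo_table44[0]; i++) {
        if (strcmp(demo_table44[i].file_type, file_type) == 0)
            return &demo_table44[i];
    }
    return NULL;
}

/* Returns the Credentials symbol ('P', 'C', or 'N'),
   or '?' if the file type is not listed. */
char demo_credentials_for(const char *file_type)
{
    const DemoSupportRow *row = demo_find_row(file_type);
    return row != NULL ? row->credentials : '?';
}
```

A caller might prompt for a password when the result is 'P' and for a certificate or User ID file as well when it is 'C'.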
Open Password Protected Container Files
This section describes how to extract password-protected container files using the
C API. The following guidelines apply to specific file types.
• Lotus Notes NSF files. If you are running a Notes client with an active user
connected to a Domino server, you must specify the user’s password as a
credential, regardless of whether the NSF files you are opening are protected.
This allows KeyView to access the Notes client and the Lotus Notes API. If the
Notes client is not running with an active user, KeyView does not require
credentials to access the client.
• PST files. To open password-protected PST files that use High Encryption
(Microsoft Outlook 2003 only), you must use the MAPI-based PST reader
(pstsr). The native PST reader (pstnsr) returns the error code
KVERR_PasswordProtected if a PST file is encrypted with High Encryption.
To open container files
1. Define the credential information in the KVOpenFileArg data structure. See
“KVOpenFileArg” on page 173.
2. Pass KVOpenFileArg to the fpOpenFile() function. See “fpOpenFile()”
on page 157.
3. Call fpCloseFile(). See “fpCloseFile()” on page 147.
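The order of the three steps above can be sketched as follows. This is not SDK code: every Demo type, field, and function here is a hypothetical stand-in, and the real KVOpenFileArg, KVCredential, fpOpenFile(), and fpCloseFile() declarations in the SDK headers have different members and signatures. The stub functions exist only so that the sketch compiles and runs without the SDK.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the SDK credential and open-argument types. */
typedef struct {
    const char *password;
} DemoCredential;

typedef struct {
    const char           *file_path;
    const DemoCredential *cred;   /* NULL when no credential is supplied */
} DemoOpenFileArg;

/* Hypothetical stand-in for the File Extraction function table. */
typedef struct {
    int (*open_file)(const DemoOpenFileArg *arg, void **file); /* plays fpOpenFile()  */
    int (*close_file)(void *file);                             /* plays fpCloseFile() */
} DemoExtractInterface;

/* Stub: "opens" the file only when a non-empty password is supplied. */
int demo_stub_open(const DemoOpenFileArg *arg, void **file)
{
    static int handle;
    if (arg == NULL || file == NULL || arg->cred == NULL ||
        arg->cred->password == NULL || arg->cred->password[0] == '\0')
        return 0;
    *file = &handle;
    return 1;
}

/* Stub: closing succeeds for any non-NULL handle. */
int demo_stub_close(void *file)
{
    return file != NULL;
}

const DemoExtractInterface demo_stub_interface = { demo_stub_open, demo_stub_close };

/* Runs steps 1 to 3. Returns 1 if the container was opened and closed, 0 otherwise. */
int demo_open_protected_container(const DemoExtractInterface *ext,
                                  const char *path, const char *password)
{
    DemoCredential cred;
    DemoOpenFileArg arg;
    void *file = NULL;

    if (ext == NULL || path == NULL)
        return 0;

    /* Step 1: define the credential information in the open argument. */
    cred.password = password;
    arg.file_path = path;
    arg.cred = (password != NULL) ? &cred : NULL;

    /* Step 2: pass the argument to the open function. */
    if (!ext->open_file(&arg, &file))
        return 0;

    /* Sub files would be listed and extracted here. */

    /* Step 3: close the file. */
    return ext->close_file(file);
}
```

In a real application the function table would come from the SDK rather than from demo_stub_interface, and the open argument would be filled in as described in the KVOpenFileArg reference.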
Export Password Protected Files
This section describes how to export password-protected non-container files with
the C API.
To export password-protected files
1. Call the fpInit() function. See “fpInit()” on page 199.
2. Call the KVXMLConfig() function with the following arguments (see
“KVXMLConfig()” on page 205):
Argument  Parameter
nType     KVCFG_SETPASSWORD
nValue    TRUE
pData     The source file password. The password is a null-terminated string
          with a maximum length of 255 characters (the final byte is null).
For example:
(*fpXMLConfig)(pKVXML, KVCFG_SETPASSWORD, TRUE, password);
where password is a null-terminated string of 255 or fewer characters.
3. Call the fpConvertStream() or KVXMLConvertFile() function. See
“fpConvertStream()” on page 186 or “KVXMLConvertFile()” on page 214.
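Because the password passed in step 2 must be null-terminated and no longer than 255 characters, an application may want to check a user-supplied buffer before making the call. The following is a minimal sketch of such a check. The function and macro names are hypothetical, and the sketch assumes that the 255-character limit excludes the terminating null byte; lower the constant to 254 if the stricter reading applies.

```c
#include <stddef.h>
#include <string.h>

/* Limit quoted in the pData description. Assumption: it does not count
   the terminating null byte. */
#define DEMO_MAX_PASSWORD_CHARS 255

/* Returns 1 if the first `size` bytes of buf hold a null-terminated string
   of at most DEMO_MAX_PASSWORD_CHARS characters, 0 otherwise. */
int demo_password_fits(const char *buf, size_t size)
{
    const char *end;

    if (buf == NULL || size == 0)
        return 0;

    /* Look for the terminator without reading past the buffer. */
    end = memchr(buf, '\0', size);
    if (end == NULL)
        return 0;

    return (size_t)(end - buf) <= DEMO_MAX_PASSWORD_CHARS;
}
```

An application could call demo_password_fits() on its input buffer and set the password only when the function returns 1.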
Index
Symbols
$ANCHOR 329
$BASE 329
$CHARSET 329
$CONTENT 330
$ENDNOTE 330
$FOOTER 330
$FOOTNOTE 330
$FOOTNOTEALL 330
$HEADER 330
$MAINURL 330
$NAME 330
$NEXT 330
$PREV 330
$SPLITBLOCKNUMBER 331
$STYLESHEET 330
$SUMMARY 331
$SUMMARYNN 331
$TOC 331
$TOCB 331
$TOCBE 331
$TOCE 331
$TOCPE 331
$TOCTE 331
$TOPANCHOR 331
$USERCB 332
$USERSUMMARY 332
$XANCHOR 332
7-Zip 295
7-Zip reader 327
A
absolute text positioning
PDF 58, 205–212
Abstract Windowing Toolkit 109
access layer 41
AD1 295
AD1 Evidence file reader 322
ad1sr 322
ADDOCINFO 169, 177, 234, 239
adInfo 239
adinfo.h 31, 34, 233, 234, 239, 270
Adobe Maker Interchange Format (MIF) 310
reader 325
Adobe PDF 299
advanced document readers
enabling in an existing installation 33
license information 32
Lotus Notes database (NSF) 86
Mailbox (MBX) 81
Microsoft Outlook Personal Folders 82
Advanced Systems Format (ASF) 304
afsr 322
allocating memory 42, 236
Ami Pro Graphics reader 325
anchor 36
token 329, 330, 331
Animated cursor reader 323
ANSI (TXT) 309
Apple iChat Log 310
Apple iChat Log reader 323
Apple iWork
Keynote (GZ) 305
Numbers (GZ) 307
Pages (GZ) 310
Apple iWork Keynote reader 324
Apple iWork Numbers reader 323
Apple iWork Pages reader 323
Applix
Presents (AG) 305
Presents reader 323
Spreadsheets (AS) 307
Spreadsheets reader 322
Words (AW) 310
Words reader 322
architecture 40
archive formats 295
ASCII (TXT) 309
reader 322
assr 322
attachment
external path 149, 154, 177, 179
Audio Interchange File Format (AIFF) 304
AutoCAD Drawing
Exchange Format (DXF) 297
format (DWG) 297
AutoCAD Drawing Exchange format reader 323
AutoCAD Drawing format reader 323
AutoCAD reader 324
automatic heading generation 248, 267, 280
awsr 322
AWT
See Abstract Windowing Toolkit
B
bAllowHeadingsInTables 268
base URL
token 329
bEnableEmptyRows 260
bentofio 321
bForceOutputCharSet 100, 104, 256
bForceSrcCharSet 100, 256
bGenerateURLs 258
bHardPageMakesNewBlock 265
bi-directional text 333
in PDF file 112
right-to-left (RTL) tag 341
Big Endian 163, 167, 169, 179
binary files
supported 296
bIndexOnly 255
BinHex 295
reader 325
bKeepServantAlive 217
bkfsr 322
block chunks 36
blocks 36
bMustBeBold 249
bMustBeItalic 249
bMustBeUnderlined 249
bNbspEmptyCells 256
bNoMultiSpaces 250
bNonZeroIndent 249
bNoTabs 249
bookmarks 28, 205, 207
converting to XLinks 205, 207
bPutBlocksInSeparateFiles 265
bRasterizeFiles 258
bRemoveEmptyColumns 260
bRemoveEmptyRows 260
bSupportCellSpan 260
bSupportColumnHeadings 259
bSupportColumnWidth 260
bSupportRowHeadings 259
bSupportRowSpan 260
bUseDocumentColors 256
bUseDocumentFontInfo 256
bUseExistingStyleSheet 255
bUseVerityDTD 254
Bzip2 295
bzip2sr 322
C
C API
configure XML element extraction 122
enable logical order for PDF files 114
enabling logical order for PDF files 208
extensible stylesheet language 108
extracting sub file metadata 73, 75
extracting sub files 61, 70
map styles 105
opening a file 61, 70
running in out-of-process mode 47
style sheets 108
cabsr 322
cache
configuring 42
CAD. See Computer-aided design
callback functions 36, 225
Cascading Style Sheets 54, 108, 140
CATIA 297
cbAnchorMax 229
cbHTML 229
cbmap 321
cbString 238
cebsr 322
character encoding
supported 333
character entities 63
character set
determining output 99
force output 256
force source 256
license information 32
mapping 99
setting during file extraction 104
setting source 99, 103, 239
setting target 100
supported 341
token 329
character styles 104
charset 239
chartbls.ux 321
childArray 153, 180
chmdll 321
chmsr 322
chunks 36
CloseFile() 62, 147, 157
closing a file 62, 147, 157
cnv2xml sample program 35, 136
cnv2xmloop sample program 35, 137
Comma-Separated Values (CSV) 307
reader 322
compound documents 68
Computer Graphics Metafile (CGM) 299
writer 323
Computer Graphics Metafile reader 323
Computer-aided design 286, 297
configuration options
setting 205
ConnectRetry 46
ConnectRetryInterval 46
container files 51, 68
archive 295
default filenames 91
determining number of sub file 151, 169, 170
email 302
example tree structure 69
recreating file hierarchy 70–72, 153, 174
sub file infoflag 177
supported 302
Continue() 227
conversion options 52, 253, 262, 267
setting using template files 53
setting using the API 53
converting
spreadsheets 117–119
XML files 120–126, 140
ConvertStream() 61, 186, 195, 252
Corel
CorelDraw (CDR) 299
Draw reader 323
Presentations (SHW) 305
Presentations reader 325
Quattro Pro (QPW, WB3) 307
cRedact 259
credentials
defining for protected files 160, 173
cReplaceChar 259
CSS template 54
csvsr 322
cxVectorToRasterXRes 258, 261
cyVectorToRasterYRes 258, 261
D
Data Interchange Format (DIF) 307
Data Interchange Format reader 322
dBase 298
dBase Database reader 322
dbfsr 322
dbxsr 322
DCA/RFT reader 322
dcasr 322
DCX (fax) reader 323
DCX Fax System 299
definition of terms 36
deleted text 110
detectPSTbyExtension 85
difsr 322
Digital Imaging and Communications in Medicine 299
directory structure 34
DiskCacheSize 42
DisplayWrite (IP) 310
reader 322
dmgsr 322
document readers 41, 352
document type 99, 239
Documentum EMCMF 302
Domino XML Language 302
Domino XML Language reader 322
DTD 63
character entities 63
modifying 64
root element 63
dw4sr 322
dwFlags 106, 242
dxlsr 322
E
eClass 234, 239
eEmptyParaType 259
eFormat 234, 239
eHardPageBreakType 259
eKVFormat 123, 245
email files
supported 302
embedded OLE objects 51, 68
converting using Conversion APIs 90
converting using File Extraction APIs 90
linked 148, 153, 177
naming convention 92
reader 326
writer 320
emlsr 323
emxsr 323
Encapsulated PostScript (EPS) 299, 335
reader 324
encase2sr 323
encasesr 323
EndCharStyle 241
endnote token 330
ENdocAttributes 234
ENDocClass 234
ENdocFmt 234, 239, 245
Enhanced Windows Metafile (EMF) 299
reader 324
ENSATableBorder 271
entsr 323
eOutputLanguageID 256
eOutputRasterGraphicType 258
eOutputVectorGraphicType 109, 110, 258
epubsr 323
error codes 272
extended 245, 274
Outlook PST 276
eSATableBorder 257
eSrcCharSet 103, 256
Executable (EXE) 296
Expat XML parser 322
Expert Witness Compression Format (Encase) 295
Expert Witness Compression Format (EnCase) v6
reader 323
Expert Witness Compression Format (EnCase) v7
reader 323
Export Demo sample program 35, 56–59, 142
extended error codes 245, 274
Extensible Forms Description Language 305
Extensible Forms Description Language reader 325
ExtractSubFile() 61, 70, 104, 148, 176
F
file cache 42
configuring 42
file extraction
extract sub file 148
extraction flags 164
extraction path 165, 173, 176
get main file information 151, 169
get sub file information 153, 176, 178
get sub file metadata from mail formats 155, 167, 181
input parameters 163
Lotus Domino XML (DXL) 85
Lotus Notes database (NSF) 86
Mailbox (MBX) 81
Microsoft Outlook 80
Microsoft Outlook Express (EML) 80
Microsoft Outlook Personal Folders 81
output to file 165
output to stream 165
PDF file 90
sub file properties 177
ZIP file 91
File Extraction interface 52
entry point 60, 69, 146
file hierarchy 70–72
childArray 153, 180
parentIndex 153, 180
file time
EPOCH 286
filenames
default for sub files 91
FileToInputStreamCreate() 48, 60, 61, 189, 190
FileToInputStreamFree() 189, 190, 191, 192
FileToOutputStreamCreate() 60, 191, 252
FileToOutputStreamFree() 192, 252
flags
extraction flags 164
KVCFG_DELSOFTHYPHEN 116, 208
KVCFG_DISABLEZONE 207
KVCFG_ENABLEPOSITIONINFO 207
KVCFG_INCLREVISIONMARK 209, 211
KVCFG_INCLTRACKCHANGES 111
KVCFG_LOGICALPDF 114, 208
KVCFG_PG_HIDECOMMENT 127, 129, 209
KVCFG_PG_HIDEHIDDENSLIDE 126, 129, 209
KVCFG_PG_SHOWCOMMENTSSLIDE 127, 129, 210
KVCFG_PG_SHOWSLIDENOTES 127, 129, 210
KVCFG_SETPASSWORD 210, 211, 411
KVCFG_SETTEMPDIRECTORY 208
KVCFG_SETXMLCONFIGINFO 122, 208
KVCFG_SS_SHOWCOMMENTS 126, 129, 209
KVCFG_SS_SHOWFORMULA 126, 129, 209
KVCFG_SS_SHOWHIDDENINFOR 126, 129, 209
KVCFG_SUPPRESSIMAGES 255
KVCFG_SUPPRESSTOCPRINTIMAGE 207
KVCFG_WP_NOCOMMENTS 126, 129, 209
KVCFG_WP_SHOWDATEFIELDCODE 126, 129, 209
KVCFG_WP_SHOWFILENAMEFIELDCODE 126, 129, 209
KVCFG_WP_SHOWHIDDENTEXT 126, 129, 209
KVExtractionFlag_CreateDir 164, 166
KVExtractionFlag_ExcludeMailHeader 80, 164
KVExtractionFlag_GetFormattedBody 164
KVExtractionFlag_Overwrite 164, 166
KVExtractionFlag_SaveAsMSG 81, 164
KVMainFileInfoFlag_HasContent 169
KVOpenFileFlag_CreateRootNode 174
KVSubFileExtractInfoFlag_CharsetConverted 177
KVSubFileExtractInfoFlag_External 177
KVSubFileExtractInfoFlag_FileCreated 177
KVSubFileExtractInfoFlag_FolderCreated 177
KVSubFileExtractInfoFlag_NeedsExtraction 177
KVSubFileExtractInfoFlag_NonFormattedBodyExtracted 177
KVSubFileInfoFlag_External 179
KVSubFileInfoFlag_MailItem 179
KVSubFileInfoFlag_NeedsExtraction 179
KVSubFileInfoFlag_Secure 179
KVSubFileInfoFlag_SMIME 179
KVSubFileMetaInfoFlag_CharsetConverted 181
main file properties 169
metadata 181
open file 174
sub file properties 179
Flash reader 326
Folio Flat File (FFF) 310
reader 323
foliosr 323
fontSizeMax 249
fontSizeMin 249
footer token 330
footnote token 330
format detection 41, 99, 347–352
ADDOCINFO 234
coding practice 234, 239
determining format support 348
extracting format information 348
file class 350
KVStreamInfo 239
major format 350, 352
major version 350
minor format 350
minor version 350
module 28, 294, 313, 347
translating format information 350
formats 293–312
binary 296
container 302
container (email) 302
graphic 299
multimedia 304
presentations 305
word processing 310
formats_e 34
formats_e.ini 33, 34, 199, 320, 349
configuring file cache 42
converting hidden text in spreadsheets 117
converting MSG files directly using the MSG reader 52
determining document reader 352
enable logical order for PDF files 114
out-of-process configuration 44
formats.ini 350
formulas
extracting from Excel files 118
supported Excel formula functions 119
Founder Chinese E-paper Basic (CEB) 310
Founder Chinese E-paper Basic reader 322
fpCloseFile() 62, 147, 157
fpContinue() 273
fpConvertStream() 61, 186, 195, 252
callbacks 226
fpExtractSubFile() 61, 70, 104, 148, 176
fpFileToInputStreamCreate() 48, 60, 61, 189, 190, 252
fpFileToInputStreamFree() 189, 190, 252
fpFileToOutputStreamCreate() 60, 191, 192, 252
fpFileToOutputStreamFree() 191, 192, 252
fpFreeStruct() 148, 150, 151, 153, 155
fpGetAnchor() 193, 252
fpGetAuxOutput() 247
fpGetConvertFileList() 195, 252
fpGetMainFileInfo() 61, 70, 151, 169
fpGetStreamInfo() 99, 196, 234, 239, 252
fpGetSubFileInfo() 61, 70, 153, 178
fpGetSubFileMetadata() 155, 167
fpGetSummaryInfo() 96, 197, 244, 252
fpInit() 60, 199, 252
fpOpenFile() 61, 157, 173
fpSetStyleMapping() 105, 201, 241, 252
fpShutDown() 195, 202, 203, 227, 252
fpValidateTemplate() 204
fpXMLConfig() 111, 114
free File Extraction structures 150
FreeStruct() 148, 150, 151, 153, 155
Fujitsu Oasys (OA2) 310
reader 326
function suites 59
G
generating minimal attributes 54
generating output with minimal markup and without images 187, 215, 255
generating output with verbose markup and without images 57, 58, 187, 205–207, 255
GetAnchor() 193, 228, 230, 252
GetAuxOutput() 230, 247
GetConvertFileList() 195, 252
GetMainFileInfo() 61, 70, 151, 169
GetStreamInfo() 99, 196, 234, 239, 252
GetSubFileInfo() 61, 70, 153, 178
GetSubFileMetadata() 155, 167
GetSummaryInfo() 96, 197, 244, 252
glossary 36
Graphic Interchange Format (GIF) 300, 335
reader 324
graphics
displaying vector graphics on Windows 109
setting resolution 261
supported 299
suppressing 57, 58, 205, 213, 255
GroupWise FileSurf 302
GroupWise FileSurf reader 323
gwfssr 323
GZIP 295
reader 325
H
Hangul (HWP) 310
Hangul 2002, 2005, 2007 reader 323
header files 270
header token 330
heading generation 248, 267, 280
headingCreateType 268
Health level7 310
Health level7 reader 323
hidden data 126, 129
Excel
comments 126, 129, 209
formulas 126, 129, 209
hidden information 126, 129, 209
PowerPoint
comments 127, 129, 209
comments slides 127, 129, 210
hidden slides 126, 129, 209
slide notes 127, 129, 210
toggle output 127, 130
Word
comments 126, 129, 209
date field codes 126, 129, 209
file name field codes 126, 129, 209
hidden text 126, 129, 209
hidden text
converting in spreadsheets 117
hl7sr 323
HTML 309
reader 323
HTML (MIME) 309
htmlexport 320
htmsr 323
hwposr 323
hyphenation 115, 206, 208
I
I/O model 60
IBM DCA/RFT (Revisable Form Text) (DC) 311
ichatsr 323
icssr 323
index mode 187, 208, 215
and hyphenation 115
index template 54
initialization function 60, 199, 252
input streams 48, 61
creating 189
extracting metadata 197
freeing 49, 62, 190
KVInputStream 235
installation
directory structure 34
error messages 200
ISO 295
ISO-9660 CD Disc Image Format reader 323
isosr 323
iwsssr 323
iwwpsr 323
J
Java API
extensible stylesheet language 108
using style sheets 109
Java archive 295
javadoc 34
JBIG2 300, 335
JBIG2 reader 324
jp2000sr 323
JPEG 300, 335
reader 324
writer 324
JPEG 2000 300, 335
JPEG 2000 metadata reader 323
JPEG 2000 reader 324
jtdsr 323
JustSystems Ichitaro (JTD) 311
reader 323
K
kp3dwrld 321
kpagrdr 323
kpanirdr 323
kpbmprdr 323
kpbmpwrt 323
kpcdrrdr 323
kpcgmrdr 323
kpcgmwrt 323
kpchtrdr 321
kpdcxrdr 323
kpDWGrdr 323
kpDXFrdr 323
kpemfrdr 324
kpepsrdr 324
kpgifrdr 324
kpicordr 324
kpifcnvt 320
kpifutil 320
kpIWPGrdr 324
kpJAVwrt 321
kpjbig2rdr 324
kpjp2000rdr 324
kpjpeg 321
kpjpgrdr 324
kpjpgwrt 324
kpmacrdr 324
kpmsordr 324
kpnbmprdr 324
kpODArdr 324
kpodfrdr 324
kpONErdr 324
kpp40rdr 324
kpp95rdr 324
kpp97rdr 324
kppctrdr 324
kppcxrdr 324
kppdf2rdr 324
kppdfrdr 324
kppicrdr 324
kppng 321
kppngrdr 324
kppngwrt 324
kpppxrdr 324
kpprerdr 324
kpprzrdr 324
kpsdwrdr 325
kpsgirdr 325
kpSHWrdr 325
kpsunrdr 325
kptgardr 325
kptifrdr 325
kpvsdrdr 325
kpwg2rdr 325
kpwmfrdr 325
kpwmfwrt 325
kpwpgrdr 325
kpxfdlrdr 325
KV_Bool 286
KV_ClipBoard 286
KV_DateTime 243, 286
KV_IEEE8 243, 286
KV_Int4 286
KV_Other 286
KV_String 243, 286
KV_Unicode 243, 286
kv.lic 32, 34, 321
updating in existing installation 33
KVCFG_DELSOFTHYPHEN 116, 137, 208
KVCFG_DISABLEZONE 207
KVCFG_ENABLEPOSITIONINFO 137, 138, 207
KVCFG_INCLREVISIONMARK flag 209, 211
KVCFG_INCLTRACKCHANGES flag 111
KVCFG_LOGICALPDF 208
KVCFG_LOGICALPDF flag 114
KVCFG_PG_HIDECOMMENT flag 127, 129, 209
KVCFG_PG_HIDEHIDDENSLIDE flag 126, 129, 209
KVCFG_PG_SHOWCOMMENTSSLIDE flag 127, 129, 210
KVCFG_PG_SHOWSLIDENOTES flag 127, 129, 210
KVCFG_SETPASSWORD flag 210, 211, 411
KVCFG_SETTEMPDIRECTORY 208
KVCFG_SETXMLCONFIGINFO 122, 208
KVCFG_SS_SHOWCOMMENTS flag 126, 129, 209
KVCFG_SS_SHOWFORMULA flag 126, 129, 209
KVCFG_SS_SHOWHIDDENINFOR flag 126, 129, 209
KVCFG_SUPPRESSIMAGES 137, 138, 255
KVCFG_SUPPRESSTOCPRINTIMAGE 207
KVCFG_WP_NOCOMMENTS flag 126, 129, 209
KVCFG_WP_SHOWDATEFIELDCODE flag 126, 129, 209
KVCFG_WP_SHOWFILENAMEFIELDCODE flag 126, 129, 209
KVCFG_WP_SHOWHIDDENTEXT flag 126, 129, 209
KVCharSet 99, 104, 163, 167
KVCredential 160, 173
KVCredentialComponent 161
KVEPT_EMPTY 282
KVEPT_SUPPRESS 282
KVEPT_VERBOSE 282
KVERR_ADSNotFound 273
KVERR_ArchiveFatalError 274
KVERR_ArchiveFileNotFound 274
KVERR_AutoDetFail 273
KVERR_AutoDetNoFormat 273
KVERR_badInputStream 273
KVERR_badOutputType 273
KVERR_ChildTimeOut 274
KVERR_CreateOutputFileFailed 273
KVERR_CreateProcessFailed 273
KVERR_CreateTempFileFailed 273
KVERR_DLLNotFound 273
KVERR_ErrorWritingToOutputFile 273
KVERR_FormatNotSupported 273
KVERR_General 273
KVERR_NoReader 273
KVERR_OutOfCore 273
KVERR_PasswordProtected 273, 410
KVERR_processCancelled 273
KVERR_ReaderInitError 273
KVERR_SUCCESS 273
KVERR_WaitForChildFailed 273
KVError_CompressionNotSupported 277
KVError_GPF 275
KVError_InputFileNotFound 275
KVError_InterfaceFunctionNotFound 275
KVError_InvalidArgs 276
KVError_InvalidOopDriverSignature 277
KVError_InvalidOopServiceSignature 277
KVError_IPCTimeOut 276
KVError_KVoopLogFailed 275
KVError_MemoryLeak 275
KVError_MemoryOverwrite 275
KVError_OopBadConfig 276
KVError_OopBrokenPipe 276
KVError_OopCore 275
KVError_OopPipeOEF 276
KVError_OpenOutputFileFailed 275
KVError_OpenStreamFailure 275
KVError_OutputFileExists 164
KVError_OverNestedFileLimit 275
KVError_PasswordRequired 276
KVError_PSTAccessFailed 276
KVError_ReaderUsageDenied 276
KVError_ZeroFile 277
KVErrorCode 272
KVErrorCodeEx 274
KVExtractInterface 146, 162
KVExtractionFlag_CreateDir 164, 166
KVExtractionFlag_ExcludeMailHeader 80, 164
KVExtractionFlag_GetFormattedBody 164
KVExtractionFlag_Overwrite 164, 166
KVExtractionFlag_SaveAsMSG 81, 164
KVExtractSubFileArg 148, 163
KVFileType_Main 163
KVGetExtractInterface() 60, 69, 146
KVGetSubFileMetaArg 167
KVGFX_CGM 280
KVGFX_GIF 280
KVGFX_JAVA 280
KVGFX_JPEG 280
KVGFX_PNG 280
KVGFX_WMF 280
kvgraph 321
kvgzsr 325
KVHC_CreateHeadingsAlways 281
KVHC_DocHeadingsOnly 281
KVHeadingCreateOptions 280
KVHPBT_EMPTY 283
KVHPBT_EMPTYID 283
KVHPBT_ID 283
KVHPBT_SUPPRESS 282
kvhqxsr 325
KVInputStream 60, 173, 235
KVMainFileInfo 151, 169
KVMainFileInfoFlag_HasContent 151, 169
KVMemoryStream 236
KVMetadata_Binary 171, 284
KVMetadata_Bool 171, 284
KVMetadata_DateTime 171, 284
KVMetadata_Double 284
KVMetadata_Float 284
KVMetadata_Int4 171, 284
KVMetadata_Int8 284
KVMetadata_String 171, 284
KVMetadata_UInt4 284
KVMetadata_UInt8 284
KVMetadata_Unicode 171, 284
KVMetadata_Unknown 284
KVMetadataElem 171
KVMetadataType 171, 283
KVMetaName 172
kvolefio 320
KVOpenFileArg 61, 69, 157, 173
KVOpenFileFlag_CreateRootNode 170, 174
KVOutputStream 165, 175, 237
kvpie 321
kvradar 321
kvraster.class 34, 321
KVSTR 238
KVStreamInfo 239
KVStructHead 240
KVStructInit 240
KVStyle 238, 241
KVSTYLE_DELETECONTENT 107
KVSTYLE_HEADING[1-6] 107
KVSTYLE_ONCONSECUTIVEPARAGRAPHS 107
KVSTYLE_ORDERLIST 107
KVSTYLE_PRE 107
KVSTYLE_REDACT 107
KVSTYLE_UNORDEREDLIST 107
KVSubFileExtractInfo 148, 176
KVSubFileExtractInfoFlag_CharsetConverted 177
KVSubFileExtractInfoFlag_External 148, 177
KVSubFileExtractInfoFlag_FileCreated 177
KVSubFileExtractInfoFlag_FolderCreated 177
KVSubFileExtractInfoFlag_NeedsExtraction 177
KVSubFileExtractInfoFlag_NonFormattedBodyExtracted 177
KVSubFileInfo 153, 178
KVSubFileInfoFlag_External 153, 179
embedded objects in PowerPoint 154
KVSubFileInfoFlag_MailItem 179
KVSubFileInfoFlag_NeedsExtraction 179, 180
KVSubFileInfoFlag_Secure 179
KVSubFileInfoFlag_SMIME 179
KVSubFileMetaData 181
KVSubFileMetaInfoFlag_CharsetConverted 181
KVSubFileType_Attachment 178, 179
KVSubFileType_Folder 178
KVSubFileType_Main 178, 180
KVSubFileType_OLE2 178
KVSumInfoElemEx 243
KVSumInfoType 243
KVSummaryInfoEx 197, 198, 244
KVSumType 96, 97, 331
KVT_ZONE token 207
kvtypes.h 34, 233, 270
kvutil 320
kvVector.class 321
kvvector.jar 34, 321
kvxconfig.ini 122, 123–126, 208, 276, 321
and xmlini sample program 140
KVXConfigInfo 245
kvxml 320
KVXML library 48, 60
kvxml.h 31, 34, 233, 270
KVXMLAnchorType 278
KVXMLCallbacks 226, 247
KVXMLConfig 205
KVXMLConfig() 116, 122
export password-protected files 411
KVXMLConvertFile() 61, 214
callbacks 226
KVXMLEmptyParaType 281
KVXMLEndOOPSession() 217
KVXMLGetInterface 185
KVXMLGetInterface() 48, 60
KVXMLGraphicType 109, 279
KVXMLHardPageBreakType 282
KVXMLHeadingInfo 248
KVXMLInit() 198
KVXMLInterface 48, 60, 185, 251
KVXMLOptions 109, 230, 247, 253
KVXMLSetStyleSheet 109, 219
KVXMLStartOOPSession() 221
KVXMLStyleSheetType 277
KVXMLTemplate 60, 97, 262, 329
KVXMLTOCOptions 267
kvxpgsa 320
kvxsssa 320
kvxtract 320
kvxtract.h 31, 161, 270
kvxwpsa 320
kvzeesr 325
kwad 294, 313, 320, 347
L
l123sr 325
language detection
license information 32
lasr 325
lcbBlockSize 265
lcbFilesize 235
lcbMaxMemUsage 259
Legato EMailXtender Archive 295
Legato EMailXtender archive (EMX) reader 323
Legato Extender 302
Libraries 319
license information
enabling a full version 32
kv.lic 32, 321
Link Library (DLL) 296
ListenerPortList 45
ListenerTimeout 45
logical reading order
direction flags 206
PDF file 112
Lotus
1-2-3
(123) 307
(WK4) 307
Charts (123) 307
V2 to 5 reader 327
V96/97/98 reader 325
AMI Draw Graphics (SDW) 300, 335
AMI Pro (SAM) 311
reader 325
AMI Professional Write Plus 311
Domino XML (DXL)
file extraction 85
Freelance Graphics (PRE)
96/97/98 reader 324
reader 324
Freelance Graphics (SDW) 305
Notes
embedded image reader 324
Notes database
license information 32
Notes database (NSF) 68, 302
file extraction 86
installation and configuration 87
licensing 86
reader 326
system requirements 87
Pic (PIC) 300, 335
SmartMaster (MWP) 311
Word Pro 311
Word Pro (LWP) 311
reader 325
LPDF_AUTO 114, 115, 290
LPDF_DIRECTION 290
LPDF_LTR 114, 115, 290
LPDF_RAW 114, 115, 290
LPDF_RTL 114, 115, 290
lVersion 234, 239
lwpsr 325
M
Mac Disk Copy Disk Image 295
Mac Disk Copy Disk Image File reader 322
MacBinary 295
MacBinary reader 325
macbinsr 325
Macintosh Picture (PICT) reader 324
Macintosh Raster (PICT/PCT) 300, 335
MacPaint (PNTG) 300, 335
reader 324
Macromedia Flash (SWF) 305
reader 326
mail
default list of metadata 73
extracting metadata 72–80, 96, 155
metadata 168
Mailbox
license information 32
Mailbox (MBX) 68, 302
file extraction 81
licensing 81
reader 325
main file
get information 61, 70, 151
main URL token 330
MAPI 78, 82, 83
ATTACH_BY_REF_ONLY 84
ATTACH_BY_REF_RESOLVE 84
ATTACH_BY_REFERENCE 84
attachment methods 84
mapidefs.h 80
mapitags.h 80
PR_ATTACH_LONG_PATHNAME 84
PR_ATTACH_METHOD 84
PR_ATTACH_PATHNAME 84
property tag 78
supported property types 78
MAPI-based PST reader 82
mapping styles 104, 241
MarkUpEnd 242
MarkUpStart 242
maximum memory 259
maxParaLen 249
mbsr 325
mbxsr 325
mdbsr 325
memory allocation 42, 236
memory management 236
metadata 41, 55, 197, 243, 244, 285
custom metadata in PDF 116
data types 243, 283, 284
extracting 96–98
extracting default mail metadata 73, 155, 168
extracting default mail metadata set 72
extracting from mail formats 72–80, 96, 155, 167
extracting from PST files 78
extracting mail metadata as text 80
field names 286
non-standard 96
sample program 35, 138
standard 96
token 97, 330, 331, 332
metaNameArray 155, 167
metaNameCount 155, 167
Microsoft
Access 298
Access (MDB)
reader 325
Drawing Objects reader 324
Excel
2007 XML reader 327
Binary Format 308
Charts (XLS) 307
Macintosh (XLS) 307
Windows (XLS) 308
Windows (XLSX) 308
Windows XML format (XLS) 309
Excel (XLS)
converting formulas 118
reader 327
supported formula functions 119
OneNote 305
OneNote reader 324
Outlook 68, 302
file extraction 80
metadata fields 74
Outlook (MSG)
convert directly using the MSG reader 52
reader 326
Outlook Express 68, 302
file extraction 80
Outlook Express (EML) 51
reader 323
Outlook Personal Folders 68, 303
attachment methods 84
detect by extension 85
error codes 276
extracting metadata 78
file extraction 81
KVError_PasswordRequired 276
license information 32
licensing 82
MAPI-based reader 83
native and MAPI-based reader 82
native reader 83
system requirements 83
Outlook Personal Folders (PST) 51
MAPI-based reader 326
native reader 326
pstnsr 326
pstsr.dll 326
PowerPoint
2007 XML reader 324
embedded objects 154
Macintosh (PPT) 306
PC (PPT) 306
Windows (PPT) 306
Windows (PPTX) 306
Project 298
Project (MPP)
reader 326
Rich Text Format (RTF)
reader 326
Visio 297
XML format (VDX) 309
Visio (VSD)
reader 327
Wave Sound (WAV) 304
Windows Bitmap (BMP) 336
Windows Write (WRI) 312
Word
2007 XML reader 326
6/95 reader 326
97, 2000, XP reader 326
DOS reader 326
Mac reader 325
Macintosh (DOC) 311
PC (DOC) 311
V2 reader 325
Windows (DOC) 311
Windows (DOCX) 311
Windows XML format (DOC) 309
Works
(WPS) 311
6, 2000 reader 326
Spreadsheet (S30,S40) 308
Spreadsheet reader 326
V1 and 2 reader 326
Write reader 326
Microsoft Backup File 295
Microsoft Backup File reader 322
Microsoft Cabinet format 295
Microsoft Cabinet format reader 322
Microsoft Compiled HTML Help 295
Microsoft Compiled HTML Help reader 322
Microsoft Compressed Folder 296
Microsoft Entourage Database 302
Microsoft Entourage Database Format reader 323
Microsoft Office 2007 Excel Binary Format reader 327
Microsoft Office Drawing 300
Microsoft OneNote 305
reader 324
Microsoft Outlook DBX 302
Microsoft Outlook Express DBX reader 322
Microsoft Outlook for Macintosh 302
Microsoft Outlook for Macintosh reader 326
Microsoft Outlook iCalendar 302
Microsoft Outlook iCalendar reader 323
Microsoft Outlook Offline Storage File 303
Microsoft Outlook Offline Storage File reader 326
Microsoft Outlook vCard Contact 303
Microsoft Outlook vCard Contact reader 327
Microsoft Publisher 298, 335
Microsoft Publisher reader 326
Microsoft Visio reader 325
MIDI (MID) 304
mifsr 325
MIME HTML 309
minParaLen 249
misr 325
MP3 files 96
reader 326
mp3sr 326
MPEG-1
Audio layer 3 (MP3) 304
Video (MPG) 304
MPEG-2 Audio (MPEGA) 304
MPEG-4 Audio 304
mppsr 326
MSBLSB byte order 163, 167, 169, 179
mscomctl.ocx 321
msgsr 326
mspubsr 326
msvbvm60 321
MSVCP60.dll 321
msvcrt 321
msw6sr 326
mswsr 326
multi-byte support 333
multimedia files
supported 304
mw6sr 326
mw8sr 326
mwsr 326
mwssr.dll 326
mwxsr 326
N
namespace 125
native PST reader 82
nCompressionQuality 258
nElem 244
NeXT/Sun Audio (AU) 304
non-standard metadata 96
nRowsBeforeSplit 260
nsfsr 326
nSpaceAfter 250
nSpaceBefore 250
nTableBorderWidth 257
numSubFiles 151, 170
O
oa2sr 326
OASIS
Open Document Format (ODP) 306
Open Document Format (ODS) 308
Open Document Format (ODT) 312
ODF presentation
reader 324
ODF spreadsheets
reader 326
ODF word processing
reader 326
odfsssr 326
odfwpsr 326
oleaut32 321
olepro32 322
olesr 326
olmsr 326
Omni Graffle 300
Omni Outliner 312
Omni Outliner reader 326
oo3sr 326
Open Publication Structure eBook 312
Open Publication Structure eBook reader 323
OpenFile() 61, 157, 173
opening a file 61, 157
OpenOffice 312
Calc 308
Impress 306
out of process
configuration 44
conversions 43, 51
"keep servant active" option 217
sample program 137
temporary files 45
output stream
KVOutputStream 175
output streams 48, 61, 330
auxiliary 230
creating 191
freeing 49, 62, 192
KVOutputStream 237
P
page number
token 331
paragraph styles 104
parentIndex 153, 180
password-protected files 409
export 411
extract 410
supported file types 409
PC Paintbrush (PCX) 300, 336
reader 324
pCallbacks 226
pCallingContext 226
pcHTML 229
pcString 238
PDF file
absolute positioning of text 58, 205–212
configuration options 112–117
converting bi-directional text 112
converting PDFs with images 116
direction flags 206
enable logical order for PDF files in C API 114
enable logical order for PDF files in formats_e.ini 114
enabling logical order in C API 208
extracting custom metadata 116
file extraction 90
generating XLinks 205, 207
graphic-based reader 324
high-fidelity graphic-based reader 324
logical reading order 112, 113, 114
pdfsr.ini 116
reader 326
specifying paragraph direction 112
specifying text flow in cnv2xml sample program 137
structured text stream 112
unstructured text stream 112, 114, 290
pdfsr 326
pdfsr.ini 116
pElem 244
pffsr 326
Pictor PC Paint format (PIC) reader 324
PKZIP (ZIP) 296
Portable Network Graphics (PNG) 300, 336
reader 324
writer 324
PowerPoint
95 reader 324
97 reader 324
reader 324
presentations
setting resolution 261
supported 305
process_images_with_min_height 116
process_images_with_min_width 116
pstnsr 82, 83, 326
pstsr.dll 83, 326
pszBaseURL 257
pszChunkTemplate 265
pszDefaultOutputDirectory 230, 247, 257
pszEndBlock 265
pszExContent 246
pszExMeta 246
pszFirstH1End 263
pszFirstH1Start 263
pszH[2..6]XML 264
pszInAttribute 246
pszInContent 246
pszInMeta 245
pszJavaURL 257
pszLastH1End 263
pszLastH1Start 263
pszMainBottom 263
pszMainTop 263
pszMainURL 257
pszMiddleH1End 263
pszMiddleH1Start 263
pszPicPath 257
pszPicURL 257
pszRoot 245
pszStartBlock 265
pszStyleSheet 255
pszTOC_H[1..6] 264
pszTOCH[1..6]End 264
pszTOCH[1..6]LeafNode 265
pszTOCH[1..6]Start 264
pszUserSummary 265
pszXEndBlock 264
pszXFile 264
pszXStartBlock 264
Q
qpssr 326
Quattro Pro Spreadsheet reader 326
QuickTime Movie (QT/MOV) 304
R
RAR Archive (RAR) 296
reader 326
rarsr 326
RasterPictureAnchor 229
RasterPictureAnchorEx 229
reader initialization error 273
redacted (hidden) text 107
redistributable files 319
regsvr32.exe 320
resolution
presentations 258
revision marks 206
revision tracking information 110
Rich Text Format (RTF) 309
root element 63
root node 153, 170, 174
creating 70
rtfsr 326
S
SA_BaseOnDocument 271
SA_Border 271
SA_NoBorder 271
sample program
cnv2xml 35, 136
cnv2xmloop 35, 137
Export Demo 35, 56–59, 142
metadata 35, 138
tstxtract 35, 135
xmlcallback 35, 141
xmlindex 35, 138
xmlini 35, 139
xmlmulti 36
xmlonefile 36, 141
sample template
for C API 53
secured NSF Files 89
secured PST Files 85
servant.exe 322
ServantName 46
SetStyleMapping 241, 252
SetStyleMapping() 105, 201
SGI RGB
Image 300, 336
reader 325
ShutDown() 195, 202, 203, 227, 252
single file for presentation template 55
single file template 55
single file with TOC template 55
Skype Log 312
Skype log file reader 326
skypesr 326
sosr 326
spreadsheets
converting 117–119
converting headers and footers 117
converting hidden rows and columns 117
standard metadata 96
StarOffice 312
Calc 308
Impress 306
stderr 200
streams 36
auxiliary output 230
input 189, 190
KVInputStream 235
KVOutputStream 175, 237
output 191, 192
structured access layer 41
style sheets 108
token 330
StyleName 242
styles
mapping 104
STYLESHEET_DISABLED 278
sub file
external path to 148, 153, 177
extract 61, 70, 148
extract metadata 155, 167
get information 61, 70, 153
summary information 41, 55, 197, 243, 244, 285
extracting 96–98
token 97, 330, 331, 332
Sun Raster Image (RS) 300, 336
reader 325
supported formats 293–312
suppressing graphics 57, 58, 205, 213, 255
swfsr 326
szExContentElement 124
szExMetaElement 124
szInAttribute 124
szInContentElement 124
szInMetaElement 123
szRoot 123
T
table border 271
table of contents
generating 267
token 331
Tagged Image File Format (TIFF) 301, 336
reader 325
Tape Archive (TAR) 296
reader 327
tarsr 327
TempFilePath 45
TempFileSizeMark 45
template 53
C sample 53
css 54
index 54
single file 55
single file for presentations 55
single file with TOC 55
template file 55
map styles 105
setting conversion options 53
temporary files
out of process 45
terms 36
defined 36
Text Mail (MIME) 303
threads 47, 62, 200
tnefsr 327
token 36, 329–332
anchor 329, 330, 331
base URL 329
character set 329
endnote 330
footer 330
footnote 330
header 330
main URL 330
metadata 330
page number 331
style sheet 330
table of contents 331
user callback 332
zone 207
token buffer 259
Track Changes 110, 206
Transfer Neutral Encapsulation Format (TNEF) 303
Transfer Neutral Encapsulation Format reader 327
Truevision Targa (TGA) 301, 336
reader 325
tstxtract sample program 35, 135
txtcnv 320
U
ulAttributes 234
Unicode
reader 327
text 309
Unicode HTML 309
Unicode HTML reader 327
unihtmsr 327
unisr 327
UNIX
converting graphics on 109
UNIX Compress 296
reader 325
unzip 327
URL
base 329
main 330
user callback
function 232
token 332
UserCB() 232, 332
uudsr 327
UUEncoding (UUE) 296
reader 327
V
ValidateTemplate() 204
vcfsr 327
vector graphics
converting 109
VectorPictureAnchor 229
verbose markup 57, 58, 205, 255
Verity Document Type Definition 63, 64
Visio reader 327
vsdsr 327
W
W3C 63
WaitForConnectionTime 45
WaitForConvert 45
Windows
Animated Cursor (ANI) 301, 336
Bitmap (BMP)
reader 323
writer 323
bitmap (BMP) 301
Icon Cursor 301
icon reader 324
Metafile (WMF)
reader 325
writer 325
metafile (WMF) 301, 336
Video (AVI) 304
Windows Scrap File 296
WinZIP (ZIP) 296
Wireless Markup Language 63
wkssr 327
WML 63
word processing files
supported 310
WordPad 312
WordPerfect
6.x to 10.x reader 327
Graphics 1 (WPG) 301, 336
Graphics 2 (WPG) 301, 336
Graphics reader 325
Linux 310
Macintosh 310
Macintosh reader 327
reader 327
Windows (WO) 310
wosr 327
wp6sr 327
wpmap 322
wpmsr 327
X
XHTML 63, 309
detection 351
reader 323
xlsbsr 327
xlssr 327
xlsxsr 327
XML
and format ID 123, 245
configuration flag 208
configuring custom document type 125
converting 120–126
converting using xmlini sample program 140
Expat XML parser 322
extracting elements 125
generic 309
kvxconfig.ini 123–126
modifying element extraction settings 121, 208
namespace 125
Paper Specification 312
reader 327
root element 121, 123
writers 41
XML Export API functions 183–224
XML Paper Specification reader 327
XML Style Language Transformation 63
xml_css.ini 54
xml_index.ini 54, 255
xml1file_pg.ini 55
xml1file.ini 55
xml1filetoc.ini 55
xmlcallback sample program 35, 141
xmlcnv 320
XMLConfig() 111, 114
xmlexport 320
xmlindex sample program 35, 138
xmlini sample program 35, 53, 139
xmlmulti sample program 36
xmlonefile sample program 36, 141
xmlsh 322
xmlsr 327
xpssr 327
XSLT 63
XyWrite 312
reader 327
xywsr 327
Y
Yahoo! Instant Messenger 312
Yahoo! Instant Messenger reader 327
yimsr 327
Z
z7zsr 327
Zip archive 296
reader 327
ZIP file extraction 91
zone
disable creation of 207
elements 207