KeyView XML Export SDK 10.23 C Programming Guide

KeyView XML Export SDK C Programming Guide

Version 10.23

Document Revision 0

20 January 2015

Copyright Notice

Notice

This documentation is a proprietary product of Autonomy and is protected by copyright laws and international treaty. Information in this documentation is subject to change without notice and does not represent a commitment on the part of Autonomy. While reasonable efforts have been made to ensure the accuracy of the information contained herein, Autonomy assumes no liability for errors or omissions. No liability is assumed for direct, incidental, or consequential damages resulting from the use of the information contained in this documentation.

The copyrighted software that accompanies this documentation is licensed to the End User for use only in strict accordance with the End User License Agreement, which the Licensee should read carefully before commencing use of the software. No part of this publication may be reproduced, transmitted, stored in a retrieval system, nor translated into any human or computer language, in any form or by any means, electronic, mechanical, magnetic, optical, chemical, manual or otherwise, without the prior written permission of the copyright owner.

This documentation may use fictitious names for purposes of demonstration; references to actual persons, companies, or organizations are strictly coincidental.

Trademarks and Copyrights

Copyright © 2015 Hewlett-Packard Development Company, L.P. ACI API, Alfresco Connector, Arcpliance, Autonomy Process Automation, Autonomy Fetch for Siebel eBusiness Applications, Autonomy, Business Objects Connector, Cognos Connector, Confluence Connector, ControlPoint, DAH, Digital Safe Connector, DIH, DiSH, DLH, Documentum Connector, DOH, EAS Connector, Ektron Connector, Enterprise AWE, eRoom Connector, Exchange Connector, FatWire Connector, File System Connector for Netware, File System Connector, FileNet Connector, FileNet P8 Connector, FTP Fetch, HTTP Connector, Hummingbird DM Connector, IAS, IBM Content Manager Connector, IBM Seedlist Connector, IBM Workplace Fetch, IDOL Server, IDOL, IDOLme, iManage Fetch, IMAP Connector, Import Module, iPlanet Connector, KeyView, KVS Connector, Legato Connector, LiquidOffice, LiquidPDF, LiveLink Web Content Management Connector, MCMS Connector, MediClaim, Meridio Connector, Meridio, Moreover Fetch, NNTP Connector, Notes Connector, Objective Connector, OCS Connector, ODBC Connector, Omni Fetch SDK, Open Text Connector, Oracle Connector, PCDocs Fetch, PLC Connector, POP3 Fetch, Portal-in-a-Box, RecoFlex, Retina, SAP Fetch, Schlumberger Fetch, SharePoint 2003 Connector, SharePoint 2007 Connector, SharePoint 2010 Connector, SharePoint Fetch, SpeechPlugin, Stellent Fetch, TeleForm, Tri-CR, Ultraseek, Verity Profiler, Verity, VersiForm, WebDAV Connector, WorkSite Connector, and all related titles and logos are trademarks of Hewlett-Packard Development Company, L.P. and its affiliates, which may be registered in certain jurisdictions.

Microsoft is a registered trademark, and MS-DOS, Windows, Windows 95, Windows NT, SharePoint, and other Microsoft products referenced herein are trademarks of Microsoft Corporation.

UNIX is a registered trademark of The Open Group.

AvantGo is a trademark of AvantGo, Inc.

Epicentric Foundation Server is a trademark of Epicentric, Inc.

Documentum and eRoom are trademarks of Documentum, a division of EMC Corp.

FileNet is a trademark of FileNet Corporation.

Lotus Notes is a trademark of Lotus Development Corporation.

mySAP Enterprise Portal is a trademark of SAP AG.

Oracle is a trademark of Oracle Corporation.

Adobe is a trademark of Adobe Systems Incorporated.

Novell is a trademark of Novell, Inc.

Stellent is a trademark of Stellent, Inc.

All other trademarks are the property of their respective owners.

Notice to Government End Users

If this product is acquired under the terms of a DoD contract: Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of 252.227-7013. Civilian agency contract: Use, reproduction or disclosure is subject to 52.227-19 (a) through (d) and restrictions set forth in the accompanying end user agreement. Unpublished-rights reserved under the copyright laws of the United States.

Autonomy, Inc., One Market Plaza, Spear Tower, Suite 1900, San Francisco, CA. 94105, US.

Contents

Tables ....................................................................................................................................... 13

Figures ...................................................................................................................................... 15

About This Document ................................................................................................................. 17

Part 1 Overview of XML Export

Chapter 1 Introducing XML Export ............................................................................................... 27

Overview ..................................................................................................................................... 27

Features...................................................................................................................................... 28

Platforms, Compilers and Dependencies ................................................................................... 29

Package Contents ...................................................................................................................... 31

License Information .................................................................................................................... 32

Enable Advanced Document Readers ................................................................................. 33

Update License Information ................................................................................................. 33

Directory Structure ..................................................................................................................... 34

Definition of Terms ..................................................................................................................... 36

Chapter 2 Getting Started ........................................................................................................... 39

Architectural Overview ................................................................................................................ 40

Memory Abstraction .................................................................................................................... 42

Enhance Performance................................................................................................................. 42

File Caching ......................................................................................................................... 42

Convert Files Out of Process ...................................................................................................... 43

Configure Out-of-Process Conversions ................................................................................ 44

Run Export Out of Process—Overview ................................................................................ 46

Run Export Out of Process in the C API ............................................................................... 47

Convert Files .............................................................................................................................. 50

Sub File Extraction ...................................................................................................................... 51

Convert Outlook Email without Using the Extraction API ...................................................... 52

Set Conversion Options .............................................................................................................. 52

Set Conversion Options Using the API ................................................................................. 53

Set Conversion Options Using the Template Files ............................................................... 53

Templates ...................................................................................................................... 53

Use the Export Demo Program ................................................................................................... 56

Change Input/Output Directories .......................................................................................... 57

Set Configuration Options .................................................................................................... 57

Suppress Images .......................................................................................................... 58

Using PDF Position Information ..................................................................................... 58

Convert Files ........................................................................................................................ 58

Use the C-Language Implementation of the API ......................................................................... 59

Input/Output Operations ....................................................................................................... 60

Convert Files ........................................................................................................................ 60

Multi-threaded Conversions ................................................................................................. 62

Use the Verity Document Type Definition (DTD) ......................................................................... 63

Use XML Style Language Transformation (XSLT) ................................................................ 63

Add Elements and Attributes to the DTD .............................................................................. 64

Move the DTD ...................................................................................................................... 64

Part 2 Use the Export API

Chapter 3 Use the File Extraction API ......................................................................................... 67

Introduction.................................................................................................................................. 68

Extract Sub Files ........................................................................................................................ 69

Recreate a File’s Hierarchy ........................................................................................................ 70

Create a Root Node ............................................................................................................. 70

Recreate a File’s Hierarchy—Example ................................................................................. 71

Extract Mail Metadata ................................................................................................................. 72

Default Metadata Set ............................................................................................................ 72

Extract the Default Metadata Set ................................................................................... 73

Microsoft Outlook (MSG) Metadata ...................................................................................... 74

Extract MSG-Specific Metadata ..................................................................................... 75

Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata .......................................... 76

Extract EML- or MBX-Specific Metadata ........................................................................ 76

Lotus Notes Database (NSF) Metadata ............................................................................... 77

Extract NSF-Specific Metadata ...................................................................................... 77

Microsoft Personal Folders File (PST) Metadata .................................................................. 78

MAPI Properties ............................................................................................................ 78

Extract PST-Specific Metadata ...................................................................................... 79

Exclude Metadata from the Extracted Text File .................................................................... 80

Extract Sub Files from Outlook Files ........................................................................................... 80

Extract Sub Files from Outlook Express Files ............................................................................. 80

Extract Sub Files from Mailbox Files ........................................................................................... 81

Extract Sub Files from Outlook Personal Folders Files .............................................................. 81

Use the Native or MAPI-based Reader ................................................................................ 82

Use the Native PST Reader (pstnsr) ............................................................................. 83

Use the MAPI Reader (pstsr) ......................................................................................... 83

MAPI Attachment Methods .................................................................................................. 84

Open Secured PST Files ..................................................................................................... 85

Detect PST Files While the Outlook Client is Running ......................................................... 85

Extract Sub Files from Lotus Domino XML Language Files ........................................................ 85

Extract Sub Files from Lotus Notes Database Files ................................................................... 86

System Requirements .......................................................................................................... 87

Installation and Configuration ............................................................................................... 87

Open Secured NSF Files ..................................................................................................... 89

Format Note Sub Files ......................................................................................................... 89

Extract Sub Files from PDF Files ................................................................................................ 90

Extract Embedded OLE Objects.................................................................................................. 90

Extract Sub Files from ZIP Files ................................................................................................. 91

Default Filenames for Extracted Sub Files .................................................................................. 91

Default Filename for Mail Formats ....................................................................................... 91

Default Filename for Embedded OLE Objects ..................................................................... 92

Chapter 4 Use the XML Export API .............................................................................................. 95

Extract Metadata ......................................................................................................................... 96

Extract Metadata Using the API ........................................................................................... 96

Extract Metadata Using a Template File .............................................................................. 96

Extract File Format Information .................................................................................................. 99

Convert Character Sets .............................................................................................................. 99

Determine the Character Set of the Output Text .................................................................. 99

Guidelines for Character Set Conversion .................................................................... 100

Examples of Character Set Conversion ............................................................................. 101

Document Character Set Can be Determined ............................................................. 102

Document Character Set Cannot be Determined ......................................................... 103

Set the Character Set During Conversion .......................................................................... 103

Set the Character Set During File Extraction from a Container .......................................... 104

Map Styles ................................................................................................................................ 104

Use Style Sheets ...................................................................................................................... 108

Use Extensible Style Sheet Language (XSL) ..................................................................... 108

Use Cascading Style Sheets (CSS) ................................................................................... 108

Display Vector Graphics on UNIX and Linux ............................................................................ 109

Convert Revision Tracking Information ..................................................................................... 110

Convert PDF Files .................................................................................................................... 112

Convert PDF Files to a Logical Reading Order .................................................................. 112

Logical Reading Order and Paragraph Direction .......................................................... 112

Enable Logical Reading Order ..................................................................................... 113

Control Hyphenation .......................................................................................................... 115

Improve Performance for PDFs with Many Small Images .................................................. 116

Extract Custom Metadata from PDF Files .......................................................................... 116

Convert Spreadsheet Files ....................................................................................................... 117

Convert Hidden Text in Microsoft Excel Files ..................................................................... 117

Convert Headers and Footers in Microsoft Excel 2003 Files .............................................. 117

Specify Date and Time Format on UNIX Systems .............................................................. 118

Extract Microsoft Excel Formulas ....................................................................................... 118

Convert XML Files .................................................................................................................... 120

Configure Element Extraction for XML Documents ............................................................ 121

Modify Element Extraction Settings ............................................................................. 122

Modify Element Extraction Settings in the kvxconfig.ini File ......................................... 123

Specify an Element’s Namespace and Attribute .......................................................... 125

Add Configuration Settings for Custom XML Document Types .................................... 125

Show Hidden Data .................................................................................................................... 126

Hidden Data in Microsoft Documents ................................................................................. 126

Toggle Word Comment Settings in the formats_e.ini File ............................................ 127

Toggle PowerPoint Slide Note Settings in the formats_e.ini File .................................. 128

Show Hidden Data .................................................................................................................... 129

Hidden Data in Microsoft Documents ................................................................................. 129

Toggle Word Comment Settings in the formats_e.ini File ............................................ 130

Toggle PowerPoint Slide Note Settings in the formats_e.ini File .................................. 130

Chapter 5 Sample Programs ...................................................................................................... 133

Introduction................................................................................................................................ 133

C Sample Programs ........................................................................................................... 134

Compile the Visual Basic Sample Program ........................................................................ 135

tstxtract .................................................................................................................................... 135

cnv2xml .................................................................................................................................... 136

cnv2xmloop .............................................................................................................................. 137

metadata .................................................................................................................................. 138

xmlindex ................................................................................................................................... 138

xmlini ........................................................................................................................................ 139

Use Style Sheets with xmlini .............................................................................................. 140

xmlcallback ............................................................................................................................... 141

xmlonefile ................................................................................................................................. 141

xmlmulti .................................................................................................................................... 142

Export Demo ............................................................................................................................ 142

Part 3 C API Reference

Chapter 6 File Extraction API Functions ...................................................................................... 145

KVGetExtractInterface() ........................................................................................................... 146

fpCloseFile() ............................................................................................................................. 147

fpExtractSubFile() .................................................................................................................... 148

fpFreeStruct() ........................................................................................................................... 150

fpGetMainFileInfo() .................................................................................................................. 151

fpGetSubFileInfo() .................................................................................................................... 153

fpGetSubFileMetaData() .......................................................................................................... 155

fpOpenFile() ............................................................................................................................. 157

Chapter 7 File Extraction API Structures ..................................................................................... 159

KVCredential ............................................................................................................................ 160

KVCredentialComponent .......................................................................................................... 161

KVExtractInterface ................................................................................................................... 162

KVExtractSubFileArg ................................................................................................................ 163

KVGetSubFileMetaArg ............................................................................................................. 167

KVMainFileInfo ......................................................................................................................... 169

KVMetadataElem ..................................................................................................................... 171

KVMetaName ........................................................................................................................... 172

KVOpenFileArg ........................................................................................................................ 173

KVOutputStream ...................................................................................................................... 175

KVSubFileExtractInfo ............................................................................................................... 176

KVSubFileInfo .......................................................................................................................... 178

KVSubFileMetaData ................................................................................................................. 181

Chapter 8 XML Export API Functions ........................................................................................... 183

KVXMLGetInterface() ............................................................................................................... 185

fpConvertStream() .................................................................................................................... 186

fpFileToInputStreamCreate() .................................................................................................... 189

fpFileToInputStreamFree() ....................................................................................................... 190

fpFileToOutputStreamCreate() ................................................................................................. 191

fpFileToOutputStreamFree() ..................................................................................................... 192

fpGetAnchor() ........................................................................................................................... 193

fpGetConvertFileList() ............................................................................................................... 195

fpGetStreamInfo() ..................................................................................................................... 196

fpGetSummaryInfo() ................................................................................................................. 197

fpInit() ....................................................................................................................................... 199

fpSetStyleMapping() ................................................................................................................. 201

fpShutDown() ............................................................................................................................ 203

fpValidateTemplate() ................................................................................................................ 204

KVXMLConfig() ......................................................................................................................... 205

Configuration Flags ............................................................................................................ 207

KVXMLConvertFile() ................................................................................................................. 214

KVXMLEndOOPSession() ........................................................................................................ 217

KVXMLSetStyleSheet() ............................................................................................................ 219

KVXMLStartOOPSession() ....................................................................................................... 221

Chapter 9 XML Export API Callback Functions ............................................................................. 225

Introduction ............................................................................................................................... 226

Continue() ................................................................................................................................. 227

GetAnchor() .............................................................................................................................. 228

GetAuxOutput() ........................................................................................................................ 230

UserCB() .................................................................................................................................. 232

XML Export API Structures ......................................................................................................... 233

ADDOCINFO ............................................................................................................................ 234

KVInputStream ......................................................................................................................... 235

KVMemoryStream .................................................................................................................... 236

KVOutputStream ...................................................................................................................... 237

KVSTR ..................................................................................................................................... 238

KVStreamInfo ........................................................................................................................... 239

KVStructHead .......................................................................................................................... 240

KVStyle .................................................................................................................................... 241

KVSumInfoElemEx ................................................................................................................... 243

KVSummaryInfoEx ................................................................................................................... 244

KVXConfigInfo .......................................................................................................................... 245

KVXMLCallbacks ..................................................................................................................... 247

KVXMLHeadingInfo .................................................................................................................. 248

KVXMLInterface ....................................................................................................................... 251

KVXMLOptions ......................................................................................................................... 253

KVXMLTemplate ...................................................................................................................... 262

KVXMLTOCOptions ................................................................................................................. 267

Enumerated Types
    Introduction
    ENSATableBorder
    KVCredKeyType
    KVErrorCode
    KVErrorCodeEx
    KVXMLStyleSheetType
    KVXMLAnchorType
    KVXMLGraphicType
    KVHeadingCreateOptions
    KVXMLEmptyParaType
    KVXMLHardPageBreakType
    KVMetadataType
    KVMetaNameType
    KVSumInfoType
    KVSumType
    LPDF_DIRECTION

Appendixes

Appendix A: Supported Formats
    Supported Formats
        Archive Formats
        Binary Format
        Computer-Aided Design Formats
        Database Formats
        Desktop Publishing
        Display Formats
        Graphic Formats
        Mail Formats
        Multimedia Formats
        Presentation Formats
        Spreadsheet Formats
        Text and Markup Formats
        Word Processing Formats
    Supported Formats (Detected)

Appendix B: Files Required for Redistribution
    Core Files
    Support Files
    Document Readers and Writers
    Document Type Definition Files

Appendix C: Export Tokens

Appendix D: Character Sets
    Multi-Byte and Bi-Directional Support
    Coded Character Sets

Appendix E: File Format Detection
    Introduction
    Extract Format Information
    Determine Format Support
        Refine Detection of Text Files
            Change the Amount of File Data to Read
            Change the Percentage of Allowed Non-ASCII Characters
            Use the File Extension for Detection
    Translate Format Information
        Distinguish Between Formats
    Determine a Document Reader
    Category Values in formats_e.ini

Appendix F: File Formats and Extensions
    File Format and Extension Table

Appendix G: Extract and Format Lotus Notes Sub Files
    Overview
    Customize XML Templates
        Use Demo Templates
        Use Old Templates
        Disable XML Templates
    Template Elements and Attributes
        Conditional Elements
        Control Elements
        Data Elements
    Date and Time Formats
        Lotus Notes Date and Time Formats
        KeyView Date and Time Formats

Appendix H: Password Protected Files
    Supported Password Protected File Types
    Open Password Protected Container Files
    Export Password Protected Files

Index

Tables

Table 1: Supported Compilers
Table 2: Supported Compilers for Java and .NET Components
Table 3: XML Export Installed Directory Structure
Table 4: Architectural Components
Table 5: Parameters for Out-of-Process Conversion
Table 6: Default Mail Metadata List
Table 7: MSG-specific Metadata List
Table 8: Document Character Set Can be Determined
Table 9: Document Character Set Cannot be Determined
Table: Flags for Defining Styles
Table: Supported Microsoft Excel Functions
Table: Hidden data settings
Table: Hidden data settings
Table: Options for the cnv2xml Sample Program
Table: Options for the cnv2xmloop Sample Program
Table: Options for the xmlini Sample Program
Table: Key to Support Tables
Table: Supported Archive Formats
Table: Supported Binary Formats
Table: Supported CAD Formats
Table: Supported Database Formats
Table: Supported Desktop Publishing Formats
Table: Supported Display Formats
Table: Supported Graphic Formats
Table: Supported Mail Formats
Table: Supported Multimedia Formats
Table: Supported Presentation Formats
Table: Supported Spreadsheet Formats
Table: Supported Text and Markup Formats
Table: Supported Word Processing Formats
Table: Export Tokens
Table: Multi-byte and bi-directional support
Table: Code Character Sets
Table: Major Formats
Table: File Classes
Table: Minor Formats
Table: KeyView file formats and extensions
Table: Conditional elements
Table: Control Elements
Table: Data elements
Table: Lotus Notes date and time formats
Table: KeyView date and time formats
Table: Key to support table
Table: Supported password-protected file types

Figures

Figure 1: XML Export Architecture
Figure 2: Export Demo: Launching
Figure 3: Export Demo: Setting Directories
Figure 4: Export Demo: Converting Files
Figure 5: Example Container File Tree Structure
Figure 6: Extracted PST File
Figure 7: Recreated File Hierarchy
Figure 8: Document Character Set Can Be Determined
Figure 9: Document Character Set Cannot Be Determined

About This Document

This guide is for developers who incorporate KeyView XML conversion technology into their custom Web applications using a C development environment. It is intended for readers who are familiar with XML and C.

Documentation Updates

Related Documentation

Conventions

Autonomy Customer Support

Contact Autonomy

Documentation Updates

The information in this document is current as of XML Export SDK version 10.23.

The content was last modified 20 January 2015.

You can retrieve the most current product documentation from the HP Autonomy Knowledge Base on the Customer Support Site.

A document in the Knowledge Base displays a version number in its name, such as IDOL Server 7.5 Administration Guide. The version number applies to the product that the document describes. The document may also have a revision number in its name, such as IDOL Server 7.5 Administration Guide Revision 6. The revision number applies to the document and indicates that there were revisions to the document since its original release.

Autonomy recommends that you periodically check the Knowledge Base for revisions to documents for the products your enterprise is using.

To access Autonomy documentation

1. Go to the Autonomy Customer Support site: https://customers.autonomy.com/


2. Click Login.

3. Type the login credentials that you were given, and then click Login.

The Customer Support Site opens.

4. Click Knowledge Base.

The Knowledge Base Search page opens.

5. Search or browse the Knowledge Base.

To search the knowledge base:

a. In the Search box, type a search term or phrase and click Search.

Documents that match the query display in a results list.

To browse the knowledge base:

a. Select one or more of the categories in the Browse list. You can browse by:

• Repository. Filters the list by Documentation produced by technical publications, or Solutions to Technical Support cases.

• Product Family. Filters the list by product suite or division. For example, you could retrieve documents related to the iManage, IDOL, Virage or KeyView product suites.

• Product. Filters the list by product. For example, you could retrieve documents related to IDOL Server, Virage Videologger, or KeyView Filter.

• Version. Filters the list by product or component version number.

• Type. Filters the list by document type. For example, you could retrieve Guides, Help, Packages (ZIP files), or Release Notes.

• Format. Filters the list by document format. For example, you could retrieve documents in PDF or HTML format. Guides are typically provided in both PDF and HTML format.

6. To open a document, click its title in the results list.

To download a PDF version of a guide, open the PDF version, click the Download icon in the PDF reader, and save the PDF to another location.

To download a documentation ZIP package, click Get Documentation Package under the document title in the results list. Alternatively, browse to the desired ZIP package by selecting either the Packages document Type or the ZIP document Format from the Browse list.

Autonomy welcomes your comments.


To send feedback on Autonomy documentation

• Send e-mail to [email protected]

• Provide:

    - full document title with version and revision number
    - location: heading, a snippet of text or screen capture
    - your comments
    - your contact information in the event we need clarification

Related Documentation

The following documents provide more details on XML Export.

 XML Export Release Notes

 XML Export SDK Java Programming Guide

Conventions

The following conventions are used in this document.

Notational Conventions

This document uses the following conventions.

Convention: Bold
Usage: User-interface elements such as a menu item or button. For example:
Click Cancel to halt the operation.

Convention: Italics
Usage: Document titles and new terms. For example:
• For more information, see the IDOL Server Administration Guide.
• An action command is a request, such as a query or indexing instruction, sent to IDOL Server.

Convention: monospace font
Usage: File names, paths, and code. For example:
The FileSystemConnector.cfg file is installed in C:\Program Files\FileSystemConnector\.

Convention: monospace bold
Usage: Data typed by the user. For example:
• Type run at the command prompt.
• In the User Name field, type Admin.

Convention: monospace italics
Usage: Replaceable strings in file paths and code. For example:
user UserName

Command-Line Syntax Conventions

This document uses the following command-line syntax conventions.

Convention: [ optional ]
Usage: Brackets describe optional syntax. For example:
[ -create ]

Convention: |
Usage: Bars indicate “either | or” choices. For example:
[ option1 ] | [ option2 ]
In this example, you must choose between option1 and option2.

Convention: { required }
Usage: Braces describe required syntax in which you have a choice and that at least one choice is required. For example:
{ [ option1 ] [ option2 ] }
In this example, you must choose option1, option2, or both options.

Convention: required
Usage: Absence of braces or brackets indicates required syntax in which there is no choice; you must type the required syntax element.

Convention: variable or <variable>
Usage: Italics specify items to be replaced by actual values. For example:
-merge filename1
(In some documents, angle brackets are used to denote these items.)

Convention: ...
Usage: Ellipses indicate repetition of the same pattern. For example:
-merge filename1, filename2 [, filename3 ... ]
where the ellipses specify filename4, and so on.

The use of punctuation, such as single and double quotes, commas, and periods, indicates actual syntax; it is not part of the syntax definition.

Notices

This document uses the following notices:

CAUTION A caution indicates an action can result in the loss of data.

IMPORTANT An important note provides information that is essential to completing a task.

NOTE A note provides information that emphasizes or supplements important points of the main text. A note supplies information that may apply only in special cases—for example, memory limitations, equipment configurations, or details that apply to specific versions of the software.


TIP A tip provides additional information that makes a task easier or more productive.

Autonomy Customer Support

Autonomy Customer Support provides prompt and accurate support to help you quickly and effectively resolve any issue you may encounter while using Autonomy products. Support services include access to the Customer Support Site (CSS) for online answers, expertise-based service by Autonomy support engineers, and software maintenance to ensure you have the most up-to-date technology.

To access the Customer Support Site

• Go to https://customers.autonomy.com

The Customer Support Site includes:

• Knowledge Base: The CSS contains an extensive library of end user documentation, FAQs, and technical articles that is easy to navigate and search.

• Case Center: The Case Center is a central location to create, monitor, and manage all your cases that are open with technical support.

• Download Center: Products and product updates can be downloaded and requested from the Download Center.

• Resource Center: Other helpful resources appropriate for your product.

To contact Autonomy Customer Support by e-mail or phone

• Go to http://www.autonomy.com/work/services/customer-support


Contact Autonomy

For general information about Autonomy, contact one of the following locations:

Europe and Worldwide

E-mail: [email protected]

Telephone: +44 (0) 1223 448 000

Fax: +44 (0) 1223 448 001

Autonomy Corporation plc

Cambridge Business Park

Cowley Rd.

Cambridge CB4 0WZ

United Kingdom

North and South America

E-mail: [email protected]

Telephone: +1.415.243.9955

Fax: +1.415.243.9984

Autonomy, Inc.

One Market Plaza

Spear Tower, Suite 1900

San Francisco CA 94105

USA


PART 1

Overview of XML Export

This section provides an overview of the Export SDK and describes how to use the C implementation of the API. It contains the following chapters:

Introducing XML Export

Getting Started

CHAPTER 1

Introducing XML Export

This section describes the KeyView Export SDK package. It contains the following topics:

Overview

Features

Platforms, Compilers and Dependencies

Package Contents

License Information

Directory Structure

Definition of Terms

Overview

XML Export is part of the KeyView Export SDK. It enables you to convert virtually any document, spreadsheet, presentation, or graphic into well-formed, valid XML which is validated against a predefined Document Type Definition (DTD). With XML Export, you control the content, structure, and format of the XML output using either easily customized templates, or the flexible and robust APIs.

The main purpose of XML Export is to apply an XML vocabulary to the data structures in a document so that content and metadata can be indexed and subsequently searched in context.


Data structures in a source document can be:

• metadata (title, author, subject, and so on)

• document components (headers, footers, footnotes, endnotes, captions, bookmarks, and so on)

• tagged text (chapters, sections, bulleted lists, and so on)

• table components (sheet names, rows, columns, cell ranges, and so on)

• presentation components (notes, slide titles, slide descriptions, and so on)

Although viewing is not the main purpose of XML Export, Extensible Stylesheet Language (XSL) style sheets or Cascading Style Sheets (CSS) can be used to display the XML data.
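As a rough illustration of what applying an XML vocabulary to a data structure involves, the following C sketch wraps a metadata value in an element and escapes the characters that XML reserves. It is not part of the Export API: the functions xml_append_escaped and xml_element are hypothetical helpers, and the element name used in the usage note is an assumption that is not taken from Verity.dtd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an SDK function): append src to the
 * NUL-terminated string in dst, replacing the five characters that XML
 * reserves with entity references. Returns 0 on success, or -1 if the
 * result would not fit in cap bytes. */
static int xml_append_escaped(char *dst, size_t cap, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL && cap > 0);
    len = strlen(dst);
    for (; *src != '\0'; ++src) {
        char one[2] = { *src, '\0' };
        const char *rep;
        size_t n;

        switch (*src) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:   rep = one;      break;
        }
        n = strlen(rep);
        if (len + n >= cap)
            return -1;                 /* no room for text plus NUL */
        memcpy(dst + len, rep, n + 1); /* copy including the NUL */
        len += n;
    }
    return 0;
}

/* Hypothetical helper (not an SDK function): write <name>value</name>
 * into out, escaping value. Returns 0 on success, -1 if out is too small. */
static int xml_element(char *out, size_t cap, const char *name,
                       const char *value)
{
    int n;
    size_t len;

    assert(out != NULL && name != NULL && value != NULL && cap > 0);
    n = snprintf(out, cap, "<%s>", name);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    if (xml_append_escaped(out, cap, value) != 0)
        return -1;
    len = strlen(out);
    n = snprintf(out + len, cap - len, "</%s>", name);
    if (n < 0 || (size_t)n >= cap - len)
        return -1;
    return 0;
}
```

For example, calling xml_element(buf, sizeof buf, "title", "R&D <draft>") fills buf with an element whose text content has the ampersand and angle brackets replaced by entity references, so the fragment stays well-formed. Validity against a DTD is a separate step that this sketch does not attempt.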

Export SDK supports a number of programming environments, such as Visual Basic, Java, and Delphi and runs on all popular operating system platforms including Windows, Solaris, HP-UX, IBM AIX, and Linux.

Export SDK is part of the KeyView suite of products. KeyView provides high-speed text extraction, conversion to Web-ready HTML and well-formed XML, and high-fidelity document viewing.

Features

• Export supports over 300 formats in 70 languages.

• Convert files either in-process or out-of-process. Out-of-process conversion protects the stability and robustness of the calling application if a corrupt document causes an exception or causes the conversion process to fail.

• Files embedded within files can be extracted, using the File Extraction API, and then converted, using the Export API.

• Use redirected input/output. You can provide an input stream that is not restricted to file system access.

• Dynamically convert word processing, spreadsheet, presentation, and graphics files into well-formed, valid, and 1.0-compliant XML. The XML output is validated against a predefined DTD named the “Verity.dtd.”

• Export automatically recognizes the file format being converted and uses the appropriate reader. Your application does not need to rely on filename extensions to determine the file format.

• Use callbacks to control such aspects of the conversion process as file naming and the insertion of scripts.

• Create heading levels in the output file by either using the structure in the source document or by allowing Export to automatically generate a structure based on document properties, such as font or font attributes.

• Manage memory allocation to optimize the speed and performance of the application.

• Insert predefined XML markup at specific points in the output stream.

• Apply XSL or Cascading Style Sheets (CSS) to improve the fidelity of the output.

• Map paragraph and character styles in word processing documents to any markup you specify in the output.

• Control the resolution of rasterized vector graphics to optimize storage requirements or image quality.

• Select the target format for converted graphics, including GIF, JPEG, CGM, PNG, WMF, and Java on Windows, and Java and JPEG on Unix and Linux.
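Two of the features above, redirected input and format recognition that does not depend on the filename extension, can be pictured with a small C sketch. It is only an illustration under stated assumptions: DemoStream, MemState, demo_stream_from_memory, and demo_sniff_format are hypothetical names, the structure is not the layout of the SDK's KVInputStream, and the signature check is a simplified stand-in rather than the detection logic that Export actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream interface for illustration only; this is NOT the
 * layout of the SDK's KVInputStream structure. */
typedef struct DemoStream {
    void  *state;                                              /* caller-owned context */
    size_t (*read)(struct DemoStream *s, void *buf, size_t n); /* returns bytes copied */
} DemoStream;

/* State for a stream that serves bytes from a memory buffer. */
typedef struct {
    const unsigned char *data;
    size_t size;
    size_t pos;
} MemState;

static size_t mem_read(DemoStream *s, void *buf, size_t n)
{
    MemState *m = (MemState *)s->state;
    size_t left = m->size - m->pos;

    if (n > left)
        n = left;
    if (n > 0)
        memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return n;
}

/* Point a stream at an in-memory buffer instead of a file. */
static void demo_stream_from_memory(DemoStream *s, MemState *m,
                                    const void *data, size_t size)
{
    assert(s != NULL && m != NULL && data != NULL);
    m->data = (const unsigned char *)data;
    m->size = size;
    m->pos = 0;
    s->state = m;
    s->read = mem_read;
}

/* Guess a format name from well-known leading byte signatures.
 * Returns "unknown" when nothing matches. */
static const char *demo_sniff_format(DemoStream *s)
{
    static const struct {
        const char *name;
        size_t len;
        unsigned char sig[8];
    } sigs[] = {
        { "pdf",  5, { '%', 'P', 'D', 'F', '-' } },
        { "zip",  4, { 'P', 'K', 0x03, 0x04 } },
        { "ole2", 8, { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 } },
        { "png",  8, { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A } },
    };
    unsigned char head[8];
    size_t got, i;

    assert(s != NULL && s->read != NULL);
    got = s->read(s, head, sizeof head);
    for (i = 0; i < sizeof sigs / sizeof sigs[0]; ++i) {
        if (got >= sigs[i].len &&
            memcmp(head, sigs[i].sig, sigs[i].len) == 0)
            return sigs[i].name;
    }
    return "unknown";
}
```

Because the sniffing function only calls the read callback, the same code works whether the bytes come from a file, a network buffer, or a database field. A leading signature identifies only the container in many cases (for instance, several office formats share the ZIP or OLE2 signature), so a production detector has to inspect far more than the first few bytes.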

Platforms, Compilers and Dependencies

This section lists the supported platforms, supported compilers, and software dependencies for the KeyView software.

Supported Platforms

• FreeBSD 8.1 x86

• HP HP-UX 11i and 11i v2 PA-RISC

• Mac OS X Mountain Lion 10.8 or higher on 32- and 64-bit Apple-Intel architecture

• Microsoft Windows 2003 Server x86 and x64

• Microsoft Windows Vista Business Edition x86 and x64. Other editions of Vista have not been tested, but are likely supported.

• Microsoft Windows 2008 Server Enterprise Edition x86 and x64

• Microsoft Windows 2008 Server R2

• Microsoft Windows XP x86 (Service Pack 2)

• Microsoft Windows 7 x86 and x64

• Microsoft Windows 8 x86 and x64

• Red Hat Enterprise Linux AS 4.0 x86

• Red Hat Enterprise Linux AS 4.0 x64

• Red Hat Enterprise Linux 5.0 x86 and x64

• Red Hat Enterprise Linux 6.0 x86 and x64

• Sun Solaris 9.0 and 10 SPARC

• Sun Solaris 10 x64

• SuSE Linux Enterprise Server 10, 10.1, 11 x86

• SuSE Linux Enterprise Server 10, 10.1 x64

• SuSE Linux Enterprise Server 11 x64

Supported Compilers

Platform | Architecture | Compiler Name | Compiler Version
Microsoft Windows | x86 | cl | Microsoft 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for x86
Microsoft Windows | x64 | cl | Microsoft C/C++ Optimizing Compiler Version 16.00.30319.01 for x64
Sun Solaris | x86 64-bit | Sun Studio 12 | Sun C 5.9 SunOS_i386 Patch 124868-01 2007/07/12
Sun Solaris | SPARC 64-bit | Sun Studio 11 | Sun C 5.8 Patch 121015-06 2007/10/03
Linux | x86 | gcc / g++ | 3.4.3 (Redhat 4), 4.1.0 (SuSE Linux 10)
Linux | x64 | gcc / g++ | 4.1.0 (Redhat 4), 4.1.0 (SuSE Linux 10)
HP HP-UX | PA-RISC | cc / aCC | aCC: HP ANSI C++ B3910B A.03.70 for 32 bit
Mac OSX | Apple-Intel 32-bit and 64-bit | LLVM | Apple LLVM 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
FreeBSD | BSD x86 | gcc / g++ | 4.2.1 [FreeBSD] 20070719

Supported Compilers for Java and .NET Components

Component | Compiler
Java components | Java 1.5
.NET components | Microsoft Visual J# 2005 Compiler 8.00.50727.42

Software Dependencies

Some KeyView components require that you have installed specific third-party software:

• Java Runtime Environment (JRE) or Java Software Developer Kit (JDK) version 1.5. Required for Java API and graphics conversion in Export SDK.

• Outlook 2002 client or later versions. Required when processing Microsoft Outlook Personal Folders (PST) files using the MAPI-based reader (pstsr). The native PST reader (pstnsr) does not require an Outlook client.

• Lotus Notes or Lotus Domino (minimum requirement is 6.5.1, but version 8.5 is recommended). Required for Lotus Notes database (NSF) file processing.

• Microsoft .NET Framework SDK version 2.0, Microsoft .NET Framework version 2.0 Redistributable Package. Required if you are programming in a .NET environment.

Package Contents

The Export installation contains:

• Libraries and executable files necessary for converting source documents into high-quality, well-formed XML (see “Files Required for Redistribution”).

• The include files that define the functions and structures used by the application to establish an interface with Export:

adinfo.h
kvxml.h
kvtypes.h
kvxtract.h

• The Java API implemented in the package com.verity.api.export contained in the file KeyView.jar.

• Several sample programs that demonstrate Export’s functionality.

• Sample images that can be used as navigation buttons and background textures in your output.

• Template files that allow you to set conversion options without modifying them at the API level. They can be used to generate a wide range of output, from highly-stylized user-defined XML to stripped-down, text-only output suitable for use with an indexing engine.

• The predefined DTD, Verity.dtd, used to validate all XML output.

• Sample style sheets: wp.xsl (for word processing documents), ss.xsl (for spreadsheets), and pg.xsl (for presentation graphics).

License Information

During installation, the installation program validates the organization name and license key you enter and generates the install/OS/bin/kv.lic file, where install is the directory in which you installed KeyView, and OS is the operating system. This file is opened and validated when the KeyView API is used.

The kv.lic file contains the organization name and the 28-digit license key you specified during installation. The contents of a kv.lic file look similar to the following:

Company Name

XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX

The license key controls whether the following are enabled:

• full version of the KeyView SDK

• trial version of the KeyView SDK

• language detection and advanced document readers. The following components are considered advanced features, and are licensed separately:

Microsoft Outlook Personal Folders (PST) reader (pstsr and pstnsr)
Lotus Notes database (NSF) reader (nsfsr)
Mailbox (MBX) reader (mbxsr)
Character set detection library (kvlangdetect)

If you change the license key at any time, you must update the licensing information in the kv.lic file. See “Update License Information”.


Enable Advanced Document Readers

To enable advanced readers in one of the KeyView SDKs, you must obtain an appropriate license key from Autonomy and update the installed license key with the new information as described in “Update License Information”.

If you are enabling the MBX reader in an existing installation of Export, in addition to updating the license key, change the parameter 208=eml to 208=mbx in the formats_e.ini file.

Update License Information

If you currently have an evaluation version of KeyView and have purchased a full version of the SDK, or you are adding a document reader (for example, the PST reader), you must update the license information that was installed with the original version of the KeyView SDK.

If you installed a full version of KeyView, but did not enter licensing information at the time of installation, you must also update the license information.

To update the information, do one of the following:

• Manually update the license information that is stored in the text file named kv.lic.

• Re-install the product and enter the new license information when prompted.

To update the KeyView license information:

1. Open the license key file, kv.lic, in a text editor. The file is in the install\OS\bin directory, where install is the directory in which you installed KeyView, and OS is the operating system. The file contains the following text:

COMPANY NAME

XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX

2. Replace the text COMPANY NAME with the company name that appears at the top of the License Key Sheet provided by Autonomy. Enter the text exactly as it appears in the document.

3. Replace the characters XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX with the appropriate license key from the License Key Sheet provided by Autonomy. The license key is listed in the Key column in the Standalone Products table. The key is a string containing 31 characters, for example, 2TQD22D-2M6FV66-2KPF23S-2GEM5AB. Enter the characters exactly as they appear in the document, and do not include a leading or trailing space.

4. The finished kv.lic file looks similar to the following:

Autonomy

24QD22D-2M6FV66-2KPF23S-2G8M59B


5. Save the kv.lic file.
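Before saving a manually edited kv.lic file, it can help to confirm that the key has the documented shape: four groups of seven characters separated by hyphens, 31 characters in total. The following is a minimal sketch of such a check. The function name looks_like_license_key is hypothetical and is not part of the KeyView API, and the assumption that each character is alphanumeric is taken from the sample keys only. The check cannot tell whether a key is actually valid.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the KeyView API. Checks only the shape of
 * a license key: four groups of seven alphanumeric characters separated by
 * hyphens (31 characters in total). It does not validate the key itself. */
bool looks_like_license_key(const char *key)
{
    const size_t group_len = 7;
    const size_t key_len = 4 * group_len + 3;   /* 28 characters + 3 hyphens */

    assert(key != NULL);

    /* A leading or trailing space changes the length and is rejected here. */
    if (strlen(key) != key_len)
        return false;

    for (size_t i = 0; i < key_len; ++i) {
        /* Indexes 7, 15, and 23 must hold the hyphen separators. */
        bool at_separator = ((i + 1) % (group_len + 1)) == 0;
        if (at_separator) {
            if (key[i] != '-')
                return false;
        } else if (!isalnum((unsigned char)key[i])) {
            return false;
        }
    }
    return true;
}
```

For example, the sample key 2TQD22D-2M6FV66-2KPF23S-2GEM5AB passes this check, while the same key with a leading space, or a key whose first group has only six characters, does not.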

Directory Structure

Table 3 describes the directories created during the XML Export installation. The variable install is the pathname of the Export installation directory (for example, /usr/autonomy/KeyviewExportSDK on UNIX, or C:\Program Files\Autonomy\KeyviewExportSDK on Windows). On UNIX, the XML Export directory is named /xmlexpt.

The variable OS is the operating system for which the SDK is installed. For example, the bin directory on a standard 32-bit Windows installation would be located at C:\Program Files\Autonomy\KeyviewExportSDK\WINDOWS\bin.

Directory | Contents
install\OS\bin | Contains the libraries, executables for the sample programs Export Demo and cnv2xml, the Java program (kvraster.class), the Java applet (kvvector.jar), the format detection file, formats_e.ini, the license key file (kv.lic), and a number of other supporting files.
install\javaapi\ini | Contains the template files used with the Java API.
install\javaapi\javadoc | Contains the Javadoc for the Java API.
install\javaapi\sample | Contains the source files and sample programs for the Java API.
install\testdocs | Contains sample word processing, spreadsheet, and presentation graphics files that can be used to test XML Export’s options. You may also find this directory useful when testing your own applications.
install\XML Export\guide | Contains the XML Export C Programming Guide and XML Export Java Programming Guide in HTML and PDF format.
install\XML Export\include | Contains the header files (adinfo.h, kvxml.h and kvtypes.h) for the C API.
install\XML Export\programs\bin | Contains the executable files for the sample Visual Basic program called Export Demo.
install\XML Export\programs\cnv2xml | Contains the C source code files for a sample program that creates a single XML file. The executable for this sample program is in the bin directory.
install\XML Export\programs\cnv2xmloop | Contains the C source code for a sample program that creates a single XML file out of process.
install\XML Export\programs\ExportDemo | Contains the source code for a sample Visual Basic program. The executable for this sample program is in the bin directory. Export Demo is available through the Start menu.
install\XML Export\programs\ini | Contains the template files used to set the conversion options in the C API.
install\XML Export\programs\metadata | Contains the C source code and supporting files for a sample program that creates a valid XML file containing only the document’s metadata.
install\XML Export\programs\pdfini | Contains the template file used to extract custom metadata from PDF documents.
install\XML Export\programs\tempout | The default output directory for converted files. Contains the KeyView DTD, sample style sheets, and character entity files. These files are required for viewing the converted XML files.
install\XML Export\programs\tstxtract | Contains the C source code and supporting files for a sample program that demonstrates the File Extraction interface.
install\XML Export\programs\xmlcallback | Contains the C source code and supporting files for a sample program that demonstrates how user callbacks can dynamically shape the XML conversion.
install\XML Export\programs\xmlindex | Contains the C source code and supporting files for a sample program that produces text-only XML.
install\XML Export\programs\xmlini | Contains the C source code and supporting files for a sample program that uses template files to set the conversion options.
install\XML Export\programs\xmlmulti | Contains the C source code and supporting files for a sample program that creates multiple XML files from a source document. The main file contains the table of contents. Each H1 heading is contained within its own file.
install\XML Export\programs\xmlonefile | Contains the C source code and supporting files for a sample program that converts a source document into a single, formatted XML file.
install\XML Export\rel_notes | Contains the XML Export Release Notes in HTML and PDF format.

Definition of Terms

The following are specialized terms used throughout the guide.

anchor
XML markup that defines both anchors and hyperlinks. An anchor is a named place in a document to which other documents can form a link. Anchors use the XML anchor tags (<a xmlns:xlink= xlink href=> </a>) to facilitate navigation within a document. The major browsers do not currently support linking in XML documents.

block
All source document content (including sub-headings) associated with Heading Level 1. Export identifies and/or generates blocks from the input stream for the implementation of your XML markup.

block chunk or chunk
All source document content associated with Heading Levels 2 through 6. Chunks are subdivisions of blocks. You may supply specific XML markup for the different levels of block chunks.

callback
A function optionally supplied by your application and called from within the Export API. For example, callbacks allow your application to monitor the progress of the conversion process dynamically.

stream
Transmission of a file’s content between memory and disk in a continuous flow.

token
The vehicle for conveying specific types of information to and from the API during the conversion process. Tokens are placeholders for markup that appears in the output. See “Export Tokens”.


CHAPTER 2

Getting Started

This section provides an overview of XML Export and describes how to use the C implementations of the API. It contains the following topics:

Architectural Overview

Memory Abstraction

Enhance Performance

Convert Files Out of Process

Convert Files

Sub File Extraction

Set Conversion Options

Use the Export Demo Program

Use the C-Language Implementation of the API

Use the Verity Document Type Definition (DTD)


Architectural Overview

The general architecture of the KeyView XML conversion technology is the same across all supported platforms and is illustrated in Figure 1.

Figure 1 XML Export Architecture


Each component is described in Table 4.

Developer’s Application
The developer’s application interfaces directly with the XML Export API through either a C-language or Java implementation.

File Extraction API
The File Extraction API opens a file and extracts the file’s sub files so that they are available for conversion. See “Use the File Extraction API”.

XML Export API
The XML Export API exposes the functionality of XML Export and controls all other XML Export modules during the conversion process.

Format Detection Module
The format detection module determines the file type of the source file, which enables the XML Export interface to load the appropriate structured access layer module and document reader. See “File Format Detection”.

Structured Access Layer
The structured access layer contains three modules: one for word processing, one for spreadsheets, and one for presentations and graphics. Information from the format detection module determines which access layer module operates at this stage of the conversion. The structured access layer performs the following:

1. Loads the appropriate document reader.
2. Processes the data stream from the document reader.
3. Determines table of contents entries.
4. Sends the stream to the appropriate XML writer.
5. Accepts the XML stream from the XML writer.
6. Generates the XML output file with a table of contents, metadata, and the document’s contents, and sends it to the XML Export interface.

Document Reader
Each document reader reads a specific file format and sends a text stream of the document to the structured access layer. Word processing readers return a token stream to the structured access layer. A token stream contains the document contents and messages (tokens) that precede the content and identify the type of information that follows them. Each reader is loaded as required by the structured access layer. See “Document Readers and Writers” for a complete list of document readers.

XML Writers
Each XML writer accepts a text stream or token stream from the structured access layer and generates an equivalent XML stream that is sent back to the structured access layer. The structured access layer then generates the output file. See “Document Readers and Writers” for a list of format writers.


Memory Abstraction

All dynamic memory allocations in Export modules are abstracted through a C interface. This memory allocation interface is defined in the KVMemoryStream structure in kvtypes.h. See “KVMemoryStream”. You may override all memory allocations by providing a C structure containing pointers to functions identical in nature to their standard ANSI C counterparts. The xmlcallback sample program demonstrates Export memory management features. See “xmlcallback”.
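The idea of overriding allocations through a structure of function pointers can be sketched as follows. This is an illustration only: DemoMemoryStream, DemoAllocStats, and the callback names are hypothetical stand-ins, and their members and signatures are assumptions rather than the actual KVMemoryStream definition, which should be taken from kvtypes.h. The sketch wraps malloc() and free() so that the application can count outstanding blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocator interface such as KVMemoryStream.
 * The member names and signatures are assumptions for illustration; use the
 * definition in kvtypes.h in real code. */
typedef struct DemoMemoryStream {
    void *pUserData;  /* caller-owned state handed back to every callback */
    void *(*fpMalloc)(struct DemoMemoryStream *self, size_t size);
    void  (*fpFree)(struct DemoMemoryStream *self, void *ptr);
} DemoMemoryStream;

/* Caller-side bookkeeping: how many blocks are currently outstanding. */
typedef struct DemoAllocStats {
    long live_blocks;
} DemoAllocStats;

/* Same contract as malloc(), plus a counter update on success. */
static void *counting_malloc(DemoMemoryStream *self, size_t size)
{
    DemoAllocStats *stats = self->pUserData;
    void *ptr = malloc(size);
    if (ptr != NULL)
        stats->live_blocks++;
    return ptr;
}

/* Same contract as free(), plus a counter update for non-NULL pointers. */
static void counting_free(DemoMemoryStream *self, void *ptr)
{
    DemoAllocStats *stats = self->pUserData;
    if (ptr != NULL) {
        assert(stats->live_blocks > 0);  /* free without a matching malloc */
        stats->live_blocks--;
        free(ptr);
    }
}

/* Wires the counting callbacks and the statistics block into the interface. */
void demo_memory_stream_init(DemoMemoryStream *stream, DemoAllocStats *stats)
{
    stats->live_blocks = 0;
    stream->pUserData = stats;
    stream->fpMalloc = counting_malloc;
    stream->fpFree = counting_free;
}
```

After demo_memory_stream_init() is called, every allocation made through the structure passes through the application's own functions, which is the property the memory abstraction relies on. A live_blocks value other than zero at shutdown would point to a leak.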

Enhance Performance

KeyView is designed for optimal performance out of the box. However, there are some parameters that you can adjust to improve system performance according to your needs.

File Caching

To reduce the frequency of I/O operations, and consequently improve performance, the KeyView readers load file data into memory. The readers then read the data from the cache rather than the physical disk. You can configure the amount of memory used for file caching through the formats_e.ini file.

Generally, when you increase the memory, performance will improve.

By default, KeyView uses a maximum of 1MB of memory for each thread, assuming a thread contains only one instance of pContext that is returned from the session initialization (see “fpInit()”). If the file data is larger than 1MB, up to 1MB of data is cached and the data beyond 1MB is read from disk.

The minimum amount of memory that can be used for file caching is 64KB.

To determine a reasonable value, divide the maximum amount of memory you want KeyView to use for file caching by the total number of threads. For example, if you want KeyView to use a maximum of 50MB of memory and have 10 threads, set the value to 5MB.

To modify the memory allocated for file caching, change the value for the following parameter in the [DiskCache] section of the formats_e.ini file:

DiskCacheSize=1024

The value is in kilobytes. If this parameter is not set or is set to 0 (zero), the minimum value of 64KB is used.

The formats_e.ini file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.
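The sizing rule described above (total memory budget divided by the number of threads, expressed in kilobytes) can be sketched as a small helper. The function name disk_cache_size_kb is hypothetical and is not part of the KeyView API. The sketch assumes 1MB equals 1024KB, which matches the documented default of DiskCacheSize=1024 for 1MB, and it raises any result below the documented 64KB minimum to 64 as its own design choice.

```c
#include <assert.h>

/* Smallest cache size KeyView uses, in kilobytes (from the text above). */
#define MIN_DISK_CACHE_KB 64L

/* Hypothetical helper: splits a total memory budget (in megabytes) evenly
 * across threads and returns the per-thread value, in kilobytes, to use for
 * DiskCacheSize. Results below the documented minimum are raised to 64. */
long disk_cache_size_kb(long total_budget_mb, long thread_count)
{
    assert(total_budget_mb >= 0 && thread_count > 0);

    long per_thread_kb = (total_budget_mb * 1024L) / thread_count;
    return per_thread_kb < MIN_DISK_CACHE_KB ? MIN_DISK_CACHE_KB : per_thread_kb;
}
```

With the figures from the example above, a 50MB budget shared by 10 threads gives 5120, so the entry in the [DiskCache] section would be DiskCacheSize=5120.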

Convert Files Out of Process

Export can run independently from the calling application. This is called out of process. Out-of-process conversions protect the stability of the calling application in the rare case when a malformed document causes Export to fail. You can also run Export in the same process as the calling application. This is called in-process. However, it is strongly recommended that you convert documents out of process whenever possible.

The Export out-of-process framework uses a client-server architecture. The calling application sends an out-of-process conversion request to the Service Request Broker in the main Export process. The Broker then creates, monitors, and manages a Servant process for the request. Each request is handled by one independent Servant process. Data is exchanged between the application thread and the Servant through TCP/IP sockets. The source data is sent to the Servant process as a data stream or file, converted in the Servant, and then returned to the application thread. At that point, the application can either terminate the Servant process or send more data for conversion.

Multiple conversion requests can be sent from multiple threads in the calling application simultaneously. All requests sent from one thread are processed by the Servant mapped to that thread. In other words, each thread can have only one Servant to process its conversion requests.

Any standard conversion errors generated by the Servant are sent to the application.

NOTE Currently, the main Export process and Servant processes must run on the same host.

The following are requirements for running Export out of process:

 Internet Protocol (TCP/IP) must be installed

 Multi-threaded processing must be supported on the operating system platform

 The user application must be built with a multi-threaded runtime library


The following functions run in-process or out of process:

NOTE When converting out of process, these functions must be called after the call to start an out-of-process session and before the call to end an out-of-process session.

Other Export API functions and the File Extraction functions always run in-process.

Configure Out-of-Process Conversions

Although most components of the out-of-process conversion are transparent, the following parameters are configurable:

• File-size threshold/temporary file location

• Conversion time-out

• Listener port numbers and time-out

• Connection time-out and retry

• Servant process name

These parameters are defined internally, but you can override the default by defining the parameter in the formats_e.ini file. The formats_e.ini file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.

To set the parameters, add the following section to the formats_e.ini file:

[KVExportOOPOptions]

TempFileSizeMark=

TempFilePath=

WaitForConvert=

WaitForConnectionTime=

ListenerPortList=

ListenerTimeout=

ConnectRetryInterval=

ConnectRetry=

ServantName=

Each parameter is described in Table 5. The default values for these parameters are set to ensure reasonable performance on most systems. If you are processing a large number of files, or running Export on a slow machine, you may need to increase some of the time-out and retry values.

Parameter and description:

TempFileSizeMark (unit = megabytes, default = 10)
The file-size threshold. If the input file received by the Servant is larger than this value, temporary files are created to store the data. The directory in which the temporary files are stored is defined by the TempFilePath parameter. If the file received is smaller than this value, the data is stored in memory in the Servant. This only applies when the input is a stream.

TempFilePath (type = file path, default = current working directory)
The directory in which temporary files are stored. Temporary files are created when the input file surpasses the file-size threshold (TempFileSizeMark). If the Servant cannot access the file path, an error is generated. This only applies when converting in stream mode.

WaitForConvert (unit = seconds, default = 1800, range = 30~3600)
The length of time to wait for a Servant to convert a file. If the conversion is not completed within the specified time, the error code “Wait for child process failed” is generated.

WaitForConnectionTime (unit = seconds, default = 180, range = 15~600)
The length of time to wait for the Servant to connect to the application thread after the application has sent a conversion request to the Broker. If the Servant does not connect within the specified time, the error code “Wait for child process failed” is generated. If there are many Servant processes running simultaneously, this value may need to be increased.

ListenerPortList (type = integer, default = 9985, 9986, 9987, 9988, 9989)
The TCP/IP port number(s) used for communication between the calling application and the Servant. You can specify a single port number or a series of numbers (enter the numbers separated by commas).

ListenerTimeout (unit = seconds, default = 10, range = 5~30)
The length of time to wait for the Servant listener thread to get a process ID from the Servant after the connection is established. If the ID is not obtained within the specified time, the error code “Wait for child process failed” is generated. During this time, no other Servant can connect with the application.

ConnectRetryInterval (unit = microseconds, default = 0.1, range = 50000~500000)
The length of time to wait after a Servant has failed to connect to the application before it retries the connection. A Servant may be unable to connect because the application is waiting for another Servant to send a process ID. To calculate the total retry interval, the value set here is added to the platform-specific TCP retry value (on Windows, this is 1 second).

ConnectRetry (type = integer, default = 120, range = 30~600)
The number of attempts the Servant makes to connect to the calling application. This value and the total retry interval determine the total delay time. The total delay is calculated as follows:

(ConnectRetryInterval + platform-specific_TCP_retry_value) * ConnectRetry

For example, if the ConnectRetryInterval is set to 2 seconds, and the Export process is running on Windows (the default TCP retry value on Windows is 1 second), the total delay would be:

(2 + 1) * 120 = 360

The Servant would attempt to connect to the application every 3 seconds for 120 attempts for a total of 360 seconds.

ServantName (type = string, default = servant)
The name of the Servant process. To move the Servant to another location, enter a fully qualified path.
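The total delay calculation for ConnectRetry can be checked with a short helper. The retry interval is first added to the platform TCP retry value, and the sum is then multiplied by the number of attempts. This is a sketch only: the function name total_connect_delay_seconds is hypothetical, and all inputs are taken in seconds to match the worked example, whereas the formats_e.ini parameter itself is documented in microseconds.

```c
#include <assert.h>

/* Total time, in seconds, that a Servant keeps trying to connect:
 * (ConnectRetryInterval + platform TCP retry value) * ConnectRetry.
 * All inputs are in seconds here to match the worked example. */
double total_connect_delay_seconds(double retry_interval_s,
                                   double tcp_retry_s,
                                   int connect_retry)
{
    assert(retry_interval_s >= 0.0 && tcp_retry_s >= 0.0 && connect_retry >= 0);

    /* One attempt costs the configured interval plus the TCP retry time. */
    double per_attempt_s = retry_interval_s + tcp_retry_s;
    return per_attempt_s * connect_retry;
}
```

With an interval of 2 seconds, a TCP retry value of 1 second, and 120 attempts, the function returns 360, which is one attempt every 3 seconds for 120 attempts.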

Run Export Out of Process—Overview

To convert files out of process

1. If required, set parameters for the out-of-process conversion in the formats_e.ini file.

2. Initialize an Export session.

3. If you are using streams, create an input stream.

4. Define the conversion options.

5. Initialize an out-of-process session.

6. Convert the input and/or call other functions that can run out of process.

7. Shut down the out-of-process session.

8. Repeat Step 3 through Step 7 for additional files.

9. Terminate the out-of-process session and the Servant process.

10. Shut down the Export session.

Recommendations

• To ensure multi-threaded conversions are thread-safe, you must create a unique context pointer for every thread by calling fpInit(). In addition, threads must not share context pointers, and the same context pointer must be used for all API calls in the same thread. Creating a context pointer for every thread does not affect performance because the context pointer uses minimal resources.

• All functions that can run out of process must be called within the out-of-process session, that is, after the call to initialize the out-of-process session and before the call to end the out-of-process session.

• When terminating an out-of-process session, persist the Servant process by setting the boolean flag bKeepServantAlive in the KVXMLEndOOPSession() function or endOOPSession method. If the Servant process remains active, subsequent conversion requests are processed more quickly because the Servant process is already prepared to receive data. Only terminate the Servant when there are no more out-of-process requests.

• To recover from a failure in the Servant process, start a new out-of-process session. This creates a new Servant process for the next conversion.

Run Export Out of Process in the C API

The cnv2xmloop sample program demonstrates how to run Export out of process.

To convert files out of process in the C API

1. If required, set parameters for the out-of-process conversion in the formats_e.ini file. See “Configure Out-of-Process Conversions”.

2. Declare instances of the following types and assign values to the members as required:

KVXMLTemplateEx

KVXMLOptionsEx

KVXMLHeadingInfo

KVXMLTOCOptions

See “XML Export API Structures” for more information.

3. Load the KVXML library and obtain the KVXMLInterface entry point by calling KVXMLGetInterface().

4. Initialize an Export session by calling fpInit(). See “fpInit()”.

5. If you are using streams for the input and output source, follow these steps; otherwise proceed to Step 6:

a. Create an input stream (KVInputStream) by calling fpFileToInputStreamCreate(). See “fpFileToInputStreamCreate()”.

b. Create an output stream (KVOutputStream) by calling fpFileToOutputStreamCreate(). See “fpFileToOutputStreamCreate()”.

c. Proceed to Step 6.

6. Set up an out-of-process session by calling KVXMLStartOOPSession(). See “KVXMLStartOOPSession()”. This function performs the following:

• Initializes the out-of-process session.

• Specifies the input stream or file. If you are using an input file, set pFileName to the filename, and set pInputStream to NULL. If you are using an input stream, set pInputStream to point to KVInputStream, and set pFileName to NULL.

• Sets conversion options in the KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions data structures.

• Creates a Servant process.

• Establishes a communication channel between the application thread and the Servant.

• Sends the data to the Servant.

See the sample code in “Example—KVXMLStartOOPSession”, and “KVXMLStartOOPSession()”.

7. Convert the input and generate the output files by calling KVXMLConvertFile() or fpConvertStream(). The structures KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions are defined in the call to KVXMLStartOOPSession(), and should be NULL in the conversion call. A conversion function can only be called once in a single out-of-process session. See “KVXMLConvertFile()” and “fpConvertStream()”.

8. Terminate the out-of-process session by calling KVXMLEndOOPSession(). The Servant ends the current conversion session, and releases the source data and session resources. See sample code in “Example—KVXMLEndOOPSession”, and “KVXMLEndOOPSession()”.

9. If you used streams, free the memory allocated for the input stream and output stream by calling the functions fpFileToInputStreamFree() and fpFileToOutputStreamFree(). See “fpFileToInputStreamFree()”.

10. Repeat Step 5 through Step 9 for additional files.

11. After all files are converted, terminate the out-of-process session and the Servant process by calling KVXMLEndOOPSession() and setting the boolean to FALSE.

12. After the out-of-process session and Servant are terminated, shut down the Export session by calling fpShutDown(). See “fpShutDown()”.
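The Servant lifetime implied by Step 8 through Step 11 (keep the Servant alive between files, and end it only after the last file) can be sketched with stand-in functions. Everything named Demo or demo_ below is hypothetical: the stand-ins only model whether a Servant is running, and they do not have the argument lists of KVXMLStartOOPSession() or KVXMLEndOOPSession().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the out-of-process calls. They only model the
 * Servant lifetime; the real functions take more arguments. */
typedef struct DemoSession {
    bool servant_running;
    int  servants_started;   /* how many Servant processes were created */
    int  files_converted;
} DemoSession;

static void demo_start_oop_session(DemoSession *s)
{
    if (!s->servant_running) {       /* reuse a Servant that was kept alive */
        s->servant_running = true;
        s->servants_started++;
    }
}

static void demo_end_oop_session(DemoSession *s, bool keep_servant_alive)
{
    if (!keep_servant_alive)
        s->servant_running = false;
}

/* Converts a batch of files, keeping the Servant alive between files and
 * terminating it only after the last one. */
void demo_convert_batch(DemoSession *s, const char *const files[], size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        assert(files[i] != NULL);
        demo_start_oop_session(s);
        s->files_converted++;                 /* the conversion call goes here */
        bool is_last = (i + 1 == count);
        demo_end_oop_session(s, !is_last);    /* keep alive unless last file */
    }
}
```

In this model, a batch of three files starts a single Servant and leaves none running at the end, which reflects the recommendation to terminate the Servant only when there are no more out-of-process requests.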

Example—KVXMLStartOOPSession

The following sample code is from the cnv2xmloop sample program:

/* declare OOP startsession function pointer */
KVXML_START_OOP_SESSION fpKVXMLStartOOPSession;

/* assign OOP startsession function pointer */
fpKVXMLStartOOPSession = (KVXML_START_OOP_SESSION)mpGetProcAddress(hKVXML, "KVXMLStartOOPSession");
if(!fpKVXMLStartOOPSession)
{
    printf("Error assigning KVXMLStartOOPSession pointer\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    mpFreeLibrary(hKVXML);
    return 7;
}

/********START OOP SESSION *****************/
if(!(*fpKVXMLStartOOPSession)(pKVXML,
        &Input,
        NULL,
        &XMLTemplates,   /* Mark-up and related variables */
        &XMLOptions,     /* Options */
        NULL,            /* TOC options */
        &oopServantPID,
        &error,
        0,
        NULL,
        NULL))
{
    printf("Error calling fpKVXMLStartOOPSession \n");
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 9;
}

Example—KVXMLEndOOPSession

The following sample code is from the cnv2xmloop sample program:

/* declare endsession function pointer */
KVXML_END_OOP_SESSION fpKVXMLEndOOPSession;

/* assign OOP endsession function pointer */
fpKVXMLEndOOPSession = (KVXML_END_OOP_SESSION)mpGetProcAddress(hKVXML, "KVXMLEndOOPSession");
if(!fpKVXMLEndOOPSession)
{
    printf("Error assigning KVXMLEndOOPSession pointer\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    mpFreeLibrary(hKVXML);
    return 8;
}

/********END OOP SESSION, DO NOT KEEP SERVANT ALIVE *********/
if(!(*fpKVXMLEndOOPSession)(pKVXML,
        FALSE,
        &error,
        0,
        NULL,
        NULL))
{
    printf("Error calling fpKVXMLEndOOPSession \n");
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 10;
}

Convert Files

KeyView Export SDK enables you to convert many different types of documents to XML. Converting is the process of extracting the text from a document without the application-specific markup, and applying XML markup. However, the conversion process can also include the following:

• Extracting sub files: exposes all sub files for conversion. See “Sub File Extraction”.

• Setting conversion options: determines the content, structure, and appearance of the XML output. See “Set Conversion Options”.

• Extracting the file’s format: detects a file’s format, and reports the information to the API, which in turn reports the information to the developer’s application. See “Extract File Format Information”.

• Extracting metadata: extracts selected metadata (document properties) from a file. See “Extract Metadata”.

• Converting character set: controls the character set of both the input and the output text. See “Convert Character Sets”.

• Implementing callbacks: controls the conversion while it is in progress. See “XML Export API Callback Functions”.

You can use one of the following methods to convert documents:

• Use the Export Demo sample program. This Visual Basic program demonstrates most of the Export SDK's capabilities and is the easiest way to get started. See “Use the Export Demo Program”.

• Use the C-language implementation of the API from your C or C++ application. See “Use the C-Language Implementation of the API”.

• Use the C sample programs. See “Introduction”.

NOTE It is strongly recommended that you convert documents out of process. During out-of-process conversion, Export runs independently from the calling application. Out-of-process conversion protects the stability of the calling application in the rare case when a malformed document causes Export to fail. See “Convert Files Out of Process”.

Sub File Extraction

To convert a file, you must first determine whether the source file contains any sub files (attachments, embedded objects, and so on). A file that contains sub files is called a container file. Compressed files (such as Zip), mail messages with attachments (such as Microsoft Outlook Express), mail stores (such as Microsoft Outlook Personal Folders), and compound documents with embedded OLE objects (such as a Microsoft Word document with an embedded Excel chart) are examples of container files.


If the file is a container file, the container must be opened and its sub files extracted using the File Extraction API. The extraction process is done repeatedly until all sub files are extracted and exposed for conversion. Once a sub file is extracted, you can call the XML Export APIs to convert the file.

If a file is not a container, you should pass it directly to the XML Export API for conversion without extraction.

See “Use the File Extraction API” for more information.

Convert Outlook Email without Using the Extraction API

It is strongly recommended that you convert all container files, including Microsoft Outlook files, using the File Extraction API. However, you can convert Outlook email messages (MSG) directly using the Export API and the MSG reader (msgsr).

NOTE The MSG reader only extracts the message body of an MSG file. Attachments are not extracted.

To convert MSG files using the MSG reader, add the following to the formats_e.ini file (TRUE is case-sensitive):

[ContainerOptions]
bConvertMSG=TRUE

Set Conversion Options

Conversion options are parameters that determine the content, structure, and appearance of the XML output. For example, you can specify the markup inserted at the beginning and end of specific XML blocks, whether a heading is included in the table of contents, the output character set, or the resolution at which graphics are converted. The conversion options can be set either in the API or in the template files. Regardless of the method used to set the options, the values are ultimately passed to the API and used to populate the following data structures:

KVXMLTemplate

KVXMLOptions

KVXMLHeadingInfo

KVXMLTOCOptions

The conversion options are described in “XML Export API Structures”.

Set Conversion Options Using the API

The conversion options are set using any of the following functions:

fpConvertStream()

KVXMLConvertFile()

KVXMLStartOOPSession()

Set Conversion Options Using the Template Files

XML Export includes templates in the form of initialization files (.ini). The templates provide a quick and easy way to modify the conversion options without programming at the API level. However, the template files do not give you complete control of the conversion process. To control some features, you must use the API directly.

The template files can be fully customized using a text editor. For example, to change the output character set from the default KVCS_UTF8 to KVCS_SJIS in the xml1file.ini template, you would change the eOutputCharSet entry as follows:

[KVXMLOptions]
eOutputCharSet=KVCS_SJIS
bForceOutputCharSet=TRUE

To create valid XML, a template file must contain two structures: KVXMLTemplateEx and KVXMLOptionsEx.

NOTE If you enter markup in the template files that is not compliant with XML standards, XML Export inserts the markup into the output file unchanged. This may result in a malformed XML file.

An application must then read the template file and write the data to the appropriate Export structures. In the C sample program xmlini, a template file is supplied as a command-line argument (see “xmlini”).
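The read-and-populate step described above might look roughly like the following sketch. It is not the xmlini implementation: DemoOptions and demo_apply_line() are hypothetical names invented for illustration, and only the [KVXMLOptions] section and the eOutputCharSet and bForceOutputCharSet keys are taken from the template example in this guide. A real application would copy the values into the Export structures instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the Export option structures. */
typedef struct {
    char outputCharSet[32];   /* for example "KVCS_UTF8" */
    int  forceOutputCharSet;  /* 1 for TRUE, 0 otherwise */
} DemoOptions;

/* Applies one line of a template file to opts. `section` must point to a
   buffer of at least 32 bytes; it holds the current [section] name between
   calls. Returns 1 if the line set an option, 0 otherwise. */
int demo_apply_line(DemoOptions *opts, char *section, const char *line)
{
    char key[64], value[64];

    assert(opts != NULL && section != NULL && line != NULL);

    /* A line such as "[KVXMLOptions]" starts a new section. */
    if (sscanf(line, " [%31[^]]]", section) == 1)
        return 0;

    /* This sketch only understands entries in [KVXMLOptions]. */
    if (strcmp(section, "KVXMLOptions") != 0)
        return 0;

    /* Split "key=value"; anything else is ignored. */
    if (sscanf(line, " %63[^= ] = %63s", key, value) != 2)
        return 0;

    if (strcmp(key, "eOutputCharSet") == 0) {
        snprintf(opts->outputCharSet, sizeof opts->outputCharSet, "%s", value);
        return 1;
    }
    if (strcmp(key, "bForceOutputCharSet") == 0) {
        opts->forceOutputCharSet = (strcmp(value, "TRUE") == 0);
        return 1;
    }
    return 0;
}
```

A caller would read the template with fgets() and pass each line to demo_apply_line(), starting with an empty section string. Keeping the section name in caller-owned storage avoids hidden static state, so several templates can be parsed independently.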

Templates

The template files for the C API implementation are in the directory install\xmlexport\programs\ini, where install is the pathname of the Export installation directory. The following templates are provided:


Template: Description

Cascading style sheet (xml_css.ini)

    This template writes style sheet information to an external CSS file. This makes the XML output significantly smaller because the information is not stored within the output file.

    See “Use Style Sheets” and “Use Style Sheets with xmlini” for more information on using an external CSS file.

Index (xml_index.ini)

    • Converts a source document into a single, largely unformatted XML file that is appropriate for use with an indexing engine.

Single file (xml1file.ini)

    • Creates a single XML file.

    • Does not define an XSL style sheet. A default XSL style sheet that is appropriate to the source document type is used. The defaults supplied are wp.xsl (for word processing documents), ss.xsl (for spreadsheets), and pg.xsl (for presentations).

    • Forces the output character set to UTF-8.

    • Maintains the source document’s fonts and styles.

    • Does not create a table of contents.

Single file for presentations (xml1file_pg.ini)

    This template is designed specifically for presentation formats.

    • Creates a single XML file.

    • Defines an XSL style sheet for presentations (pg.xsl).

    • Forces the output character set to UTF-8.

    • Since XML Export only extracts textual components from presentations, the bRasterizeFiles member of KVXMLOptions is set to FALSE. See “KVXMLOptions”.

    • Only the szMainTop, szMainBottom, and szUserSummary parameters of the KVXMLTemplate structure are relevant to presentations and are set in the presentations template.

    • A template file for presentations must not include any other parameters in the KVXMLTemplate structure. See “KVXMLTemplate”.

Single file with table of contents (xml1filetoc.ini)

    • Creates a single XML file.

    • Creates a table of contents at the top of the XML document.

    • Uses the Verity.dtd.

    • Uses an XSL style sheet (wp.xsl).

    • Forces the output character set to UTF-8.

    • Lists all metadata (Title, Subject, Author, Comments, Created, Modified, Last Saved By, and Revision Number).

    • Uses the name of the worksheets for spreadsheets.

    • Uses the slide titles for presentations. If no titles are available in the source document, it uses “slide 1,” “slide 2,” “slide 3,” and so on.


Use the Export Demo Program

The easiest way to get started with XML Export is to become familiar with its capabilities through the Visual Basic sample program, Export Demo. The source code for the program is in the directory install\xmlexport\programs\ExportDemo, where install is the pathname of the Export installation directory.

Export Demo is for Windows only, and requires Internet Explorer 4.01 with Service Pack 1 or higher.

The output options for output files are pre-defined in Export Demo and cannot be changed in the user interface. Export Demo uses a small sample of the options available in the XML Export API.

You can use the sample documents in install\testdocs to experiment with converting different file formats.

To launch the sample program, select Export Demo from Start | Programs | Autonomy | XML Export. The following dialog appears:

Figure 2 Export Demo: Launching


NOTE HTML conversion using HTML Export is available in Export Demo if you have HTML Export installed. If you do not have HTML Export installed, the HTML button is disabled.


Change Input/Output Directories

If XML Export is installed in the default directory, the output and input directories are automatically set. The default location for source files is the directory install\testdocs. The default location for output files is the directory install\xmlexport\programs\tempout.

If XML Export is installed in a directory other than the default, you are prompted to select an output and input directory when you first start up Export Demo.

To change the default directories for the source and output files

1. Select Options | Set Directories. The following dialog appears:

Figure 3 Export Demo: Setting Directories

2. From the tree view, select the drive letter and directory for the source or output files.

3. In Change Location, select which files are in the directory, either Source or XML.

4. Click Change. The Current Locations fields are updated with the new selection.

5. Follow the same procedure for the other file types. When you are finished, click OK.

Set Configuration Options

With XML Export, you can configure options prior to the document conversion using the XMLConfig() function. Export Demo demonstrates this function, and allows you to control the following options:

• Generating output with verbose markup and without images.

• Including position information in the markup generated for a PDF document.

Suppress Images

Export Demo provides an option to generate output with verbose markup and without images. For more information, see “KVXMLConfig()”.

To specify that images are suppressed in the XML output, select Options | XML Config | Suppress Images.

Using PDF Position Information

Export Demo provides an option to include position information in the markup generated for a PDF document. For more information, see “KVXMLConfig()”.

To specify that PDF position information be included in the XML output, select Options | XML Config | Enable Position Token.

Convert Files

To convert a single file:

1. Select Options | Convert | Single file.

2. Select the document from the file list, and click XML in the Convert file to pane.

To convert files in a directory:

1. Select Options | Convert | Entire directory.

2. Click XML in the Convert directory to pane.

To view a converted file, double-click the output file in the Output Files pane or select the output file and click View.

The converted file is displayed in the view pane.


Figure 4 Export Demo: Converting Files

To view the original document, select the document from the file list, and click Open. If you have an application on your system associated with the file, the file is displayed in that application.

To delete output files, select the file in the Output Files pane and click Delete.

Use the C-Language Implementation of the API

The C-language implementation of the XML Export API is divided into the following function suites:

• File Extraction API Functions: Open and extract sub files in a container file. They also extract metadata and file format information, and control character set conversion on extraction.

• XML Export API Functions: Extract format information (metadata, character set, and format), create an input/output stream from a file, and open, convert, and close the stream.

• XML Export API Callback Functions: Control the conversion while it is in progress.


Input/Output Operations

In the XML Export API, the source input and target output can be either a physical file accessed through a file path, or a stream created from a data source. A stream is a C structure containing pointers to functions similar in nature to their standard ANSI C counterparts. This structure is passed to Export functions in place of the standard input source. The input stream is defined by the structure KVInputStream in kvtypes.h. The output stream is defined by the structure KVOutputStream in kvtypes.h. See “KVInputStream” and “KVOutputStream”.

You can create an input stream using the function fpFileToInputStreamCreate(), and an output stream using the function fpFileToOutputStreamCreate(). These functions assign C equivalent I/O functions to fpOpen(), fpRead(), fpSeek(), fpTell(), and fpClose(). See “fpFileToInputStreamCreate()” and “fpFileToOutputStreamCreate()”.
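To make the "structure of function pointers" idea concrete, here is a minimal sketch of a stream backed by a memory buffer. DemoInputStream and DemoMemSource are hypothetical types written for this illustration; they only borrow the member names fpOpen, fpRead, fpSeek, fpTell, and fpClose from the description above and are not the KVInputStream definition from kvtypes.h, whose exact member types and calling conventions should be taken from that header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>

/* Hypothetical stream type: a private-data pointer plus I/O callbacks. */
typedef struct DemoInputStream DemoInputStream;
struct DemoInputStream {
    void  *pPrivate;
    int    (*fpOpen)(DemoInputStream *s);
    size_t (*fpRead)(DemoInputStream *s, unsigned char *buf, size_t n);
    int    (*fpSeek)(DemoInputStream *s, long offset, int whence);
    long   (*fpTell)(DemoInputStream *s);
    int    (*fpClose)(DemoInputStream *s);
};

/* Private state for a stream that reads from memory instead of a file. */
typedef struct {
    const unsigned char *data;
    size_t size;
    size_t pos;
} DemoMemSource;

static int mem_open(DemoInputStream *s)
{
    DemoMemSource *m = s->pPrivate;
    m->pos = 0;
    return 1;
}

/* Like fread(): copies up to n bytes and returns the number copied. */
static size_t mem_read(DemoInputStream *s, unsigned char *buf, size_t n)
{
    DemoMemSource *m = s->pPrivate;
    size_t left = m->size - m->pos;
    if (n > left)
        n = left;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return n;
}

/* Like fseek(), but returns 1 on success and 0 on failure. */
static int mem_seek(DemoInputStream *s, long offset, int whence)
{
    DemoMemSource *m = s->pPrivate;
    long base, target;
    if (whence == SEEK_SET)      base = 0;
    else if (whence == SEEK_CUR) base = (long)m->pos;
    else if (whence == SEEK_END) base = (long)m->size;
    else return 0;
    target = base + offset;
    if (target < 0 || (size_t)target > m->size)
        return 0;
    m->pos = (size_t)target;
    return 1;
}

static long mem_tell(DemoInputStream *s)
{
    return (long)((DemoMemSource *)s->pPrivate)->pos;
}

static int mem_close(DemoInputStream *s)
{
    (void)s;   /* nothing to release: the caller owns the buffer */
    return 1;
}

/* Wires the callbacks to the memory source, in the same spirit as a
   "file to stream" helper wires them to file I/O. */
void demo_mem_stream_init(DemoInputStream *s, DemoMemSource *m,
                          const void *data, size_t size)
{
    assert(s != NULL && m != NULL && data != NULL);
    m->data = data;
    m->size = size;
    m->pos = 0;
    s->pPrivate = m;
    s->fpOpen = mem_open;
    s->fpRead = mem_read;
    s->fpSeek = mem_seek;
    s->fpTell = mem_tell;
    s->fpClose = mem_close;
}
```

Code that consumes the stream calls only through the function pointers, so the same consumer works whether the data comes from a file, a memory buffer, or another source.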

Convert Files

To use the C-language implementation of the API

1. Develop the XML markup and tokens to be assigned to the required members of a declared instance of KVXMLTemplate.

   If you use markup in the structure that is not compliant with XML standards, XML Export inserts the markup into the output file unchanged. This may result in a malformed XML file.

2. Declare instances of the following types and assign values to the members as required:

   KVXMLTemplateEx

   KVXMLOptionsEx

   KVXMLHeadingInfo

   KVXMLTOCOptions

   See “XML Export API Structures” for more information.

3. Load the KVXML library and obtain the KVXMLInterface entry point by calling KVXMLGetInterface(). See “KVXMLGetInterface()”.

4. Initialize an Export session by calling fpInit(). The function’s return value, pContext, is passed as the first parameter to all other Export functions. See “fpInit()”.

5. Pass the context pointer from fpInit() and the address of a structure containing pointers to the File Extraction API functions in the call to KVGetExtractInterface(). See “KVGetExtractInterface()”.


6. If you are using streams for the input and output source, follow these steps; otherwise, proceed to Step 7:

   a. Create an input stream (KVInputStream) by calling fpFileToInputStreamCreate(), or using code similar to the example code in the sample programs. See “fpFileToInputStreamCreate()”.

   b. Create an output stream (KVOutputStream) by calling fpFileToOutputStreamCreate(), or using code similar to the example code in the sample programs. See “fpFileToOutputStreamCreate()”.

   c. Proceed to Step 7.

7. Declare the input stream or filename in the KVOpenFileArg structure. See “KVOpenFileArg”.

8. Open the source file by calling fpOpenFile() and passing the KVOpenFileArg structure. This call defines the parameters necessary to open a file for extraction. See “fpOpenFile()”.

9. Determine whether the source file is a container file (contains sub files) by calling fpGetMainFileInfo(). See “fpGetMainFileInfo()”.

10. If the call to fpGetMainFileInfo() determined the source file is a container file, proceed to Step 11; otherwise, proceed to Step 14.

11. Determine whether the sub file is itself a container (contains sub files) by calling fpGetSubFileInfo(). See “fpGetSubFileInfo()”.

12. Extract the sub file by calling fpExtractSubFile(). See “fpExtractSubFile()”.

13. If the call to fpGetSubFileInfo() determined the sub file is a container file, repeat Step 6 through Step 12 until all sub files are extracted; otherwise, proceed to Step 14.

14. Set up an out-of-process session by calling KVXMLStartOOPSession(). See “KVXMLStartOOPSession()”.

15. Convert the input and generate the output files by calling KVXMLConvertFile() or fpConvertStream(). The structures KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions are defined in the call to KVXMLStartOOPSession(), and should be NULL in the conversion call.

    A conversion function can only be called once in a single out-of-process session. See “fpConvertStream()” or “KVXMLConvertFile()”.


    If you are using callbacks, they are called while the conversion process is underway. If required, alternate paths and filenames can be specified for output files, including using the table of contents entries for the filenames.

16. If you are converting additional files, terminate the out-of-process session by calling KVXMLEndOOPSession() and setting the boolean to TRUE. The Servant ends the current conversion session, and releases the source data and session resources.

    If you are not converting additional files, terminate the out-of-process session and the Servant process by calling KVXMLEndOOPSession() and setting the boolean to FALSE. See “KVXMLEndOOPSession()”.

17. Close the file by calling fpCloseFile(). See “fpCloseFile()”.

18. If you used streams, free the memory allocated for the input stream and output stream by calling the functions fpFileToInputStreamFree() and fpFileToOutputStreamFree(). See “fpFileToInputStreamFree()” and “fpFileToOutputStreamFree()”.

19. Repeat Step 6 through Step 18 for additional source files.

20. Shut down the Export session by calling fpShutDown(). See “fpShutDown()”.
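The session handling in Steps 14 through 16 can be sketched as a loop that runs one conversion per session and keeps the Servant alive only while more files remain. Everything named Demo or demo_ below is hypothetical scaffolding for illustration: the three callbacks stand in for KVXMLStartOOPSession(), a conversion call, and KVXMLEndOOPSession(), whose real signatures are documented elsewhere in this guide. Error recovery is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the three session-related calls. */
typedef struct {
    int (*start)(void *ctx, const char *path);      /* begin a session      */
    int (*convert)(void *ctx);                      /* one conversion only  */
    int (*end)(void *ctx, int keepServantAlive);    /* end the session      */
} DemoSessionOps;

/* Converts each path in its own session. The keep-alive flag is TRUE (1)
   while more files follow and FALSE (0) for the last file. Returns the
   number of successful conversions. */
size_t demo_convert_batch(const DemoSessionOps *ops, void *ctx,
                          const char *const *paths, size_t count)
{
    size_t i, converted = 0;

    for (i = 0; i < count; i++) {
        int keepServantAlive = (i + 1 < count);

        if (!ops->start(ctx, paths[i]))
            break;                      /* error handling omitted */
        if (ops->convert(ctx))
            converted++;
        ops->end(ctx, keepServantAlive);
    }
    return converted;
}

/* A recording implementation that logs the call order as text:
   "S" = start, "C" = convert, "E1"/"E0" = end with keep-alive on/off. */
typedef struct {
    char text[128];
} DemoLog;

static void demo_log_add(DemoLog *log, const char *s)
{
    strncat(log->text, s, sizeof log->text - strlen(log->text) - 1);
}

static int demo_log_start(void *ctx, const char *path)
{
    (void)path;
    demo_log_add(ctx, "S");
    return 1;
}

static int demo_log_convert(void *ctx)
{
    demo_log_add(ctx, "C");
    return 1;
}

static int demo_log_end(void *ctx, int keepServantAlive)
{
    demo_log_add(ctx, keepServantAlive ? "E1" : "E0");
    return 1;
}

const DemoSessionOps demo_log_ops = {
    demo_log_start, demo_log_convert, demo_log_end
};
```

Running demo_convert_batch() over three paths with demo_log_ops records start, convert, and end for each file, with the keep-alive flag set for the first two files and cleared for the third.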

Multi-threaded Conversions

To ensure multi-threaded conversions are thread-safe, you must create a unique context pointer for every thread by initializing the Export session using fpInit() .

In addition, threads must not share context pointers, and the same context pointer must be used for all API calls in the same thread. Creating a context pointer for every thread does not affect performance because the context pointer uses minimal resources.

For example, your code should have the following logic for one thread:

    fpInit()
    KVGetExtractInterface()
    fpFileToInputStreamCreate()
    fpFileToOutputStreamCreate()
    fpOpenFile()
    fpGetMainFileInfo()
    fpGetSubFileInfo()
    fpExtractSubFile()    /* container file */
    fpGetSubFileMetaData()
    KVXMLStartOOPSession()
    fpConvertStream()
    KVXMLEndOOPSession(bKeepServantAlive TRUE)
    fpCloseFile()
    fpFileToInputStreamFree()
    fpFileToOutputStreamFree()
    set input/output file
    fpOpenFile()
    fpGetMainFileInfo()   /* not a container file */
    KVXMLStartOOPSession()
    KVXMLConvertFile()
    KVXMLEndOOPSession(bKeepServantAlive TRUE)
    fpCloseFile()
    ...
    fpShutDown()

Use the Verity Document Type Definition (DTD)

XML Export produces well-formed, valid XML documents. Document validity is based on a Document Type Definition (DTD) called the Verity.dtd. The Verity.dtd is in the default output directory tempout. If the DTD is in a different directory, the full path must be specified in pszVerityDTDPath.

The elements in the Verity.dtd are based on those defined in the W3C XHTML 1.0 specification and the attributes are based on those defined in the W3C CSS 2 specification.

The root element of each document is “VerityXMLExport.” Character entities are imported by using the three XHTML DTDs defined at the beginning of the Verity.dtd.

<!-- Character entities -->

<!ENTITY % HTMLlat1x SYSTEM "HTMLlat1x.ent">

%HTMLlat1x;

<!ENTITY % HTMLspecialx SYSTEM "HTMLspecialx.ent">

%HTMLspecialx;

<!ENTITY % HTMLsymbolx SYSTEM "HTMLsymbolx.ent">

%HTMLsymbolx;

Use XML Style Language Transformation (XSLT)

XML Export is designed to generate XML documents based on the Verity DTD.

You can convert the XML produced by XML Export to other XML vocabularies, such as Wireless Markup Language (WML), using XSLT.


Add Elements and Attributes to the DTD

XML Export can only generate XML that conforms to the Verity DTD. You can create your own DTD based on the Verity DTD. You cannot rename the Verity DTD, so make sure you back up the original Verity DTD to another name before making changes.

If you create your own DTD and add elements or attributes that are not defined in the original Verity DTD, you must ensure the new markup is defined in the XML Export API classes. You can define the markup by entering the markup directly in the styles, or populating the styles using the template files. See “Map Styles” for more information on mapping styles to user-defined markup.

Move the DTD

The default output directory for the Verity DTD is programs\tempout. If you move the Verity DTD to another output directory, you must set the string value of pszVerityDTDPath to the new location. This path is added to the document type declaration in the XML file. See “pszVerityDTDPath”.


PART 2

Use the Export API

This section explains how to perform some basic tasks using the File Extraction and Export APIs, and describes the sample programs. It contains the following chapters:

Use the File Extraction API

Use the XML Export API

Sample Programs


CHAPTER 3

Use the File Extraction API

This section describes how to extract sub files from a container file using the File Extraction API. It contains the following topics.

Introduction

Extract Sub Files

Recreate a File’s Hierarchy

Extract Mail Metadata

Extract Sub Files from Outlook Files

Extract Sub Files from Outlook Express Files

Extract Sub Files from Mailbox Files

Extract Sub Files from Outlook Personal Folders Files

Extract Sub Files from Lotus Domino XML Language Files

Extract Sub Files from Lotus Notes Database Files

Extract Sub Files from PDF Files

Extract Embedded OLE Objects

Extract Sub Files from ZIP Files

Default Filenames for Extracted Sub Files


Introduction

To convert a file, you must first determine whether the file contains any sub files (attachments, embedded OLE objects, and so on). A file that contains sub files is called a container file. A container file has a main file (parent) and sub files (children) embedded in the main file.

The following are examples of container files:

• Archive files such as ZIP, TAR, and RAR.

• Mail messages such as Outlook (MSG) and Outlook Express (EML).

• Mail stores such as Microsoft Outlook Personal Folders (PST), Mailbox (MBX), and Lotus Notes database (NSF).

• PDF files containing file attachments.

• Compound documents with embedded OLE objects such as a Microsoft Word document with an embedded Excel chart.

NOTE “Supported Formats” indicates which formats are treated as container files and are supported by the File Extraction API.

The sub files may also be container files, creating a file hierarchy of multiple levels. For example, suppose an MSG file (the root parent) contains three attachments:

• a Microsoft Word document containing an embedded Microsoft Excel spreadsheet.

• an AutoCAD drawing file (DWG).

• an EML file with an attached Zip file, which in turn contains four archived files.

Figure 5 shows the file’s hierarchy.

Figure 5 Example Container File Tree Structure

NOTE The parent MSG file contains four first-level children. The body text of a message file, although not a standalone file in the container, is considered a child of the parent file.

Extract Sub Files

To convert all files in a container file, the container must be opened and its sub files extracted using the File Extraction API. The extraction process is done repeatedly until all sub files are extracted and exposed for conversion. Once a sub file is extracted, you can call Export API functions to convert the file.

If you require a container file, including sub files, to be converted to a single file, you must extract all files from the container, convert the files, and then append each converted output to its parent.

To extract sub files, follow this general procedure

1. Pass the context pointer from fpInit() and the address of a structure containing pointers to the File Extraction API functions in the call to KVGetExtractInterface(). See “KVGetExtractInterface()”.

2. Declare the input stream or filename in the KVOpenFileArg structure. See “KVOpenFileArg”.

3. Open the source file by calling fpOpenFile() and passing the KVOpenFileArg structure. This call defines the parameters necessary to open a file for extraction. See “fpOpenFile()”.

4. Determine whether the source file is a container file (contains sub files) by calling fpGetMainFileInfo(). See “fpGetMainFileInfo()”.

5. If the call to fpGetMainFileInfo() determined the source file is a container file, proceed to Step 6; otherwise, convert the file.

6. Determine whether the sub file is itself a container (contains sub files) by calling fpGetSubFileInfo(). See “fpGetSubFileInfo()”.

7. Extract the sub file by calling fpExtractSubFile(). See “fpExtractSubFile()”.

8. If the call to fpGetSubFileInfo() determined the sub file is a container file, repeat Step 2 through Step 7 until all sub files are extracted and the lowest level of sub files is reached; otherwise, convert the file.
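The "repeat until the lowest level is reached" logic in this procedure is naturally recursive. The sketch below shows only that control flow over an in-memory model; DemoFile, DemoConvertFn, and demo_extract_and_convert() are hypothetical names, and the loop body stands in for the real fpOpenFile(), fpGetSubFileInfo(), and fpExtractSubFile() calls. As a simplification, the sketch converts only files that have no sub files and does not model converting a container's own content.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory model of a file that may contain sub files. */
typedef struct DemoFile DemoFile;
struct DemoFile {
    const char     *name;
    size_t          numSubFiles;   /* 0 means "not a container" */
    const DemoFile *subFiles;
};

/* Hypothetical callback standing in for a conversion call. */
typedef void (*DemoConvertFn)(const DemoFile *file, void *user);

/* Walks a file: a container is opened and each sub file is processed in
   turn (recursing into nested containers); a non-container is passed to
   `convert`. Returns the number of files handed to conversion. */
size_t demo_extract_and_convert(const DemoFile *file,
                                DemoConvertFn convert, void *user)
{
    size_t i, count = 0;

    assert(file != NULL);

    if (file->numSubFiles == 0) {
        if (convert != NULL)
            convert(file, user);
        return 1;
    }

    for (i = 0; i < file->numSubFiles; i++)
        count += demo_extract_and_convert(&file->subFiles[i], convert, user);

    return count;
}
```

For the MSG example described in the introduction (message body, a Word document with an embedded spreadsheet, a DWG file, and an EML file with a Zip of four files), this walk reaches seven non-container files.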

Recreate a File’s Hierarchy

When a container file is extracted, any relationships between the sub files in the container are not maintained. However, the File Extraction interface provides information that enables you to recreate the hierarchy. The hierarchy can be used to create a directory structure in a file system, or to categorize documents according to their relationship to each other. For example, if you use KeyView to generate text for a search engine, the hierarchical information enables your users to search for a document based on the document’s parent or sibling. In addition, when the document is returned to the user, the parent and sibling documents can be returned as recommendations.

The information needed to recreate a file’s hierarchy is provided in the call to fpGetSubFileInfo(). See “fpGetSubFileInfo()”. The members KVSubFileInfo->parentIndex and KVSubFileInfo->childArray provide information about a sub file’s parent and children. Since you can only retrieve the first-level children in the sub file, you must call fpGetSubFileInfo() repeatedly until information for the leaf-node children is extracted.

Create a Root Node

Because of their structure, some container files do not contain a sub file or folder which acts as a root directory on which the hierarchy can be based. For example, sub files in a Zip archive can be extracted, but none of the sub files represent the root of the hierarchy. In this case, an artificial root node must be created at the top of the file hierarchy as a point of reference for each child, and ultimately to recreate the relationships. This artificial root node is an internal object, and is extracted to disk as a directory called root. Its index number is 0.

To create the root node, set openFlag to KVOpenFileFlag_CreateRootNode in the call to fpOpenFile(). See “fpOpenFile()”. When a root node is created, the value of numSubFiles in KVMainFileInfo includes the root node (see “KVMainFileInfo”). For example, when you call fpGetMainFileInfo() on a Microsoft Word document with three embedded OLE objects and the root node is disabled, numSubFiles is 3. If you create a root node, numSubFiles is 4.

Recreate a File’s Hierarchy—Example

For example, suppose we extract a PST file containing seven sub files with a root node enabled. The call to fpGetMainFileInfo() returns the number of sub files as 8 (seven sub files and one root node).

Figure 6 shows the structure and the available hierarchy information after the sub files are extracted:

Figure 6 Extracted PST File


The parentIndex specifies the index number of a sub file’s parent. The childArray specifies an array of a sub file’s children. With this information, you can recreate the hierarchy shown in Figure 7.

Figure 7 Recreated File Hierarchy
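One way to turn parent indexes into a directory-style hierarchy is to follow each entry's parent chain up to the root and join the names. The sketch below does this for a hypothetical DemoSubFile array; it is not the KVSubFileInfo structure, the sample names are invented, and the use of -1 to mark the root entry is an assumption of this sketch rather than a documented value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: just the name and the parent's index. */
typedef struct {
    const char *name;
    int         parentIndex;   /* assumed: -1 marks the root entry */
} DemoSubFile;

/* Writes the path of entry `index`, for example "root/Inbox/msg1", into
   `out`. Returns 0 on success, or -1 if an index is out of range or the
   buffer is too small. Assumes every parent chain ends at an entry whose
   parentIndex is -1 (no cycles). */
int demo_build_path(const DemoSubFile *files, int count, int index,
                    char *out, size_t outSize)
{
    size_t len;

    if (files == NULL || out == NULL || outSize == 0)
        return -1;
    if (index < 0 || index >= count)
        return -1;

    out[0] = '\0';

    if (files[index].parentIndex >= 0) {
        /* Build the parent's path first, then append a separator. */
        if (demo_build_path(files, count, files[index].parentIndex,
                            out, outSize) != 0)
            return -1;
        len = strlen(out);
        if (len + 1 >= outSize)
            return -1;
        out[len] = '/';
        out[len + 1] = '\0';
    }

    len = strlen(out);
    if (len + strlen(files[index].name) >= outSize)
        return -1;
    strcpy(out + len, files[index].name);
    return 0;
}
```

The resulting paths can be used directly to create a directory structure on disk, or as keys for grouping a document with its parent and siblings in an index.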


Extract Mail Metadata

You can extract metadata, such as subject, sender, and recipient, from MSG, EML, MBX, PST, and NSF files, by calling the fpGetSubFileMetaData() function. You can extract a pre-defined set of metadata fields and/or individual fields that are unique to a file format.

Default Metadata Set

KeyView internally defines a set of common mail metadata fields that can be extracted as a group from mail formats. This default metadata set is listed in Table 6. When you retrieve all metadata for a file (that is, pass NULL for the array of metadata), the complete set of default metadata, not all available metadata in the file, is returned.


Field Name (string to specify): Description

From: The display name and e-mail address of the sender.

Sent: The time the message was sent.

To: The display names and email addresses of the recipients.

Cc: The display names and email addresses of recipients who receive copies of the email.

Bcc: The display names and email addresses of recipients who received blind copies of the email.

Subject: The text in the subject line of the message.

Priority: The priority applied to the message.

Because mail formats use different terms for the same fields, the format’s reader maps the default field name to the appropriate format-specific name. For example, when retrieving the default metadata set, the NSF field Importance is mapped to the name Priority and is returned.

You can also extract the default field names individually by passing the field name (such as From, To, and Subject); however, in this case, the string is not mapped to the format-specific name. For example, if you pass Priority in the call, you will retrieve the contents of the Priority field from an MBX file, but will not retrieve the contents of the Importance field from an NSF file.
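The difference between the two cases can be pictured as a lookup table that is consulted only when the default set is requested. The sketch below is purely illustrative: DemoFieldMap and demo_format_field_name() are hypothetical, the format strings are labels chosen for this example, and the single table entry (Priority to Importance for NSF) is the only mapping stated in this guide. How the format readers implement the mapping internally is not documented here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping record: default name to format-specific name. */
typedef struct {
    const char *format;
    const char *defaultName;
    const char *formatName;
} DemoFieldMap;

/* Only the mapping mentioned in the text; a real table would be larger. */
static const DemoFieldMap demo_field_map[] = {
    { "NSF", "Priority", "Importance" }
};

/* Returns the format-specific field name for a default field name, or the
   default name itself when the sketch knows no mapping for that format. */
const char *demo_format_field_name(const char *format, const char *defaultName)
{
    size_t i;

    assert(format != NULL && defaultName != NULL);

    for (i = 0; i < sizeof demo_field_map / sizeof demo_field_map[0]; i++) {
        if (strcmp(demo_field_map[i].format, format) == 0 &&
            strcmp(demo_field_map[i].defaultName, defaultName) == 0)
            return demo_field_map[i].formatName;
    }
    return defaultName;
}
```

When a field name is passed individually, no such lookup takes place, which is why Priority does not retrieve the Importance field from an NSF file.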

NOTE You cannot pass the field names listed in the default metadata set table individually for PST files. However, you can pass either the MAPI tag number or the MAPI tag name as integers. See “Microsoft Personal Folders File (PST) Metadata”.

Extract the Default Metadata Set

To extract the default metadata set, call the fpGetSubFileMetaData() function, and pass 0 for metaNameCount and NULL for metaNameArray. See “fpGetSubFileMetaData()”.

KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;

KVStructInit(&metaArg);
metaArg.index = subFileIndex;
metaArg.metaNameCount = 0;
metaArg.metaNameArray = NULL;

error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
...
extractInterface->fpFreeStruct(pFile,pMetaData);
pMetaData = NULL;

Microsoft Outlook (MSG) Metadata

In addition to the default metadata set, the metadata fields listed in Table 7 can be extracted for MSG files. The field name must be passed to metaNameArray in the call to the fpGetSubFileMetaData() function.

Field Name (string to specify): Description

AttachFileName: An attachment's long filename and extension, excluding path.

ConversationTopic: The topic of the first message in a conversation thread. A conversation thread is a series of messages and replies. This is the first message’s subject with any prefix removed.

CreationTime: The time the message or attachment was created. This value is displayed in the Sent field in the message’s Properties dialog in Outlook.

InternetMessageID: The identifier for messages that come in over the Internet. This is the MAPI property PR_INTERNET_MESSAGE_ID. This property is not in the MAPI headers or MAPI documentation.

LastModificationTime: The time the message or attachment was last modified. This value is displayed in the Modified field in the message’s Properties dialog in Outlook.

Location: The physical location of the event specified in the Outlook calendar entry.

MessageID: The message transfer system (MTS) identifier for the message transfer agent (MTA). This value is displayed on the Message ID tab in the message’s Properties dialog in Outlook.

Received: The date and time a message was delivered. This value is displayed in the Received field in the message’s Properties dialog in Outlook.

Sender: The name and e-mail address of the message sender. This value is a concatenation of two MAPI properties in the following format:

    "PR_SENDER_NAME" <PR_SENDER_EMAIL_ADDRESS>

    The Sender value may be the same as or different than the default metadata From value (see “Default Metadata Set”), depending on which MAPI properties exist in the MSG file.

Sensitivity: The value indicating the message sender's opinion of the sensitivity of a message. For example, Personal, Private, or Confidential. This value is displayed in the Sensitivity field in the message’s Properties dialog in Outlook.

TransportMsgHeaders: Contains transport-specific message envelope information. This value corresponds to the MAPI property PR_TRANSPORT_MESSAGE_HEADERS.

StartDate: Contains an appointment start date. This value corresponds to the PR_START_DATE MAPI property.

EndDate: Contains an appointment end date. This value corresponds to the PR_END_DATE MAPI property.

Extract MSG-Specific Metadata

To extract specific metadata fields from an MSG file, call the fpGetSubFileMetaData() function, and pass the field name defined in Table 7 to metaNameArray (the string is not case sensitive). See “fpGetSubFileMetaData()”.

For example, the following code extracts the contents of the ConversationTopic and MessageID fields:

KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pname[2];

KVStructInit(&metaArg);

/* Request two fields by name; the names are not case sensitive. */
names[0].type = KVMetaNameType_String;
names[0].name.sname = "conversationtopic";
names[1].type = KVMetaNameType_String;
names[1].name.sname = "MessageID";
pname[0] = &names[0];
pname[1] = &names[1];

metaArg.metaNameCount = 2;
metaArg.metaNameArray = pname;
metaArg.index = subFileIndex;

error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
...
extractInterface->fpFreeStruct(pFile, pMetaData);
pMetaData = NULL;

Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata

In addition to the default metadata set, you can extract any metadata field that exists in the header of an EML or MBX file by passing the field’s name. If the name is a valid field in the file, the contents of the field are returned. For example, to retrieve the name of the last mail server that received the message before it was delivered, you can pass the string “Received”.

Extract EML- or MBX-Specific Metadata

To extract specific metadata fields from an EML or MBX file, call the fpGetSubFileMetaData() function, and pass the metadata name to metaNameArray (the string is not case sensitive). See “fpGetSubFileMetaData()”.

For example, the following code extracts the contents of the Received and Mime-version fields:

KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pname[2];

KVStructInit(&metaArg);

/* Request two header fields by name; the names are not case sensitive. */
names[0].type = KVMetaNameType_String;
names[0].name.sname = "Received";
names[1].type = KVMetaNameType_String;
names[1].name.sname = "Mime-version";
pname[0] = &names[0];
pname[1] = &names[1];

metaArg.metaNameCount = 2;
metaArg.metaNameArray = pname;
metaArg.index = subFileIndex;

error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
...
extractInterface->fpFreeStruct(pFile, pMetaData);
pMetaData = NULL;


Lotus Notes Database (NSF) Metadata

In addition to the default metadata set, you can extract any Lotus field that exists in an NSF file by passing the field’s name. (You can extract fields from mail NSF files and non-mail NSF files.) If the name is a valid field in the file, the field is returned. For example, to retrieve the date a document in an NSF file was last accessed, you would pass the string “$LastAccessedDB”.

NOTE A complete list of NSF fields is provided in the Lotus Notes file stdnames.h. This header file is available in the Lotus API Toolkit.

Extract NSF-Specific Metadata

To extract specific metadata fields from an NSF file, call the fpGetSubFileMetaData() function, and pass the metadata name to metaNameArray (the string is not case sensitive). See “fpGetSubFileMetaData()”.

For example, the following code extracts the contents of the Description and Categories fields:

KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pname[2];

KVStructInit(&metaArg);

/* Request two NSF fields by name; the names are not case sensitive. */
names[0].type = KVMetaNameType_String;
names[0].name.sname = "description";
names[1].type = KVMetaNameType_String;
names[1].name.sname = "Categories";
pname[0] = &names[0];
pname[1] = &names[1];

metaArg.metaNameCount = 2;
metaArg.metaNameArray = pname;
metaArg.index = subFileIndex;

error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
...
extractInterface->fpFreeStruct(pFile, pMetaData);
pMetaData = NULL;


Microsoft Personal Folders File (PST) Metadata

In addition to the default metadata set, you can extract Messaging Application Programming Interface (MAPI) properties from a PST file. These properties describe all elements of an Outlook item in a PST file (such as subject, sender, recipient, and message text). Since the properties are stored in the PST file itself, they can be retrieved before the contents of the PST are extracted. This enables you to determine whether an Outlook item should be extracted based on its attributes. Some MAPI properties are also stored for Outlook attachments that are not mail messages (such as an attached Microsoft Word document or Lotus 1-2-3 file).

NOTE Since all elements of a message (except non-mail attachments) are represented by MAPI properties, you can extract all components of a sub file, including the header and message text, by calling the fpGetSubFileMetaData() function.

MAPI Properties

Each MAPI property is identified by a property tag, which is a constant containing the property type and a unique identifier. For example, the property that indicates whether a message has attachments has the following components:

Property       PR_HASATTACH
Identifier     0x0E1B
Property type  PT_BOOLEAN (000B)
Property tag   0x0E1B000B

The Microsoft MAPI documentation on the Microsoft Developer Network Web site lists all available MAPI properties, their tags, and types.
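The relationship between identifier, type, and tag shown above can be sketched in a few lines of C. This is an illustrative sketch only: the helper names make_prop_tag, prop_tag_id, and prop_tag_type are hypothetical and are not part of the KeyView SDK or the MAPI headers. It assumes the layout implied by the PR_HASATTACH example, with the identifier in the high 16 bits and the property type in the low 16 bits.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not KeyView or MAPI functions. */

/* Combine a 16-bit property identifier and a 16-bit property type
   into a 32-bit property tag (identifier high, type low). */
static uint32_t make_prop_tag(uint16_t prop_id, uint16_t prop_type)
{
    return ((uint32_t)prop_id << 16) | (uint32_t)prop_type;
}

/* Recover the identifier (high 16 bits) from a property tag. */
static uint16_t prop_tag_id(uint32_t tag)
{
    return (uint16_t)(tag >> 16);
}

/* Recover the property type (low 16 bits) from a property tag. */
static uint16_t prop_tag_type(uint32_t tag)
{
    return (uint16_t)(tag & 0xFFFFu);
}
```

For example, make_prop_tag(0x0E1B, 0x000B) yields 0x0E1B000B, the tag listed for PR_HASATTACH.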

You can retrieve any MAPI property that is of one of the MAPI property types listed below:

PT_I2

PT_I4

PT_BINARY

PT_BOOLEAN

PT_DOUBLE

PT_FLOAT

PT_LONG

PT_SHORT

PT_STRING8

PT_TSTRING

PT_SYSTIME

PT_UNICODE
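Before requesting a property, an application might check whether the type encoded in a tag is one of the types listed above. The following is a minimal sketch under stated assumptions: is_retrievable_prop_tag and the SK_ constants are hypothetical names introduced here, and the numeric values are the ones defined for these types in the Windows header mapidefs.h (PT_I2 and PT_SHORT share one value, as do PT_I4 and PT_LONG; PT_TSTRING resolves to PT_STRING8 or PT_UNICODE).

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the property type values (assumed to match mapidefs.h). */
enum {
    SK_PT_SHORT   = 0x0002,  /* also PT_I2 */
    SK_PT_LONG    = 0x0003,  /* also PT_I4 */
    SK_PT_FLOAT   = 0x0004,
    SK_PT_DOUBLE  = 0x0005,
    SK_PT_BOOLEAN = 0x000B,
    SK_PT_STRING8 = 0x001E,
    SK_PT_UNICODE = 0x001F,
    SK_PT_SYSTIME = 0x0040,
    SK_PT_BINARY  = 0x0102
};

/* Hypothetical helper: returns true when the low 16 bits of the tag
   (the property type) are one of the listed retrievable types. */
static bool is_retrievable_prop_tag(uint32_t tag)
{
    switch (tag & 0xFFFFu) {
    case SK_PT_SHORT:
    case SK_PT_LONG:
    case SK_PT_FLOAT:
    case SK_PT_DOUBLE:
    case SK_PT_BOOLEAN:
    case SK_PT_STRING8:
    case SK_PT_UNICODE:
    case SK_PT_SYSTIME:
    case SK_PT_BINARY:
        return true;
    default:
        return false;
    }
}
```

With this check, a tag such as 0x0E1B000B (PT_BOOLEAN) passes, while a tag whose type is 0x000D (PT_OBJECT) does not.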


NOTE Properties with a PT_TSTRING type have the property type recompiled to either a Unicode string (PT_UNICODE) or to an ANSI string (PT_STRING8) depending on the operating system’s character set. To retrieve the Unicode property, pass in the Unicode version of the tag. For example, the property tag for PR_SUBJECT is either 0x0037001E for an ANSI string, or 0x0037001F for a Unicode string.
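As an illustration of the ANSI and Unicode tag variants, the sketch below rewrites the type portion of a string tag. The function name to_unicode_tag and the SK_ constants are hypothetical (they are not SDK or MAPI names); the sketch assumes the type occupies the low 16 bits of the tag, as in the PR_SUBJECT example.

```c
#include <stdint.h>

#define SK_PT_STRING8 0x001Eu  /* ANSI string type */
#define SK_PT_UNICODE 0x001Fu  /* Unicode string type */

/* Hypothetical helper: if the tag has the ANSI string type, return the
   same property with the Unicode string type; otherwise return the tag
   unchanged. */
static uint32_t to_unicode_tag(uint32_t tag)
{
    if ((tag & 0xFFFFu) == SK_PT_STRING8) {
        return (tag & 0xFFFF0000u) | SK_PT_UNICODE;
    }
    return tag;
}
```

Applied to 0x0037001E, this returns 0x0037001F; non-string tags pass through untouched.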

Extract PST-Specific Metadata

In the call to extract sub file metadata, you can pass either the MAPI tag number (such as 0x0070001e) or the MAPI tag name (such as PR_CONVERSATION_TOPIC). If you specify the MAPI tag name, you must include the Windows header files mapitags.h and mapidefs.h, in which the MAPI tag name is defined as a tag number.

To extract specific MAPI properties from a PST file, call the fpGetSubFileMetaData() function, and pass the property tag to metaNameArray. The tag is passed as an integer. See “fpGetSubFileMetaData()”.

For example, the following code extracts the MAPI properties PR_SUBJECT and PR_ALTERNATE_RECIPIENT:

KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pName[2];

/* PST properties are requested by MAPI tag, passed as integers. */
names[0].type = KVMetaNameType_Integer;
names[0].name.iname = PR_SUBJECT;
names[1].type = KVMetaNameType_Integer;
names[1].name.iname = 0x3A010102;   /* PR_ALTERNATE_RECIPIENT */
pName[0] = &names[0];
pName[1] = &names[1];

KVStructInit(&metaArg);
metaArg.metaNameCount = 2;
metaArg.metaNameArray = pName;
metaArg.index = subFileIndex;

error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
...
extractInterface->fpFreeStruct(pFile, pMetaData);
pMetaData = NULL;


NOTE You must include the Windows header files mapitags.h and mapidefs.h, in which PR_SUBJECT is defined as 0x0037001E.

Exclude Metadata from the Extracted Text File

When a mail message is extracted, its message text and header information (To, From, Sent, and so on) are also extracted. You can prevent the header information from appearing in the text file.

To exclude the header information, set the flag extractFlag to KVExtractionFlag_ExcludeMailHeader in the call to fpExtractSubFile(). See “fpExtractSubFile()”.

Extract Sub Files from Outlook Files

When an Outlook file (MSG) is extracted to disk, its message text and header information (To, From, Sent, and so on) are extracted to a text file. (If you do not want the header information to appear in the text file, see “Exclude Metadata from the Extracted Text File”.) If the Outlook file contains a non-mail attachment, the attachment is extracted in its native format to a sub directory. If the Outlook file contains a mail attachment, the attachment’s message text is extracted to a sub directory.

Extract Sub Files from Outlook Express Files

When an Outlook Express (EML) file is extracted to disk, its message text and header information (To, From, Sent, and so on) are extracted to a text file. (If you do not want the header information to appear in the text file, see “Exclude Metadata from the Extracted Text File”.) If an Outlook Express file contains a non-mail attachment, the attachment is extracted in its native format to the same directory as the message text file. If the Outlook Express file contains a mail attachment, the complete attachment (including message text and attachments), message text file, and non-mail attachment(s) are extracted to the same directory as the main message.

NOTE When the MBX reader (mbxsr) is enabled, it is used to filter MBX and EML files. If the MBX reader is not enabled, the EML reader (emlsr) is used.


Extract Sub Files from Mailbox Files

A Mailbox (MBX) file is a collection of individual emails that comply with RFC 822 and RFC 2045 - 2049 (MIME), divided by message separators. Many mail applications export to an MBX format, such as Eudora Email and Mozilla Thunderbird.
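To make the idea of message separators concrete, the following sketch counts the messages in an in-memory mailbox. It is not how the KeyView MBX reader works internally; it assumes the common mbox convention in which each message begins with a line that starts with "From " (with a trailing space), and the function name count_mbox_messages is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count lines that begin with "From ", which the
   common mbox convention uses as the separator before each message.
   Body lines that were quoted as ">From " are not counted. */
static size_t count_mbox_messages(const char *text)
{
    size_t count = 0;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, "From ", 5) == 0) {
            count++;
        }
        line = strchr(line, '\n');  /* find the end of the current line */
        if (line != NULL) {
            line++;                 /* step past the newline */
        }
    }
    return count;
}
```

A real mailbox may use a client-specific separator, so treat the "From " test as an assumption to verify against the files you process.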

When an MBX file is extracted to disk, the message text and header information (To, From, Sent, and so on) from each mail file are extracted to text files. (If you do not want the header information to appear in the text file, see “Exclude Metadata from the Extracted Text File”.)

In Eudora MBX files, attachments are inserted as a link and are stored externally from the message. These attachments are not extracted, but the path to the attachment is returned in the call to the fpGetSubFileInfo() function (see “fpGetSubFileInfo()”). You can write code to retrieve the attachment based on the returned path.

For MBX files from other clients, KeyView extracts attachments when they are embedded in the message.

NOTE The Mailbox (MBX) reader is an advanced feature and is sold and licensed separately. To enable this reader in a KeyView SDK, you must obtain the appropriate license key from Autonomy. See “Update License Information” for information on adding a new license key to an existing installation.

Extract Sub Files from Outlook Personal Folders Files

KeyView can extract Outlook items such as messages, appointments, contacts, tasks, notes, and journal entries from a PST file. When a PST file is extracted to disk, the text and header information (To, From, Sent, and so on) from each Outlook item are extracted to a text file. (If you do not want the header information to appear in the text file, see “Exclude Metadata from the Extracted Text File”.)

You can also extract messages from PST files as MSG files, including all their attachments, by setting the KVExtractionFlag_SaveAsMSG flag in the KVExtractSubFileArg structure when calling fpExtractSubFile(). See “KVExtractSubFileArg”.


If an Outlook item contains a non-mail attachment, the attachment is extracted in its native format to a sub directory. If an Outlook item contains an Outlook attachment, the attached item’s text and attachment(s) are extracted to a sub directory.

NOTE The Microsoft Outlook Personal Folders (PST) reader is an advanced feature and is sold and licensed separately. To enable this reader in a KeyView SDK, you must obtain the appropriate license key from Autonomy. See “Update License Information” for information on adding a new license key to an existing installation.

Use the Native or MAPI-based Reader

KeyView accesses PST files in one of two ways:

• indirectly, using the Microsoft Messaging Application Programming Interface (MAPI) reader named pstsr.

• directly, using the native PST reader named pstnsr.

On UNIX and Windows x64 and IA-64, the native reader is always used to process PST files because the MAPI-based reader runs only on Windows x86. On Windows x86, you can specify either reader; however, the MAPI-based reader is used by default. The differences between the two readers are summarized in the following table:

Feature/Requirement                 Native Reader (pstnsr)  MAPI-based Reader (pstsr)
All platforms supported             Yes                     Windows only
Outlook client required             No                      Yes
MAPI properties supported           Yes. All properties defined in mapitags.h. Object properties are not supported.  Yes. All properties defined in mapitags.h. Object properties are not supported.
Password-protection supported       Yes                     Yes (using KVCredential structure)
Compressible encryption supported   Yes                     Yes
High encryption supported           No                      Yes


To specify that the MAPI-based reader is used for PST files, change the PST entry in the formats_e.ini file as follows:

297=pst

To specify that the native reader is used for PST files, change the PST entry in the formats_e.ini file as follows:

297=pstn

NOTE You must ensure the PST you are extracting is not open in the Outlook client and the Outlook process is not running.

Use the Native PST Reader (pstnsr)

The native PST reader accesses PST files directly without relying on the Microsoft interface to the PST format. It runs on both Windows and UNIX and does not require an Outlook client to be installed on the system processing the PST files.

However, the native reader does not support password-protected PST files that use high encryption.

Use the MAPI Reader (pstsr)

The pstsr reader accesses PST files indirectly using Microsoft’s Messaging Application Programming Interface (MAPI). MAPI is a standard Windows message interface that enables different mail programs and other mail-aware applications (such as word processors and spreadsheets) to exchange messages and attachments with each other. MAPI allows KeyView to open a PST file, traverse the folders and Outlook items, and extract the items inside the PST file.

NOTE When extracting sub files from PST files, information on the distribution list used in an e-mail is extracted to a file called emailname.dist. This applies to the MAPI reader (pstsr) only.

System Requirements

Since MAPI is only supported on Windows platforms, you can only convert PST files on Windows. And since MAPI relies on functionality in Microsoft Outlook, a Microsoft Outlook client must be installed on the same machine as the application converting PST files, and must be the default e-mail application. KeyView supports the following PST formats and Outlook clients:

• Outlook 97 or higher PST files

• Outlook 2002 or Outlook 2003 clients


NOTE The Outlook client must be the same version as or newer than the version of Outlook that generated the PST file.

MAPI Attachment Methods

The way in which the contents of a PST message attachment can be accessed is determined by the MAPI attachment method applied to the attachment. For example, if the attachment is an embedded OLE object, then it uses the ATTACH_OLE attachment method. KeyView can access message attachments that use the following attachment methods:

ATTACH_BY_VALUE

ATTACH_EMBEDDED_MSG

ATTACH_OLE

ATTACH_BY_REFERENCE

ATTACH_BY_REF_ONLY

ATTACH_BY_REF_RESOLVE

Attachments using the ATTACH_BY_VALUE, ATTACH_EMBEDDED_MSG, or ATTACH_OLE attachment methods are extracted automatically when the PST file is extracted. An “attach by reference” method means the attachment is not in Outlook, but Outlook contains an absolute path to the attachment. Before you can extract these types of attachments, you must retrieve the path to access the attachment.

To extract “attach by reference” attachments

1. Determine whether the attachment uses an ATTACH_BY_REFERENCE, ATTACH_BY_REF_ONLY, or ATTACH_BY_REF_RESOLVE method by retrieving the MAPI property PR_ATTACH_METHOD.

2. If the attachment uses one of the “attach by reference” methods, get the fully qualified path to the attachment by retrieving the MAPI properties PR_ATTACH_LONG_PATHNAME or PR_ATTACH_PATHNAME.

3. You can then either copy the files from their original location to the path where the PST file is extracted, or use the Export API functions to convert the attachment.
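The decision made in steps 1 and 2 can be sketched as two small helpers. This is an illustrative sketch rather than SDK code: the function names are hypothetical, the SK_ constants are local copies of the PR_ATTACH_METHOD values assumed to match mapidefs.h, and preferring the long pathname over the short one is an assumption, not something the procedure above requires. Retrieving the property values themselves (through fpGetSubFileMetaData()) is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the PR_ATTACH_METHOD values (assumed to match mapidefs.h). */
enum {
    SK_ATTACH_BY_VALUE       = 1,
    SK_ATTACH_BY_REFERENCE   = 2,
    SK_ATTACH_BY_REF_RESOLVE = 3,
    SK_ATTACH_BY_REF_ONLY    = 4,
    SK_ATTACH_EMBEDDED_MSG   = 5,
    SK_ATTACH_OLE            = 6
};

/* Step 1: true when the attachment is stored outside the PST file and
   must be located through its path. */
static bool is_attach_by_reference(long attach_method)
{
    return attach_method == SK_ATTACH_BY_REFERENCE
        || attach_method == SK_ATTACH_BY_REF_RESOLVE
        || attach_method == SK_ATTACH_BY_REF_ONLY;
}

/* Step 2: pick a path from the two retrieved property values. The long
   pathname is preferred when it is present (an assumption of this sketch). */
static const char *choose_attach_path(const char *long_path,
                                      const char *short_path)
{
    if (long_path != NULL && long_path[0] != '\0') {
        return long_path;
    }
    return short_path;
}
```

The returned path can then be used for step 3, either to copy the file or to pass it to a conversion call.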


Open Secured PST Files

KeyView enables you to specify credentials (user name and password), which are used to open a secured PST file for extraction. See “Password Protected Files” for more information.

Detect PST Files While the Outlook Client is Running

If you are running an Outlook client while running the File Extraction API, the KeyView format detection module (kwad) may not be able to open the PST file to determine the file’s format because Outlook has the file locked. In this case, you can do one of the following:

• Close Outlook when using the Extraction API.

• Detect PST files by extension only and bypass the format detection module. To enable this option, add the following lines to the formats_e.ini file:

[container_flags]
detectPSTbyExtension=1

NOTE The detectPSTbyExtension option only applies when you are using the MAPI reader (pstsr).

NOTE If you use this option, you must ensure in your code that valid PST files are passed to KeyView because the format detection module will not be available to verify the file type and pass the file to the appropriate reader.

Extract Sub Files from Lotus Domino XML Language Files

When a Lotus Domino XML Language (.DXL) file is extracted, its message text and header information (To, From, Sent, and so on) are extracted to a text file.

NOTE To prevent header information from being extracted, see “Exclude Metadata from the Extracted Text File”.

You can ensure that dates and times extracted from Lotus Domino .DXL files are displayed in a uniform format.


To extract custom date/time formats

• In the formats_e.ini file, set the DateTimeFormat option in the [dxlsr] section. For example:

[dxlsr]

DateTimeFormat=%m/%d/%Y %I:%M:%S %p

In this example, dates and times are extracted in the following format:

02/11/2003 11:36:09 AM

The format arguments are the same as those for the strftime() function.

Refer to the following Web page for more information.

http://msdn.microsoft.com/en-us/library/fe06s4ak%28VS.71%29.aspx
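Because the format arguments follow strftime(), you can preview a DateTimeFormat value with a few lines of standard C before putting it in formats_e.ini. The sketch below is only a way to try out a format string; the function name format_dxl_style is hypothetical, and the sketch does not call KeyView.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format a broken-down time with the same format
   string used in the DateTimeFormat example. Returns the number of
   characters written, or 0 if the buffer is too small. */
static size_t format_dxl_style(char *out, size_t out_size,
                               const struct tm *when)
{
    return strftime(out, out_size, "%m/%d/%Y %I:%M:%S %p", when);
}
```

For a struct tm that represents 11 February 2003, 11:36:09, this produces 02/11/2003 11:36:09 AM in the default "C" locale. The %p text depends on the locale, so other locales may print a different AM/PM designation.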

Extract Sub Files from Lotus Notes Database Files

A Lotus Notes database is a single file that contains multiple documents called notes. Notes include design notes (such as forms, views, folders, navigators, outlines, pages, framesets, agents, and resources), data document notes, profile document notes, access control list notes, and collection (index) notes. KeyView can extract text items, attachments, and OLE objects from data document notes only. Data document notes include emails, journal entries, discussion threads, documents (Microsoft Office and Lotus SmartSuite), and so on.

All components of a note are prefixed by field names such as “SendTo:”, “Subject:”, and “Body:”. When a note is extracted, the field names are not included in the extracted output; only the field values are extracted.

When a mail message in an NSF file is extracted to disk, the body text and header information (such as the values from the SendTo, From, and DeliveredDate fields) in each message are extracted to a text file. (If you do not want the header information to appear in the message text file, see “Exclude Metadata from the Extracted Text File”.)

NOTE The Lotus Notes Database (NSF) reader is an advanced feature and is sold and licensed separately. To enable this reader in a KeyView SDK, you must obtain the appropriate license key from Autonomy. See “Update License Information” for information on adding a new license key to an existing installation.


System Requirements

The NSF format is proprietary. Therefore, KeyView accesses NSF files indirectly using the Lotus Notes API. Since the NSF reader relies on functionality in Lotus Notes, a Lotus Notes client or Lotus Domino server must be installed and configured on the same machine as the application converting NSF files. On UNIX and Linux, the Lotus Domino server is required. On Windows, the Lotus Notes client or Lotus Domino server is required.

KeyView supports the following Lotus Notes clients and Domino servers:

• Lotus Notes 6.5.1
• Lotus Domino 6.5.1

KeyView supports NSF files on the same platforms supported by Lotus Notes and Lotus Domino:

• Windows XP x86 (Service Pack 1 and 2)
• Windows 2000 x86 (Service Pack 2)
• Solaris 8.0 and 9.0 (built on Solaris 8.0)
• Red Hat Enterprise Linux AS 3.0 (x86)
• SuSE Linux Enterprise Server 8 and 9 (x86)
• IBM AIX 5.1, 5L version 5.2

Installation and Configuration

Before KeyView can convert NSF files, you must set up the Lotus Notes client or Lotus Domino server. Full configuration is not required. The following steps outline the minimal setup for NSF conversion:

Windows

1. Install the Lotus Notes client or Lotus Domino server. You do not need to configure the client or server.

2. Ensure the file notes.ini is in the proper location.

• If Lotus Notes is installed, the file should appear in the install\lotus\notes directory, where install is the installation directory.

• If only Lotus Domino is installed, the file should appear in the install\lotus\domino directory, where install is the installation directory.

If the file does not exist, create an ASCII file named notes.ini, and add the following text:

[Notes]

3. Add the KeyView bin directory and the install\lotus\notes or install\lotus\domino directory to the PATH environment variable (the KeyView bin directory must be first in the path). It is recommended you add the KeyView bin directory because the Lotus Notes or Domino server installation may contain older KeyView OEM libraries.

Solaris

1. Install Lotus Domino server. You do not need to configure the server.

2. Ensure the file notes.ini is in the install/lotus/notes/latest/sunspa directory, where install is the directory where Lotus Notes is installed. If the file does not exist, create an ASCII file named notes.ini, and add the following text:

[Notes]

3. Add the install/lotus/notes/latest/sunspa directory to the PATH environment variable:

setenv PATH install/lotus/notes/latest/sunspa:$PATH

4. Add the install/lotus/notes/latest/sunspa and the KeyView bin directory to the LD_LIBRARY_PATH environment variable:

setenv LD_LIBRARY_PATH keyview_bin:install/lotus/notes/latest/sunspa:$LD_LIBRARY_PATH

where keyview_bin is the location of the KeyView bin directory. It is recommended you add the KeyView bin directory because the Lotus Notes installation may contain older KeyView OEM libraries.

AIX 5.x

1. Install the bos.iocp.rte file set if it is not already installed, and reboot the machine. See the Lotus Domino server documentation for more information.

2. Install Lotus Domino server. You do not need to configure the server.

3. Ensure the file notes.ini is in the install/lotus/notes/latest/ibmpow directory, where install is the directory where Lotus Notes is installed. If the file does not exist, create an ASCII file named notes.ini, and add the following text:

[Notes]

4. Add the install/lotus/notes/latest/ibmpow directory to the PATH environment variable:

setenv PATH install/lotus/notes/latest/ibmpow:$PATH

5. Add the install/lotus/notes/latest/ibmpow and the KeyView bin directory to the LIBPATH environment variable:

setenv LIBPATH keyview_bin:install/lotus/notes/latest/ibmpow:$LIBPATH

where keyview_bin is the location of the KeyView bin directory. It is recommended you add the KeyView bin directory because the Lotus Notes installation may contain older KeyView OEM libraries.

Linux

1. Install Lotus Domino server. You do not need to configure the server.

2. Ensure the file notes.ini is in the install/lotus/notes/latest/linux directory, where install is the directory where Lotus Notes is installed. If the file does not exist, create an ASCII file named notes.ini, and add the following text:

[Notes]

3. Add the install/lotus/notes/latest/linux directory to the PATH environment variable:

setenv PATH install/lotus/notes/latest/linux:$PATH

4. Add the install/lotus/notes/latest/linux and the KeyView bin directory to the LD_LIBRARY_PATH environment variable:

setenv LD_LIBRARY_PATH keyview_bin:install/lotus/notes/latest/linux:$LD_LIBRARY_PATH

where keyview_bin is the location of the KeyView bin directory. It is recommended you add the KeyView bin directory because the Lotus Notes installation may contain older KeyView OEM libraries.

Open Secured NSF Files

KeyView enables you to specify credentials (user ID file and password), which are used to open a secured NSF file for extraction. See “Password Protected Files” for more information.

Format Note Sub Files

The KeyView NSF reader uses XML templates to format note sub-files. You can customize the templates as required to approximate the look and feel of the original notes as closely as possible. For more information, see “Extract and Format Lotus Notes Sub Files”.


Extract Sub Files from PDF Files

KeyView can extract document-level and page-level attachments from a PDF document. Document-level attachments are added by using the Attach A File tool and may include links to or from the parent document or to other file attachments. Page-level attachments are added as comments by using various tools. Page-level or comment attachments display the File Attachment icon or the Speaker icon on the page where they are located.

When a PDF’s attachments are extracted to disk, the attachments are saved in their native format.

Extract Embedded OLE Objects

Embedded OLE objects can be converted in two ways:

• Using the File Extraction API, the OLE object is first extracted from the main file and saved to disk (see “File Extraction API Functions”). It can then be converted by making a separate conversion call.

• Using the XML Export API, the main file is converted to XML and the OLE object is converted to a graphics file that is referenced in the XML file (see “XML Export API Functions”).

The File Extraction API can extract embedded OLE objects from the following types of documents:

• Lotus Notes (DXL)
• Microsoft Excel
• Microsoft Word
• Microsoft PowerPoint
• Microsoft Outlook
• Microsoft Visio
• Microsoft Project
• OASIS Open Document
• Rich Text Format (RTF)

When an embedded OLE object is extracted from its parent file, the location where the embedded file appears in the original document is not available. The parent and child are extracted as separate files.


Extract Sub Files from ZIP Files

ZIP files that are not password-protected can be extracted using the general method (see “Extract Sub Files”). However, some ZIP files use password protection, in which case you must use a different method to enter the required credentials. See “Password Protected Files” for more information.

Default Filenames for Extracted Sub Files

When a filename is not specified in the call to fpExtractSubFile() (see “fpExtractSubFile()”), in some cases a default filename is applied to the extracted sub file.

Default Filename for Mail Formats

To avoid naming conflicts and problems with long filenames, KeyView applies its own names to the extracted mail items when a name is not supplied in the call to fpExtractSubFile(). A non-mail attachment retains its original filename and extension.

When the contents of a mail store or the message body of a mail message are extracted, the extracted filenames may include the following:

• The first valid eight characters of the original folder name or “Subject” line of the mail message. If the “Subject” line is empty, the characters kvext are used, where ext is the format’s extension. For example, the characters would be “kvmsg” for MSG and “kvnsf” for NSF.

The following special characters are considered invalid and are ignored:

any non-printing character with a value less than 0x1F
angle brackets (< >)
double quote (“)
asterisk (*)
back slash (\)
colon (:)
forward slash (/)
pipe (|)
question mark (?)

For notes, the filename is derived from the first 24 characters of the note text.

For contact entries, the filename is derived from the full name of the contact.

• The characters _kvn, where n is an integer incremented from 0 for each extracted item.

• One of the following extensions:

Type                  File Extension
email message         .mail
calendar appointment  .cal
contact entry         .cont
task entry            .task
note                  .note
journal entry         .jrnl
distribution list     .dist
posting note          .post

• If the type cannot be determined for an MSG or PST file, the file is given

• If the type cannot be determined for an NSF file, the file is given a .tmp extension.

• The format of a MAIL file is plain text by default, but can be set to RTF with the KVExtractionFlag_GetFormattedBody flag.

For example, an MSG mail message with the subject line RE: Product roadmap containing the Microsoft Excel attachment release_schedule.xls would be extracted as

RE produ_kv0.mail

release_schedule.xls

If an extracted message contains an embedded OLE object or any attachment that does not have a name, the object or attachment is extracted as _kv#.tmp.

Default Filename for Embedded OLE Objects

KeyView can apply a default name to an extracted embedded OLE object when a name is not supplied in the call to fpExtractSubFile(). When an embedded OLE object is extracted, the extracted filename may include the following:

• The first valid eight characters of the main file. The following special characters are considered invalid and are ignored:

    any non-printing character with a value less than 0x1F
    angle brackets (< >)
    double quote (")
    asterisk (*)
    back slash (\)
    colon (:)
    forward slash (/)
    pipe (|)
    question mark (?)

• The characters _kvn, where n is an integer incremented from 0 for each extracted object.

• If KeyView can determine the embedded OLE is a Microsoft Office document, the original extension is used. If the file type cannot be determined, the file is given a .tmp extension.

For example, let us say a Microsoft Word document (sales_quarterly.doc) contains two embedded OLE objects: a Microsoft Excel file called west_region.xls, and a Bitmap created in the Word document. The embedded objects would be extracted as

sales_qu_kv0.xls

sales_qu_kv1.tmp
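The naming rules above can be illustrated with a short C sketch. This is not KeyView code: build_default_name() and is_ignored_char() are hypothetical helpers written for this illustration, and the sketch assumes that characters are copied with their case unchanged and that the caller supplies the extension (and the kvext fallback for an empty subject line).

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if c belongs to the "invalid and ignored" set listed above. */
static int is_ignored_char(unsigned char c)
{
    if (c < 0x1F)                 /* non-printing, using the documented threshold */
        return 1;
    switch (c) {
    case '<': case '>': case '"': case '*': case '\\':
    case ':': case '/': case '|': case '?':
        return 1;
    default:
        return 0;
    }
}

/*
 * Hypothetical helper (not part of the SDK): writes
 * "<first eight valid characters of base>_kv<n><ext>" to out.
 * Returns 0 on success, or -1 if out is too small.
 */
int build_default_name(const char *base, unsigned n, const char *ext,
                       char *out, size_t out_size)
{
    char prefix[9];
    size_t len = 0;
    int written;

    /* Keep only valid characters, stopping after eight of them. */
    for (; *base != '\0' && len < 8; ++base) {
        if (!is_ignored_char((unsigned char)*base))
            prefix[len++] = *base;
    }
    prefix[len] = '\0';

    written = snprintf(out, out_size, "%s_kv%u%s", prefix, n, ext);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Calling build_default_name("sales_quarterly.doc", 0, ".xls", buf, sizeof buf) fills buf with sales_qu_kv0.xls, which matches the first name in the example above. How the SDK treats letter case is not stated in this section, so the sketch makes no attempt to change it.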

CHAPTER 4

Use the XML Export API

This section describes how to perform some basic tasks using the XML Export API. It contains the following topics:

Extract Metadata

Extract File Format Information

Convert Character Sets

Map Styles

Use Style Sheets

Display Vector Graphics on UNIX and Linux

Convert Revision Tracking Information

Convert PDF Files

Convert Spreadsheet Files

Convert XML Files

Extract Metadata

When a file format supports metadata, KeyView can extract and process that information. Metadata includes document information fields such as title, author, creation date, and file size. Depending on the file’s format, metadata is referred to in a number of ways: for example, “summary information,” “OLE summary information,” “file information,” and “document properties.”

The metadata in mail formats (MSG and EML) and mail stores (PST, NSF, and MBX) is extracted differently from other formats. For information on extracting metadata from these formats, see “Extract Mail Metadata.”

NOTE KeyView can only extract metadata from a document if metadata is defined in the document, and the document reader can extract metadata for the file format. The section “Supported Formats” lists the file formats for which metadata can be extracted. KeyView does not generate metadata automatically from the document contents.

Extract Metadata Using the API

You can extract the metadata at the API level. The API extracts all valid metadata fields that exist in the file.

To extract metadata using the C API

1. Declare a pointer to the KVSummaryInfoEx structure. See “KVSummaryInfoEx.”

2. Call the fpGetSummaryInfo() function. See “fpGetSummaryInfo().”

Extract Metadata Using a Template File

When using a template file, KeyView recognizes two types of metadata: standard and non-standard. Standard metadata includes fields, such as Title, Author, and Subject. The standard fields are enumerated from 1 to 41 in KVSumType in the header file kvtypes.h. Non-standard metadata includes any field not listed from 1 to 41 in KVSumType, such as user-defined fields (for example, custom property fields in Microsoft Word documents), or fields that are unique to a particular file type (for example, “Artist” or “Genre” fields in MP3 files). Enumerated types 42 and greater are reserved for non-standard metadata.

To extract metadata using a template file

1. Insert metadata tokens in a member of the KVXMLTemplate structure in the template files. This defines the point at which the metadata appears in the XML output.

2. If you are using the $USERSUMMARY or $SUMMARY token, define the szUserSummary member of the KVXMLTemplate structure in the template file. This determines the markup and tokens generated when these metadata tokens are processed.

3. In your application, read the template file and write the data to the KVXMLTemplate structure. See page 139.

The following tokens can be used in the template files:

$SUMMARYNN
    Inserts the data from a specified metadata field. NN is a number from 00 through 33 that is enumerated in KVSumType in kvtypes.h.

$SUMMARY
    Inserts the data from valid metadata fields in the range of 0 to 33 using the markup provided in pszUserSummary.

$USERSUMMARY
    Inserts the data from every valid non-standard metadata field using the markup provided in pszUserSummary.

$CONTENT
    Inserts the content of the metadata field specified by the $NAME token.

$NAME
    Inserts the name of the metadata field, such as “Title,” “Author,” or “Subject.”

Examples

$SUMMARYNN

The following markup displays the contents of the “Title” field at the top of the main XML file:

szMainTop=$SUMMARY01

In KVSumType, 01 is the enumerated value for the “Title” metadata field.

$SUMMARY

The following markup extracts all standard fields, and includes them in the first H1 XML block:

szFirstH1Start=$SUMMARY

szUserSummary=<MetaData name="$NAME" content="$CONTENT" />

This example extracts the field name ($NAME) and field content ($CONTENT) for standard metadata and includes it at the beginning of the first heading level 1 XML block.

The generated XML may look like this:

<MetaData name="CodePage" content="1252" />
<MetaData name="Title" content="My design document" />
<MetaData name="Subject" content="design specifications" />
<MetaData name="Author" content="John Doe" />
<MetaData name="Keywords" content="" />
<MetaData name="Comments" content="" />
<MetaData name="Template" content="Normal.dot" />
<MetaData name="LastAuthor" content="lchapman" />
<MetaData name="RevNumber" content="6" />
<MetaData name="EditTime" content="01/01/1601, 0:08" />
<MetaData name="LastPrinted" content="14/01/2002, 14:06" />
<MetaData name="Create_DTM" content="27/08/2003, 10:31" />
<MetaData name="LastSave_DTM" content="29/08/2003, 14:07" />
<MetaData name="PageCount" content="1" />
<MetaData name="WordCount" content="4062" />
<MetaData name="CharCount" content="23159" />
<MetaData name="AppName" content="Microsoft Word 9.0" />
<MetaData name="Security" content="0" />
<MetaData name="Category" content="software" />
<MetaData name="LineCount" content="192" />
<MetaData name="ParCount" content="46" />
<MetaData name="ScaleCrop" content="FALSE" />
<MetaData name="Manager" content="" />
<MetaData name="Company" content="Autonomy" />
<MetaData name="LinksDirty" content="FALSE" />

$USERSUMMARY

The following markup extracts non-standard fields, and includes them at the bottom of the main XML file:

szMainBottom=$USERSUMMARY

szUserSummary=<MetaData name="$NAME" content="$CONTENT" />

This example extracts the field name ($NAME) and field content ($CONTENT) for non-standard metadata from a document, and includes it at the bottom of the main XML file.

The generated XML may look like this:

<MetaData name="Telephone number" content="444-111-2222" />
<MetaData name="Recorded date" content="07/03/2003, 23:00" />
<MetaData name="Source" content="TRUE" />
<MetaData name="my property" content="reserved" />
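To make the role of the $NAME and $CONTENT tokens concrete, the following C sketch expands a szUserSummary-style markup string for one metadata field. It is only an illustration of the substitution idea, not the way Export implements it: expand_summary_markup() is a hypothetical function, and the sketch assumes plain text replacement with no escaping of XML special characters.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the SDK): copies markup to out,
 * replacing each "$NAME" with name and each "$CONTENT" with content.
 * Returns 0 on success, or -1 if out is too small.
 */
int expand_summary_markup(const char *markup, const char *name,
                          const char *content, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;

    while (*markup != '\0') {
        const char *piece = markup;   /* default: copy one literal character */
        size_t piece_len = 1;

        if (strncmp(markup, "$NAME", 5) == 0) {
            piece = name;
            piece_len = strlen(name);
            markup += 5;
        } else if (strncmp(markup, "$CONTENT", 8) == 0) {
            piece = content;
            piece_len = strlen(content);
            markup += 8;
        } else {
            markup += 1;
        }

        /* Leave room for the terminating NUL. */
        if (used + piece_len >= out_size)
            return -1;
        memcpy(out + used, piece, piece_len);
        used += piece_len;
    }
    out[used] = '\0';
    return 0;
}
```

With the markup <MetaData name="$NAME" content="$CONTENT" />, the name Title, and the content My design document, the sketch produces a line of the same shape as the generated XML shown above. Calling it once per metadata field mirrors how the $SUMMARY and $USERSUMMARY tokens emit one element per field.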

Extract File Format Information

Export can detect a file’s format, and report the information to the API, which in turn reports the information to the developer’s application. This feature enables you to apply customized conversion settings based on a file’s format. See “File Format Detection” for more information on format detection.

To extract file format information using the C API

1. Declare a pointer to the KVStreamInfo data structure. See “KVStreamInfo.”

2. Call the fpGetStreamInfo() function. See “fpGetStreamInfo().”

Convert Character Sets

Export enables you to control the character set of both the input and the output text. This is accomplished by either

• setting the source and/or target character set in the API, or

• basing the input/output on the character set of the document (if the document character set is stored in the document and can be determined by the document reader).

The character sets are enumerated in KVCharSet of kvtypes.h. Not all character sets can be used to specify the target character set. A table elsewhere in this guide lists the character sets that can be used as a target character set.

Determine the Character Set of the Output Text

To determine the output character set of a converted document, Export considers the following:

• Whether the reader can extract the character set from the document. This depends on whether the file format can provide character set information and whether the document actually contains character set information.

  The section “Supported Formats” indicates the file formats for which character set information can be extracted. If character set information cannot be determined for your document type, you must set the source and/or target character set in the API.

• Whether a source character set is set in the API.

  NOTE To set the source character set, you must specify a character set and set the bForceSrcCharSet member of the KVXMLOptions structure to TRUE.

• Whether a target character set is set in the API.

  NOTE To set the target character set, you must specify a character set and set the bForceOutputCharSet member of the KVXMLOptions structure to TRUE.

Guidelines for Character Set Conversion

Figure 8 shows how the output character set is determined when the document character set can be determined:

Figure 8 Document Character Set Can Be Determined

Figure 9 shows how the output character set is determined when the document character set cannot be determined:

Figure 9 Document Character Set Cannot Be Determined

Examples of Character Set Conversion

The examples below demonstrate possible configurations for mapping character sets and the expected output for each scenario.

Document Character Set Can be Determined

For the example in Table 8, the document is an RTF file. The section “Word Processing Formats” indicates the document character set can be obtained from this file type. The document character set is Traditional Chinese (BIG5).

Source charset set   Target charset set   Output charset   Description

KVCS_GB              KVCS_UTF8            KVCS_UTF8        Converts GB (Simplified Chinese) to UTF-8. Output character set is the target character set specified in the API.

KVCS_GB              --                   KVCS_GB          Converts BIG5 to GB (Simplified Chinese). Output character set is the source character set specified in the API.

--                   KVCS_UTF8            KVCS_UTF8        Converts BIG5 to UTF-8. Output character set is the target character set specified in the API.

--                   --                   KVCS_BIG5        Output character set is the document character set. No conversion.

Document Character Set Cannot be Determined

For the example in Table 9, the document is an ASCII file. The section “Word Processing Formats” indicates the document character set cannot be obtained from this file type. The document character set is KVCS_1251.

Source charset set   Target charset set   Output charset   Description

KVCS_1252            KVCS_UTF8            KVCS_UTF8        Converts KVCS_1252 to KVCS_UTF8. Output character set is the target character set specified in the API.

KVCS_1252            KVCS_UNKNOWN         KVCS_1252        Output character set is the source character set specified in the API because KVCS_UNKNOWN cannot be used. No conversion.

KVCS_1252            --                   KVCS_1252        Output character set is the source character set specified in the API. No conversion.

--                   KVCS_1252            KVCS_1252        Converts OS code page to KVCS_1252. Output character set is the target character set specified in the API.

--                   --                   (OS code page)   Output character set is OS code page. No conversion.
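The rows of Table 8 and Table 9 follow one precedence order, which the C sketch below writes out as a function. This is a reading of the two tables, not the SDK's implementation: the Charset enumeration and the CharsetOptions structure are hypothetical stand-ins for KVCharSet and for the eSrcCharSet, bForceSrcCharSet, eOutputCharSet, and bForceOutputCharSet members of KVXMLOptions, and CS_OS_CODEPAGE is an invented marker for "the operating system code page."

```c
/* Hypothetical stand-in for KVCharSet; the values are for this sketch only. */
typedef enum {
    CS_UNKNOWN = 0,     /* plays the role of KVCS_UNKNOWN or "not determined" */
    CS_OS_CODEPAGE,     /* invented marker for the operating system code page */
    CS_1252,
    CS_GB,
    CS_BIG5,
    CS_UTF8
} Charset;

/* Hypothetical stand-in for the four character set members of KVXMLOptions. */
typedef struct {
    Charset src;        /* like eSrcCharSet */
    int     force_src;  /* like bForceSrcCharSet */
    Charset tgt;        /* like eOutputCharSet */
    int     force_tgt;  /* like bForceOutputCharSet */
} CharsetOptions;

/*
 * Returns the output character set implied by the table rows:
 * 1. a usable forced target wins;
 * 2. otherwise a forced source is used;
 * 3. otherwise the document character set, if the reader found one;
 * 4. otherwise the OS code page.
 */
Charset resolve_output_charset(Charset doc_charset, const CharsetOptions *opt)
{
    if (opt->force_tgt && opt->tgt != CS_UNKNOWN)
        return opt->tgt;
    if (opt->force_src && opt->src != CS_UNKNOWN)
        return opt->src;
    if (doc_charset != CS_UNKNOWN)
        return doc_charset;
    return CS_OS_CODEPAGE;
}
```

For the RTF example, a document character set of CS_BIG5 with no forced source or target yields CS_BIG5, and adding a forced CS_UTF8 target yields CS_UTF8. The sketch only picks the output character set; it does not model which conversion Export performs to get there.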

Set the Character Set During Conversion

You can convert the character set of a file at the time the file is converted.

To specify the source character set for documents from which the document character set cannot be obtained by the reader

1. Set the eSrcCharSet member of the structure KVXMLOptions to one of the character sets enumerated in KVCharSet in kvtypes.h. See “KVXMLOptions.”

2. Set the bForceSrcCharSet member of the structure KVXMLOptions to TRUE. See “KVXMLOptions.”

To specify the target character set:

1. Set the eOutputCharSet member of the KVXMLOptions structure to one of the character sets enumerated in KVCharSet in kvtypes.h. See “KVXMLOptions.”

2. Set the bForceOutputCharSet member of the structure KVXMLOptions to TRUE. See “KVXMLOptions.”

Set the Character Set During File Extraction from a Container

You can convert the character set of a container sub file at the time the sub file is extracted from the container and before it is converted to XML. This is most often used to set the output character set of a mail message’s body text. See “Use the File Extraction API.”

To specify the source character set of a sub file, call the fpExtractSubFile() function, and set the KVExtractSubFileArg->srcCharset argument to any value in the enumerated list in KVCharSet of kvtypes.h. See “fpExtractSubFile().”

To specify the target character set of a sub file, call fpExtractSubFile(), and set the KVExtractSubFileArg->trgCharSet argument to any value in the enumerated list in KVCharSet of kvtypes.h. See “fpExtractSubFile().”

Map Styles

Export can map paragraph and character styles in any word processing format that contains styles (such as Microsoft Word, RTF, or Folio Flat File) to user-defined markup. With this feature, you can redact (hide) text in the source document, delete content, or change the overall structure of the output. You can also embed style sheet styles in the output defined in the XML.

To enable style mapping, you must indicate which paragraph and/or character styles are to be mapped, and define the starting and ending markup to be included in the XML output. For example, if the source Microsoft Word document contains the character style “Recipe,” and the content of the style in Microsoft Word is “Brownies,” you can specify that the starting markup be <recipe> and the ending markup </recipe>. This would result in the output XML containing:

<recipe> Brownies </recipe>

You can also use style mapping to control the look of the XML output by either using a Cascading Style Sheet (CSS) or defining the style directly in the starting markup. For example, if a Word document contains the paragraph style “Colorful”, you can have markup of the form <div class="rainbow"> inserted at the front of the paragraph and markup of the form </div> inserted at the end of the paragraph. “Rainbow” is a CSS style defined in an externally provided CSS file referenced at the top of the XML output.

If you map styles to elements or attributes that are not defined in the DTD, you must add the new elements or attributes to the DTD. You must also ensure the new markup is defined in the API, either by entering the markup directly in the classes, or populating the classes using the template files.

Use the C API

To map styles using the C API

1. Define the KVStyle structure. See “KVStyle.” The information in this structure includes:

   • the markup to be added to the beginning and end of a paragraph or character style.

   • the name of the word processing style (for example, “Heading 1”) to which style mapping applies. Style names are case sensitive.

   • the flag which defines instructions on how to process the content associated with a paragraph or character style. The flags are defined in kvtypes.h and described in Table 107.

2. Call the fpSetStyleMapping() function. See “fpSetStyleMapping().”

Use a Template file

To map styles using a template file

1. Use the KVStyle parameter to specify how many styles are being mapped. For example, if there are nine mapped heading levels, add the following:

[KVStyle]

NumStyles=9

2. For each style, there must be a [StyleX] entry that contains the markup that appears at the start and end of the defined style. For example, the first heading level is defined as follows:

[Style1]

StyleName=Colorful

MarkUpStart=<div class="colorful">

MarkUpEnd=<!-- end of colorful --></div>

These values are used in StyleName, MarkUpStart, and MarkUpEnd in the KVStyle structure. See page 241.

3. For each style, define the flag that applies. Flags define instructions on how to process the content associated with a paragraph or character style. They are defined in kvtypes.h and described in Table 107. This value is used in dwflags of the KVStyle structure. See “KVStyle.” The value associated with each flag is a hexadecimal number. You can set an option by either entering the converted decimal value or entering the flag’s text.

Flags=0

A finished entry in a template file could look like this:

[KVStyle]

NumStyles=3

[Style1]

StyleName=Colorful

MarkUpStart=<div class="Colorful">

MarkUpEnd=<!-- End of Colorful --></div>

Flags=0

[Style2]

StyleName=RedactPara

MarkUpStart=<div class="RedactPara">

MarkUpEnd=<!-- End of RedactPara --></div>

Flags=2048

[Style3]

StyleName=Code

MarkUpStart=<pre>

MarkUpEnd=<!-- End of Code --></pre>

Flags=KVSTYLE_PRE

Flag
    Description

KVSTYLE_PRE
    The KVSTYLE_PRE flag specifies that white space should be preserved (treated as characters, not word separators), and that mode changes, such as changes in font size within a paragraph, should be ignored. This allows the tags <pre> and </pre> to be used.

KVSTYLE_HEADING[1-6]
    The flags KVSTYLE_HEADING[1-6] specify that a given style is to be detected and processed as a heading. Heading flags are exclusive. This means a style cannot be processed as both H1 and H2.

    By default, Export maps the heading style “Heading 1” to <h1></h1>, and so on, for heading levels 1 through 6. If you use style mappings, the default mapping is overridden. Therefore, you must supply markup for all heading levels. Export uses heading levels to define the overall structure of the XML output.

KVSTYLE_ORDERLIST
    The KVSTYLE_ORDERLIST flag specifies that the style should be tagged as an ordered list. Currently not implemented.

KVSTYLE_UNORDEREDLIST
    The KVSTYLE_UNORDERLIST flag specifies that the style should be tagged as an unordered list. Currently not implemented.

KVSTYLE_DELETECONTENT
    The KVSTYLE_DELETECONTENT flag specifies that the content associated with the style tag should be deleted from the output.

KVSTYLE_ONCONSECUTIVEPARAGRAPHS
    The KVSTYLE_ONCONSECUTIVEPARAGRAPHS flag specifies that the style should be applied to consecutive paragraphs of the document. If this flag is used, and two or more paragraphs require the same style, the opening and closing tags that normally appear between each paragraph are not generated.

KVSTYLE_REDACT
    The KVSTYLE_REDACT flag is used to hide sensitive or confidential information in the source document. It specifies that the text associated with the style tag should be replaced in the XML output with a selected character. The default replacement character is “X,” but you can specify a different replacement character by setting cRedact. See “cRedact.”
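As a rough picture of what a style mapping does to the text it covers, the C sketch below wraps a run of text in the start and end markup and optionally redacts it. It is an illustration only and does not reproduce Export's processing: the StyleMap structure, the SKETCH_STYLE_REDACT value, and apply_style() are hypothetical, and the sketch assumes that redaction replaces every character of the run, including spaces, with the replacement character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value for this sketch; the real flags are in kvtypes.h. */
#define SKETCH_STYLE_REDACT 0x1u

/* Hypothetical stand-in for the mapping information held in KVStyle. */
typedef struct {
    const char *style_name;    /* word processing style, for example "Recipe" */
    const char *markup_start;  /* markup emitted before the styled text */
    const char *markup_end;    /* markup emitted after the styled text */
    unsigned    flags;         /* 0 or SKETCH_STYLE_REDACT */
} StyleMap;

/*
 * Writes markup_start + text + markup_end to out. When the redact flag
 * is set, each character of text is replaced by redact_char.
 * Returns 0 on success, or -1 if out is too small.
 */
int apply_style(const StyleMap *map, const char *text, char redact_char,
                char *out, size_t out_size)
{
    size_t start_len = strlen(map->markup_start);
    size_t text_len  = strlen(text);
    size_t end_len   = strlen(map->markup_end);

    if (start_len + text_len + end_len + 1 > out_size)
        return -1;

    memcpy(out, map->markup_start, start_len);
    if (map->flags & SKETCH_STYLE_REDACT)
        memset(out + start_len, redact_char, text_len);
    else
        memcpy(out + start_len, text, text_len);
    /* Copy the end markup together with its terminating NUL. */
    memcpy(out + start_len + text_len, map->markup_end, end_len + 1);
    return 0;
}
```

A mapping with the start markup <recipe> and the end markup </recipe> applied to the text Brownies gives <recipe>Brownies</recipe>. With the redact flag set and 'X' as the replacement character, a six-character run becomes XXXXXX between the same start and end markup.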

Use Style Sheets

XML is a content-based metalanguage designed to structure data. XML does not include information about how a document should be displayed in a browser. To view an XML document in a browser, information about how it is displayed must be provided by style sheets. These are coded using either Cascading Style Sheets (CSS) or Extensible Stylesheet Language (XSL).

The style sheet options are enumerated in KVXMLStyleSheetType.

Use Extensible Style Sheet Language (XSL)

You can use XSL style sheets to specify how XML data is displayed in a browser. Existing XSL style sheets can be used, but unlike CSS, style sheet information cannot be written to an external XSL file during the conversion.

Both CSS and XSL style sheets can be used to format XML documents. However, XSL can also transform XML documents. For example, list items can be transformed to display in alphabetical order, words can be replaced by other words, or empty elements can be replaced by text.

To use an existing XSL style sheet

1. Set eStyleSheetType to XML_XSL to enable XSL style sheet mapping.

2. Set bUseExistingStyleSheet to TRUE to apply a pre-existing style sheet to an XML document. Pre-existing style sheets are not validated.

3. Specify the path and filename of the style sheet file in pszStyleSheet.

If bUseExistingStyleSheet is set to TRUE and pszStyleSheet is not specified, a default XSL style sheet that is appropriate for the source document type is used.

The following are default XSL style sheets:

• wp.xsl (for word processing documents)

• ss.xsl (for spreadsheets)

• pg.xsl (for presentation graphics)

Use Cascading Style Sheets (CSS)

In addition to XSL style sheets, Export can write style sheet information to an external CSS file. The C sample program xmlini provides an example of how to use an existing style sheet, and output formatting data to an external file. See “xmlini.”

To enable CSS mapping and output the resulting formatting data in an external file

1. Set eStyleSheetType to XML_CSS.

2. Use the KVXMLSetStyleSheet() function to set the path and filename of the external style sheet. See “KVXMLSetStyleSheet().”

To enable CSS mapping and use an existing CSS file:

1. Set eStyleSheetType to XML_CSS.

2. Set bUseExistingStyleSheet to TRUE to specify a pre-existing style sheet for an XML document.

3. Specify the path and filename of the style sheet file in pszStyleSheet.

If bUseExistingStyleSheet is set to TRUE and pszStyleSheet or SetExternalStyleFile is not specified, a CSS style sheet is created.

NOTE Cascading style sheets can only be used with word processing documents.

Display Vector Graphics on UNIX and Linux

Export offers the option of rasterizing vector graphic content from source documents into a variety of graphics formats including JPEG, PNG, WMF, and CGM. This solution is implemented with Windows Graphical Device Interface (GDI) code, and therefore is not portable to other platforms.

The output format of vector graphics is defined by the member eOutputVectorGraphicType of the structure KVXMLOptions, and the options are enumerated in KVXMLGraphicType in kvxml.h. See “KVXMLOptions” and page 279.

To display vector graphics in presentation, word processing, and spreadsheet files on UNIX and Linux, Export can convert the files directly to JPEG using a Java program named kvraster.class. This program uses the Java Abstract Windowing Toolkit (AWT). The AWT requires access to an X Server.

NOTE If you are using KeyView 10.5.0.0 or Java 1.6, you do not have to set up an X Server; however, if you are using a version of KeyView lower than 10.4 with a version of Java lower than 1.6, you must set up an X Server.

To set up an X Server, do one of the following

• Run a virtual X Server, such as the Xvfb utility. This utility is included in the X11R6 distribution or can be downloaded from the following site:

  http://www.x.org/Downloads.html

  For example, to run the Xvfb utility on a 512 MB, Solaris 2.8 platform, follow these steps:

  a. Start Xvfb at root:

     /usr/X11R6/bin/Xvfb :1 -screen 0 1152x900x8 &

  b. Set the display environment variable:

     setenv DISPLAY :1.0

• Make an X display available to the Java runtime using the DISPLAY environment variable. No windows appear on the display. For example, set the DISPLAY environment variable as follows:

  setenv DISPLAY computername:0.0

  or

  setenv DISPLAY ipaddress:0.0

After the X Server is set up, the file can be converted.

To convert the file

1. Add the location of the JRE to the PATH environment variable.

2. Set eOutputVectorGraphicType to JPEG in the template file or directly in the API.

3. Convert the document to XML. The graphics in the document are converted to JPEG and stored in the output directory.

Convert Revision Tracking Information

The revision tracking feature in applications, such as Microsoft Word’s Track Changes, marks changes to a document (typically, strikethrough for deleted text and underline for inserted text) and tracks each change by reviewer name and date.

If revision tracking was enabled when changes were made to a document, Export can be configured to convert the deleted text and graphics and include revision tracking information in the XML output. (The deleted content and revision tracking information is excluded from the XML output by default.)

Content that was added to the document is identified by <ins> tags. Content that was deleted from the document is identified by <del> tags. The <ins> and <del> tags include cite and datetime attributes which define the name of the reviewer who made the change and the date the change was made respectively. (The date is in ISO-8601 format: YYYY-MM-DDThh:mm:ss.) The tags also include a title attribute which allows you to display the author and date information in a browser. These elements are included in the verity.dtd.

The following markup is generated for inserted text:

<ins title="Inserted: JohnD, 2006-04-24T14:47:00" cite="mailto:JohnD" datetime="2006-04-24T14:47:00">This text was added</ins> in a previous version.

The following markup is generated for deleted text:

<del title="Deleted: JohnD, 2006-04-24T14:56:00" cite="mailto:JohnD" datetime="2006-04-24T14:56:00">This text was deleted</del> in a previous version.
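The shape of the revision markup above, including the ISO-8601 timestamp, can be reproduced with standard C library calls. The sketch below is offered only to clarify the attribute layout; format_revision_tag() is a hypothetical function and is not how Export builds its output. It assumes the reviewer name and text need no XML escaping.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper (not part of the SDK): writes an <ins> element
 * (inserted != 0) or a <del> element (inserted == 0) with title, cite,
 * and datetime attributes laid out as in the examples above.
 * Returns 0 on success, or -1 on formatting failure or a too-small buffer.
 */
int format_revision_tag(int inserted, const char *reviewer,
                        const struct tm *when, const char *text,
                        char *out, size_t out_size)
{
    char stamp[20];  /* "YYYY-MM-DDThh:mm:ss" is 19 characters plus NUL */
    const char *tag = inserted ? "ins" : "del";
    const char *label = inserted ? "Inserted" : "Deleted";
    int written;

    if (strftime(stamp, sizeof stamp, "%Y-%m-%dT%H:%M:%S", when) == 0)
        return -1;

    written = snprintf(out, out_size,
                       "<%s title=\"%s: %s, %s\" cite=\"mailto:%s\" "
                       "datetime=\"%s\">%s</%s>",
                       tag, label, reviewer, stamp, reviewer, stamp,
                       text, tag);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Passing a struct tm set to 24 April 2006, 14:47:00, the reviewer JohnD, and the text This text was added produces an <ins> element with the same attributes as the inserted-text example.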

To convert deleted text and graphics and include revision tracking information

1. Call the fpInit() function. See “fpInit().”

2. Call the fpXMLConfig() function with the following arguments (see “KVXMLConfig()”):

   Argument    Parameter
   nType       KVCFG_INCLREVISIONMARK
   nValue      TRUE (non-zero)
   pData       NULL

   For example:

   (*fpXMLConfig)(pKVXML, KVCFG_INCLREVISIONMARK, TRUE, NULL);

   The xmlini sample program demonstrates this function. See “xmlini.”

3. Call the fpConvertStream() or KVXMLConvertFile() function. See “fpConvertStream()” or page 214.

Convert PDF Files

Export has special configuration options that allow greater control over the conversion of PDF files. These options can improve the accuracy of the XML output.

Convert PDF Files to a Logical Reading Order

The PDF format is primarily designed for presentation and printing of brochures, magazines, forms, reports, and other materials with complex visual designs. Most PDF files do not contain the logical structure of the original document: the correct reading order, for example, and the presence and meaning of significant elements such as headers, footers, columns, tables, and so on.

KeyView can convert a PDF file by either using the file’s internal unstructured paragraph flow, or by applying a structure to the paragraphs to reproduce the logical reading order of the visual page. Logical reading order enables KeyView to output PDF files containing languages that read from right-to-left (such as Hebrew and Arabic) in the correct reading direction.

NOTE The algorithm used to reproduce the reading order of a PDF page is based on common page layouts. The paragraph flow generated for PDFs with unique or complex page designs may not emulate the original reading order exactly.

For example, page design elements such as drop caps, callouts that cross column boundaries, and significant changes in font size, may disrupt the logical flow of the output text.

Logical Reading Order and Paragraph Direction

By default, KeyView produces an unstructured text stream for PDF files. This means PDF paragraphs are extracted in the order in which they are stored in the file, not the order in which they appear on the visual page. For example, a three-column article could be output with the headers and the title at the end of the output file, and the second column extracted before the first column. Although this output does not represent a logical reading order, it accurately reflects the internal structure of the PDF.

You can configure KeyView to produce a structured text stream that flows in a specified direction. This means PDF paragraphs are extracted in the order (logical reading order) and direction (left-to-right or right-to-left) in which they appear on the page.

The following paragraph direction options are available:

Paragraph Direction Option
    Description

Left-to-right
    Paragraphs flow logically and read from left to right. This option should be specified when most of your documents are in a language using a left-to-right reading order, such as English or German.

Right-to-left
    Paragraphs flow logically and read from right to left. This option should be specified when most of your documents are in a language using a right-to-left reading order, such as Hebrew or Arabic.

Dynamic
    Paragraphs flow logically. The PDF reader determines the paragraph direction for each PDF page, and then sets the direction accordingly. When a paragraph direction is not specified, this option is used.

NOTE Conversions may be slower when logical reading order is enabled. For optimal speed, use an unstructured paragraph flow.

The paragraph direction options control the direction of paragraphs on a page; they do not control the text direction in a paragraph. For example, let us say a PDF file contains English paragraphs in three columns that read from left to right, but 80% of the second paragraph contains Hebrew characters. If the left-to-right logical reading order is enabled, the paragraphs are ordered logically in the output (title paragraph, then paragraph 1, 2, 3, and so on) and flow from the top left of the first column to the bottom right of the third column. However, the text direction of the second paragraph is determined independently of the page by the PDF reader, and is output from right to left.

NOTE Extraction of metadata is not affected by the paragraph direction setting. The characters and words in metadata fields are extracted in the correct reading direction regardless of whether logical reading order is enabled.

Enable Logical Reading Order

You can enable logical reading order using either the API or the formats_e.ini file. Setting the direction in the API overrides the setting in the formats_e.ini file.

Use the C API

To enable PDF logical reading order in the C API

1. Call the fpInit() function. See “fpInit()”.

2. Call the fpXMLConfig() function with the following arguments (see “KVXMLConfig()”):

Argument | Parameter
nType | KVCFG_LOGICALPDF
nValue | One of the following flags, which are defined in kvtypes.h (see “LPDF_DIRECTION”):
    LPDF_LTR: Logical reading order and left-to-right paragraph direction.
    LPDF_RTL: Logical reading order and right-to-left paragraph direction.
    LPDF_AUTO: Logical reading order. The PDF reader determines the paragraph direction for each PDF page, and then sets the direction accordingly. When a paragraph direction is not specified, this option is used.
    LPDF_RAW: Unstructured paragraph flow. This is the default behavior. If logical reading order is enabled, and you want to return to an unstructured paragraph flow, set this flag.
pData | NULL

For example:

(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_RTL, NULL);

The cnv2xml sample program demonstrates this function. See “cnv2xml”.

3. Call the fpConvertStream() or KVXMLConvertFile() function. See “fpConvertStream()” or “KVXMLConvertFile()”.
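As a rough illustration of how an application might choose the nValue argument, the following sketch maps command-line options to a direction value. The option names are the ones listed for the cnv2xml sample program; the enum, its values, and the function name are hypothetical stand-ins (the real flags are defined in kvtypes.h), and the assumption that each option corresponds to the like-named flag is not confirmed by this guide.

```c
#include <string.h>

/* Stand-ins for the LPDF_DIRECTION flags. The real flags are defined in
   kvtypes.h; these names and values are placeholders for illustration. */
typedef enum {
    DEMO_LPDF_RAW,
    DEMO_LPDF_LTR,
    DEMO_LPDF_RTL,
    DEMO_LPDF_AUTO
} DemoPdfDirection;

/* Translate a cnv2xml-style option into a direction value.
   Returns 1 and writes *out on success, 0 for an unrecognized option. */
int demo_parse_pdf_option(const char *arg, DemoPdfDirection *out)
{
    static const struct {
        const char *name;
        DemoPdfDirection dir;
    } table[] = {
        { "-pdfltr",  DEMO_LPDF_LTR  },
        { "-pdfrtl",  DEMO_LPDF_RTL  },
        { "-pdfauto", DEMO_LPDF_AUTO },
        { "-pdfraw",  DEMO_LPDF_RAW  }
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(arg, table[i].name) == 0) {
            *out = table[i].dir;
            return 1;
        }
    }
    return 0;
}
```

In a real application, the value selected this way would be replaced by the corresponding kvtypes.h flag and passed as the third argument of the configuration call in step 2.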

Use the formats_e.ini File

The formats_e.ini file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.

To enable logical reading order using the formats_e.ini file

1. Change the PDF reader entry in the [Formats] section of the formats_e.ini file as follows:

[Formats]
200=lpdf

2. Optionally, add the following section to the end of the formats_e.ini file:

[pdf_flags]
pdf_direction=paragraph_direction

where paragraph_direction is one of the following:

Flag | Description
LPDF_LTR | Left-to-right paragraph direction
LPDF_RTL | Right-to-left paragraph direction
LPDF_AUTO | The PDF reader determines the paragraph direction for each PDF page, and then sets the direction accordingly. When a paragraph direction is not specified, this option is used.
LPDF_RAW | Unstructured paragraph flow. This is the default behavior. If logical reading order is enabled, and you want to return to an unstructured paragraph flow, set this flag.

Control Hyphenation

There are two types of hyphens in a PDF document:

• A soft hyphen is added to a word by a word processor to divide the word across two lines. This is a discretionary hyphen and is used to ensure proper text flow in justified text.

• A hard hyphen is intentionally added to a word regardless of the word’s position in the text flow. It is required by the rules of grammar and/or word usage. For example, compound words, such as “three-week vacation” and “self-confident,” contain hard hyphens.

By default, KeyView maintains the source document’s soft hyphens in the output XML to more accurately represent the source document’s layout. However, if you are using Export to generate text output for an indexing engine or are not concerned with maintaining the document’s layout, it is recommended you remove soft hyphens from the XML output. To remove soft hyphens, you must enable the soft hyphen flag.

NOTE If the soft hyphen flag is enabled, every hyphen at the end of a line is considered a soft hyphen and removed from the XML output. If a hard hyphen appears at the end of a line, it will also be removed. This may result in an intentionally hyphenated word being extracted without a hyphen.
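The effect described in the note can be pictured with a small standalone sketch. It is not KeyView code and does not claim to reproduce the reader's internal logic; the function name is hypothetical. It simply drops every hyphen that is immediately followed by a line break, which is why a hard hyphen at the end of a line is lost as well.

```c
#include <stddef.h>

/* Copy `in` to `out`, dropping any hyphen that is immediately followed by
   a line break (together with the break itself) and turning every other
   line break into a space. Returns the number of characters written,
   not counting the terminating NUL. Output is truncated to fit `cap`. */
size_t demo_remove_eol_hyphens(const char *in, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0) {
        return 0;
    }
    while (*in != '\0' && n + 1 < cap) {
        if (in[0] == '-' && in[1] == '\n') {
            in += 2;            /* hyphen at end of line: remove it */
        } else if (*in == '\n') {
            out[n++] = ' ';     /* ordinary line break becomes a space */
            in++;
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

With the input "docu-\nment layout\nself-\nconfident", the soft hyphen in "document" is removed as intended, but the hard hyphen in "self-confident" is removed too, giving "selfconfident".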

To remove soft hyphens from the XML output

1. Call the fpInit() function. See “fpInit()”.

2. Call the KVXMLConfig() function with the following arguments (see “KVXMLConfig()”):

Argument | Parameter
nType | KVCFG_DELSOFTHYPHEN
nValue | TRUE (non-zero)
pData | NULL

For example:

(*fpXMLConfig)(pKVXML, KVCFG_DELSOFTHYPHEN, TRUE, NULL);

3. Call the fpConvertStream() or KVXMLConvertFile() function. See “fpConvertStream()” or “KVXMLConvertFile()”.

Improve Performance for PDFs with Many Small Images

To improve performance when converting PDF files containing many small pixel images, you can specify in the formats_e.ini file the minimum pixel height and width for images that are converted to JPEG. If an image is smaller than the minimum height and width, KeyView does not generate a JPEG file for the image.

For example, to specify that images 16 pixels in height and width and less are not converted, you would add the following to the [pdf_flags] section of the formats_e.ini file:

[pdf_flags]
process_images_with_min_height=17
process_images_with_min_width=17

Extract Custom Metadata from PDF Files

To extract custom metadata from your PDF files, add the custom metadata names to the pdfsr.ini file provided, and copy the modified file to the \bin directory.

You can then extract metadata as you normally would.

The pdfsr.ini file is in the directory samples\pdfini, and has the following structure:

<META>
<TOTAL>total_item_number</TOTAL>,
/metadata_tag_name datatype,
</META>

Parameter | Description
total_item_number | The total number of metadata tags that are listed.
metadata_tag_name | The metadata tag name used in the PDF files.
datatype | The data type of the metadata field. Data types are defined in KVSumInfoType. See “KVSumInfoType”.

For example:

<META>
<TOTAL> 4 </TOTAL>
/part_number INT4
/volume INT4
/purchase_date DATETIME
/customer STRING
</META>
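Because the TOTAL value must agree with the number of tags listed, it can help to generate the block rather than edit it by hand. The following is a minimal sketch under stated assumptions: the struct and function names are hypothetical, and the line layout (tag name and data type on one line, no trailing commas) follows the example in this section rather than any documented grammar.

```c
#include <stdio.h>

/* One custom metadata entry: tag name without the leading slash, and the
   data type name as it should appear in the file (for example INT4). */
typedef struct {
    const char *tag;
    const char *datatype;
} DemoMetaEntry;

/* Append formatted text at *used; returns 0 on success, -1 if it does not fit. */
static int demo_append(char *buf, size_t cap, int *used, const char *a, const char *b)
{
    int n = snprintf(buf + *used, cap - (size_t)*used, "%s%s", a, b);
    if (n < 0 || (size_t)n >= cap - (size_t)*used) {
        return -1;
    }
    *used += n;
    return 0;
}

/* Write a <META> block for `count` entries into buf.
   Returns the number of characters written, or -1 if buf is too small. */
int demo_write_pdfsr(char *buf, size_t cap, const DemoMetaEntry *e, int count)
{
    int i;
    int used = snprintf(buf, cap, "<META>\n<TOTAL> %d </TOTAL>\n", count);

    if (used < 0 || (size_t)used >= cap) {
        return -1;
    }
    for (i = 0; i < count; i++) {
        if (demo_append(buf, cap, &used, "/", e[i].tag) != 0 ||
            demo_append(buf, cap, &used, " ", e[i].datatype) != 0 ||
            demo_append(buf, cap, &used, "\n", "") != 0) {
            return -1;
        }
    }
    if (demo_append(buf, cap, &used, "</META>\n", "") != 0) {
        return -1;
    }
    return used;
}
```

The generated text would still need to be saved as pdfsr.ini and copied to the \bin directory as described above.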

Convert Spreadsheet Files

Export has special configuration options that allow greater control over the conversion of spreadsheet files.

Convert Hidden Text in Microsoft Excel Files

Normally, Export does not convert hidden text from a Microsoft Excel spreadsheet because it is assumed the text should not be exposed. You can change this default behavior, and convert text in hidden rows, columns, and sheets by adding the following lines to the formats_e.ini file:

[Options]
gethiddeninfo=1

Convert Headers and Footers in Microsoft Excel 2003 Files

Normally, Export does not convert headers and footers from Microsoft Excel 2003 spreadsheets. You can change this default behavior and convert headers and footers by adding the following lines to the formats_e.ini file:

[Options]

ShowHeaderFooter=1

Specify Date and Time Format on UNIX Systems

System date and time format information is not stored in Microsoft Excel files. On Windows systems, you can specify a locale setting to determine the date and time format. However, on UNIX systems, the date and time format is set to the U.S. short date format by default (mm/dd/yyyy). To change the format, you must use a formats_e.ini option.

To specify the system date and time format on UNIX systems

• In the formats_e.ini file, set the SysDateTime option in the [LocaleSetting] section. For example:

SysDateTime=%d/%m/%Y

In this example, dates and times are extracted in the following format:

28/02/2008

The format arguments are the same as those for the strftime() function.

Refer to the following Web page for more information. http://linux.die.net/man/3/strftime
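To preview what a given SysDateTime pattern produces before putting it in the initialization file, you can pass the same pattern to the standard C strftime() function. The helper below is a small sketch; its name is hypothetical and it is not part of the SDK.

```c
#include <stddef.h>
#include <time.h>

/* Format a calendar date with a strftime() pattern.
   Returns the number of characters written, or 0 if buf is too small. */
size_t demo_format_date(char *buf, size_t cap, const char *pattern,
                        int year, int month, int day)
{
    struct tm t = {0};

    t.tm_year = year - 1900;  /* struct tm counts years from 1900 */
    t.tm_mon = month - 1;     /* struct tm counts months from 0 */
    t.tm_mday = day;
    return strftime(buf, cap, pattern, &t);
}
```

Calling demo_format_date(buf, sizeof buf, "%d/%m/%Y", 2008, 2, 28) fills buf with 28/02/2008, matching the example above.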

Extract Microsoft Excel Formulas

Normally, the actual value of a formula is extracted from an Excel spreadsheet; the formula from which the value is derived is not included in the output. However, KeyView enables you to include the value as well as the formula in the output. For example, if Export is configured to extract the formula and the formula value, the output may look like this:

245 = SUM(B21:B26)

The calculated value from the cell is 245 and the formula from which the value is derived is SUM(B21:B26).

NOTE Depending on the complexity of the formulas, enabling formula extraction may result in slightly slower performance.

To set the extraction option for formulas, add the following lines to the formats_e.ini file:

[Options]
getformulastring=option

where option is one of the following:

Option | Description
0 | Extract the formula value only. This is the default. If formula extraction is enabled, and you want to return to the default, set this option.
1 | Extract the formula only.
2 | Extract the formula and the formula value.

NOTE If a function in a formula is not supported or is invalid, and option 1 or 2 is specified, only the calculated value is extracted. See the following list of supported functions.

When formula extraction is enabled, Export can extract Microsoft Excel formulas containing the following functions:

=ABS() =ACOS() =AND() =AREAS()
=ASIN() =ATAN2() =AVERAGE() =CELL()
=CHAR() =CHOOSE() =CLEAN() =CODE()
=COLUMN() =COLUMNS() =CONCATENATE() =COS()
=COUNT() =COUNTA() =DATE() =DATEVALUE()
=DAVERAGE() =DAY() =DCOUNT() =DDB()
=DMAX() =DMIN() =DOLLAR() =DSTDEV()
=DSUM() =DVAR() =EXACT() =EXP()
=FACT() =FALSE() =FIND() =FIXED()
=FV() =GROWTH() =HLOOKUP() =HOUR()
=IF() =INDEX() =INDIRECT() =INT()
=IPMT() =IRR() =ISBLANK() =ISERR()
=ISERROR() =ISNA() =ISNUMBER() =ISREF()
=ISTEXT() =LEFT() =LEN() =LINEST()
=LN() =LOG() =LOG10() =LOGEST()
=LOOKUP() =LOWER() =MATCH() =MAX()
=MDETERM() =MID() =MIN() =MINUTE()
=MINVERSE() =MIRR() =MMULT() =MOD()
=MONTH() =N() =NA() =NOT()
=NOW() =NPER() =NPV() =OFFSET()
=OR() =PI() =PMT() =PPMT()
=PRODUCT() =PROPER() =PV() =RATE()
=REPLACE() =REPT() =RIGHT() =ROUND()
=ROW() =ROWS() =SEARCH() =SECOND()
=SIGN() =SIN() =SLN() =SQRT()
=STDEV() =SUBSTITUTE() =SUM() =SYD()
=T() =TAN() =TEXT() =TIME()
=TIMEVALUE() =TODAY() =TRANSPOSE() =TREND()
=TRIM() =TRUE() =TYPE() =UPPER()
=VALUE() =VAR() =VLOOKUP() =WEEKDAY()
=YEAR()

Convert XML Files

Export enables you to extract all or selected content from source XML files (see “Configure Element Extraction for XML Documents”). It detects the following XML formats:

• generic XML
• Microsoft Office 2003 XML (Word, Excel, and Visio)
• StarOffice/OpenOffice XML (text document, presentation, and spreadsheet)

See Appendix E for more information on format detection.

Configure Element Extraction for XML Documents

When converting XML files, you can specify which elements and attributes are extracted according to the file’s format ID or root element. This is useful when you want to extract only relevant text elements, such as abstracts from reports, or a list of authors from an anthology.

A root element is an element in which all other elements are contained. In the XML sample below, book is the root element:

<book>

<title>XML Introduction</title>

<product id="33-657" status="draft">XML Tutorial</product>

<chapter>Introduction to XML

<para>What is HTML</para>

<para>What is XML</para>

</chapter>

<chapter>XML Syntax

<para>Elements must have a closing tag</para>

<para>Elements must be properly nested</para>

</chapter>

</book>

For example, you could specify that when converting files with the root element book, the element title is extracted as metadata, and only product elements with a status attribute value of draft are extracted. When you extract an element, the child elements within the element are also extracted. For example, if you extract the element chapter from the sample above, the child element para is also extracted.

Export defines default element extraction settings for the following XML formats:

• generic XML
• Microsoft Office 2003 XML (Word, Excel, and Visio)
• StarOffice/OpenOffice XML (text document, presentation, and spreadsheet)

These settings are defined internally and are used when converting these file formats; however, you can modify their values.

In addition to the default extraction settings, you can also add custom settings for your own XML document types. If you do not define custom settings for your own XML document types, the settings for the generic XML are used.

Modify Element Extraction Settings

You can modify configuration settings for XML documents through either the API or the kvxconfig.ini file.

NOTE You can only use customized element extraction settings when converting files in process. When converting out of process, the default extraction settings are used.

Use the C API

You can use the C API to modify the settings for the standard XML document types or add configuration settings for your own XML document types.

To modify settings

1. Call the fpInit() function. See “fpInit()”.

2. Define the KVXConfigInfo data structure. See “KVXConfigInfo”.

3. Call the KVXMLConfig() function with the following arguments (see “KVXMLConfig()”):

Argument | Parameter
nType | KVCFG_SETXMLCONFIGINFO
nValue | 0
pData | Address of the KVXConfigInfo structure

For example:

KVXConfigInfo xinfo; /* populate xinfo */
(*fpXMLConfig)(pKVXML, KVCFG_SETXMLCONFIGINFO, 0, &xinfo);

4. Repeat steps 2 and 3 until the settings for all the XML document types you want to customize are defined.

5. Call the fpConvertStream() or KVXMLConvertFile() function. See “fpConvertStream()” or “KVXMLConvertFile()”.

Use an Initialization File

You can use the initialization file to modify the settings for the standard XML document types or add configuration settings for your own XML document types.

To modify settings

1. Modify the kvxconfig.ini file.

2. Use the template file when processing the XML file. See “Modify Element Extraction Settings in the kvxconfig.ini File”.

The sample program (xmlini) demonstrates how to use a template file during the conversion process. See “xmlini”.

Modify Element Extraction Settings in the kvxconfig.ini File

The kvxconfig.ini file contains default element extraction settings for supported XML formats. The file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system. For example, the following entry defines extraction settings for the Microsoft Visio 2003 XML format:

[config3]
eKVFormat=MS_Visio_XML_Fmt
szRoot=
szInMetaElement=DocumentProperties
szExMetaElement=PreviewPicture
szInContentElement=Text
szExContentElement=
szInAttribute=

The following options are available:

Configuration Option | Description
eKVFormat | The format ID as detected by the KeyView detection module. This determines the file type to which these extraction settings apply. See Appendix E for more information on format ID values. If you are adding configuration settings for a custom XML document type, this is not defined.
szRoot | The file’s root element. When the format ID is not defined, the root element is used to determine the file type to which these settings apply. To further qualify the element, specify its namespace. See “Specify an Element’s Namespace and Attribute”.
szInMetaElement | The elements extracted from the file as metadata. All other elements are extracted as text. Multiple entries must be separated by commas. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
szExMetaElement | The child elements in the included metadata elements that are not extracted from the file as metadata. For example, the default extraction settings for the Visio XML format extract the DocumentProperties element as metadata. This element includes child elements such as Title, Subject, Author, Description, and so on. However, the child element PreviewPicture is defined in szExMetaElement because it is binary data and should not be extracted. You cannot exclude any metadata elements from the output for StarOffice files. All metadata is extracted regardless of this setting. Multiple entries must be separated by commas. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
szInContentElement | The elements extracted from the file as content text. Enter an asterisk (*) to extract all elements including child elements. Multiple entries must be separated by commas. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
szExContentElement | The child elements in the included content elements that are not extracted from the file as content text. Multiple entries must be separated by commas. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
szInAttribute | The attribute values extracted from the file. If attributes are not defined here, attribute values are not extracted. Enter the namespace (if used), element name, and attribute name in the format namespace:elementname@attributename. For example: Autonomy:division@name. Multiple entries must be separated by commas.

Specify an Element’s Namespace and Attribute

To further qualify an element, you can specify that the element exist in a certain namespace and/or contain a specific attribute. To define the namespace and attribute of an element, enter the following:

ns_prefix:elemname@attribname=attribvalue

Attribute values containing spaces must be enclosed in quotation marks.

For example, the following entry:

bg:language@id=xml

extracts a language element in the namespace bg that contains the attribute name id with the value of “xml”. This entry extracts the following element from an XML file:

<bg:language id="xml">XML is a simple, flexible text format derived from SGML</bg:language>

but does not extract:

<bg:language id="sgml">SGML is a system for defining markup languages.</bg:language>

or

<adv:language id="xml">The namespace should be a Uniform Resource Identifier (URI).</adv:language>
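To make the entry syntax concrete, the following sketch splits a single entry of the form ns_prefix:elemname@attribname=attribvalue into its parts. It is an illustration only: the struct and function names are hypothetical, the fixed 64-character fields are an arbitrary choice, and KeyView's own parsing rules may differ in edge cases.

```c
#include <string.h>

/* The parts of one qualified element entry. Missing parts are empty strings. */
typedef struct {
    char ns[64];
    char elem[64];
    char attr[64];
    char value[64];
} DemoQualifier;

/* Copy at most cap-1 characters of src[0..len) into dst and terminate it. */
static void demo_copy(char *dst, size_t cap, const char *src, size_t len)
{
    if (len >= cap) {
        len = cap - 1;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split "ns:elem@attr=value" into q. Namespace, attribute, and value are
   optional; surrounding double quotes on the value are removed.
   Returns 1 on success, 0 if the element name is empty. */
int demo_parse_qualifier(const char *entry, DemoQualifier *q)
{
    const char *at = strchr(entry, '@');
    const char *name_end = at ? at : entry + strlen(entry);
    const char *colon = memchr(entry, ':', (size_t)(name_end - entry));
    const char *elem = colon ? colon + 1 : entry;

    memset(q, 0, sizeof *q);
    if (colon) {
        demo_copy(q->ns, sizeof q->ns, entry, (size_t)(colon - entry));
    }
    demo_copy(q->elem, sizeof q->elem, elem, (size_t)(name_end - elem));
    if (q->elem[0] == '\0') {
        return 0;
    }
    if (at) {
        const char *attr = at + 1;
        const char *eq = strchr(attr, '=');
        const char *attr_end = eq ? eq : attr + strlen(attr);

        demo_copy(q->attr, sizeof q->attr, attr, (size_t)(attr_end - attr));
        if (eq) {
            const char *val = eq + 1;
            size_t len = strlen(val);

            if (len >= 2 && val[0] == '"' && val[len - 1] == '"') {
                val++;          /* strip the enclosing quotation marks */
                len -= 2;
            }
            demo_copy(q->value, sizeof q->value, val, len);
        }
    }
    return 1;
}
```

For the entry bg:language@id=xml, this yields the namespace bg, the element language, the attribute id, and the value xml. A comma-separated option value would need to be split into individual entries first.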

Add Configuration Settings for Custom XML Document Types

You can define element extraction settings for custom XML document types by adding the settings to the kvxconfig.ini file. For example, for files containing the root element autonomyxml, we could add the following section to the end of the initialization file:

[config101]
eKVFormat=
szRoot=autonomyxml
szInMetaElement=dc:title,dc:meta@title,dc:meta@name=title
szExMetaElement=
szInContentElement=autonomy:division@name=dev,autonomy:division@name=export,p@style="Heading 1"
szExContentElement=
szInAttribute=autonomy:division@name

The custom extraction settings must be preceded by a section heading named [configN], where N is an integer starting at 100 and increasing by 1 for each additional file type, as in [config100], [config101], [config102], and so on.

The default extraction settings for the supported XML formats are numbered config0 to config99. Currently only 0 to 6 are used.

Since a custom XML document type is not recognized by the KeyView detection module, the format ID is not defined. The file type is identified by the file’s root element only.

If a custom XML document type is not defined in the kvxconfig.ini file or by the KVXMLConfig() function, then the default extraction settings for a generic XML document are used.

Show Hidden Data

Microsoft Word, Excel, or PowerPoint documents contain hidden information, some of which is shown by default when exported and some of which is hidden by default. There are several options that allow you to determine exactly which types of hidden data are exported.

Hidden Data in Microsoft Documents

You can show or hide several types of hidden data from Microsoft Word, Excel, and PowerPoint documents. Each type has a corresponding flag in the KVXMLConfig() function, which you can toggle to determine whether the hidden data is shown.

The following table lists each data type, its default behavior, and its corresponding configuration API flag.

Hidden Data Type | Default Behavior | Configuration API Flag

Microsoft Word
Comments (a) | Shown (b) | KVCFG_WP_NOCOMMENTS
Hidden text | Hidden | KVCFG_WP_SHOWHIDDENTEXT
Date field codes | Calculated date | KVCFG_WP_SHOWDATEFIELDCODE
File name field codes | Document file name | KVCFG_WP_SHOWFILENAMEFIELDCODE

Microsoft Excel
Hidden information | Hidden | KVCFG_SS_SHOWHIDDENINFOR
Comments | Hidden | KVCFG_SS_SHOWCOMMENTS
Formulas | Calculated value | KVCFG_SS_SHOWFORMULA

Microsoft PowerPoint
Hidden slides | Shown | KVCFG_PG_HIDEHIDDENSLIDE
Comments | Shown (c) | KVCFG_PG_HIDECOMMENT
Comments slide | Hidden | KVCFG_PG_SHOWCOMMENTSSLIDE (d)
Slide notes (e) | Hidden | KVCFG_PG_SHOWSLIDENOTES

a. Word comment settings can also be toggled with a configuration parameter in the formats_e.ini file. See “Toggle Word Comment Settings in the formats_e.ini File”.

b. Shown by default in Microsoft Word 97 to 2003 documents.

c. Shown by default in Microsoft PowerPoint 97 to 2000 documents.

d. This setting affects PowerPoint 2003 and 2007 only.

e. PowerPoint slide note settings can also be toggled with a configuration parameter in the formats_e.ini file. See “Toggle PowerPoint Slide Note Settings in the formats_e.ini File”.

To toggle the display of any type of hidden data

• Use the configuration API and set the third parameter to TRUE or FALSE:

(*fpXMLConfig)(pKVXML, KVCFG_WP_NOCOMMENTS, TRUE, NULL);

In this example, comments will not be exported from Word documents.

NOTE The third parameter affects the default behavior. To change the default behavior, set it to TRUE.

For more information, see “KVXMLConfig()”.

Toggle Word Comment Settings in the formats_e.ini File

Microsoft Word 97 to 2003 comment settings can also be controlled through a parameter in the formats_e.ini file.

The formats_e.ini file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.

To toggle comment output in formats_e.ini

1. Open the formats_e.ini file in a text editor.

2. Under [Options], add the WP_NOCOMMENTS parameter and set it to 0 to show comments or 1 to hide comments. For example:

[Options]
WP_NOCOMMENTS=1

NOTE The configuration API flag KVCFG_WP_NOCOMMENTS overrides the setting in formats_e.ini.

Toggle PowerPoint Slide Note Settings in the formats_e.ini File

Microsoft PowerPoint slide note settings can also be controlled through a parameter in the formats_e.ini file.

The formats_e.ini file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.

To toggle slide note output in formats_e.ini

1. Open the formats_e.ini file in a text editor.

2. Under [Options], add the ShowSlideNotes parameter and set it to 1 to show slide notes or 0 to hide slide notes. For example:

[Options]

ShowSlideNotes=1

NOTE The configuration API flag KVCFG_PG_SHOWSLIDENOTES overrides the setting in formats_e.ini.

CHAPTER 5

Sample Programs

This section describes the sample programs provided with XML Export. It contains the following topics:

Introduction
tstxtract
cnv2xml
cnv2xmloop
metadata
xmlindex
xmlini
xmlcallback
xmlonefile
xmlmulti
Export Demo

Introduction

The sample programs demonstrate how to use the C and Visual Basic implementations of XML Export. The sample code is intended to provide a starting point for your own applications or to be used for reference purposes.

The source code and makefile for each program are in the directory install\xmlexport\programs\program_name, where install is the pathname of the Export installation directory, and program_name is the name of the sample program.

C Sample Programs

The C sample programs demonstrate how to use the C implementation of XML Export. The sample code is intended to provide a starting point for your own applications or to be used for reference purposes.

The following C sample programs are provided:

tstxtract
cnv2xml
cnv2xmloop
metadata
xmlindex
xmlini
xmlcallback
xmlonefile
xmlmulti

The source code and makefile for each program are in the directory install\xmlexport\programs\program_name, where install is the pathname of the Export installation directory, and program_name is the name of the sample program.

NOTE The sample programs do not parse white space in filenames. If your filenames contain spaces, use quotation marks around the entire path name. Inserting quotation marks around the filename only does not work.

To compile the C sample programs, use the makefiles provided in the sample programs directories. Ensure the XML Export include directory is specified in the include path of the project. Once the executables are compiled and built, they must be placed in the same directory as the XML Export libraries.

Compile the Visual Basic Sample Program

To compile Export Demo, use the Visual Studio project file (demo_vb.vbp) in the directory install\xmlexport\programs\ExportDemo, where install is the pathname of the Export installation directory.

tstxtract

The tstxtract sample program demonstrates the File Extraction API. It opens a file, extracts sub files from the file, and repeats the extraction process until all sub files are extracted. It also demonstrates how to extract the default set of metadata and pass integer or string names to extract specific metadata. After the files are extracted, you can convert the files using one of the conversion sample programs.

The source code for the tstxtract sample program is the same for the Filter and Export SDKs. A flag in the makefile specifies whether the program is compiled for Filter, HTML Export, or XML Export.

To run tstxtract, type the following command line:

tstxtract [options] input_file output_directory bin_directory

where options is one or more of the following:

Option | Description
-c charset | Specify the target character set, for example KVCS_SJIS. See “Coded Character Sets” for a full list of supported character sets.
-cf keyfile1, keyfile2,... | Specify one or more credential files (private keys) to use to decrypt encrypted .EML, .MBX, .PST, or .MSG files.
-l logfile | Specify the path and filename of the log file in which metadata is written.
-lm | Retrieve metadata and write the data to the log file.
-lms metaname1, metaname2,... | Retrieve metadata with string metanames and write the data to the log file for .MSG, .EML, .MBX, and .NSF files.
-lmi metaint1, metaint2,... | Retrieve metadata with integer (hexadecimal) metanames and write the data to the log file for .PST files.
-lma | Retrieve all metadata from an .NSF file and write the data to the log file.
-r | Recursively extract second-level subfiles to the specified output directory. For example, if a .ZIP file contains a Microsoft Word file and the Word file contains an embedded Microsoft Excel file, set the -r option to extract both the Word and Excel files. If this option is not set, only first-level subfiles are extracted. For the example above, only the Word file would be extracted.
-msg | Extract mail messages in a .PST file as an .MSG file, including all of its attachments. If this flag is not set, the mail message is extracted as text. This applies to PST files on Windows only.
-f | Extract the formatted version of the message body (HTML or RTF) from mail files when possible. If neither an HTML nor RTF version of the message body exists in the mail file, then it is extracted as plain text. If this flag is not set, the message body is extracted as plain text when possible.
-t | Preserve the timestamp of embedded files when possible.
-h | Extract hidden text.

input_file is the full path and filename of the source document.

output_directory is the directory to which the files will be extracted. bin_directory is the path to the Export bin directory. This is required if you do not run the program from the install\Export SDK\bin directory.

cnv2xml

The cnv2xml sample program creates a single, formatted XML output file. It is called by the Export Demo sample program, but can also be used on its own. This program runs on both Windows and UNIX platforms.

To run cnv2xml, type the following command line:

cnv2xml [options] inputfile outputfile

where

options is one or more of the options listed in the following table.

inputfile is the full path and filename of the source document.

outputfile is the full path and filename of the first XML output file.

The following options are available:

Table 14  cnv2xml Sample Program

Option    Description

-c KVCFG_SUPPRESSIMAGES    Specifies that XML output includes verbose markup, but no images. If this option is not set, then embedded images in a document are regenerated as separate files and stored in the output directory. See “KVXMLConfig()”.

-c KVCFG_ENABLEPOSITIONINFO    Specifies that a position element is included in the markup for PDF documents. The position element defines the absolute position of the text relative to the bottom left corner of the page, and includes additional information such as font and color. See “KVXMLConfig()”.

-c KVCFG_DELSOFTHYPHEN    Specifies that soft hyphens in PDF files are deleted from the converted output. See “Control Hyphenation”.

-pdfltr    Specifies that PDF files are output in a logical reading order, and the paragraph direction is left to right. See “Convert PDF Files to a Logical Reading Order”.

-pdfrtl    Specifies that PDF files are output in a logical reading order, and the paragraph direction is right to left. See “Convert PDF Files to a Logical Reading Order”.

-pdfauto    Specifies that PDF files are output in a logical reading order. The PDF reader determines the paragraph direction (left-to-right or right-to-left) for each PDF page, and then sets the direction accordingly. See “Convert PDF Files to a Logical Reading Order”.

-pdfraw    Specifies that PDF files are output in an unstructured paragraph flow. This is the default. If logical reading order is enabled, and you want to return to an unstructured paragraph flow, set this flag. See “Convert PDF Files to a Logical Reading Order”.

cnv2xmloop

The cnv2xmloop sample program creates a single, formatted XML output file, but unlike cnv2xml, it converts the file out of process. See “Convert Files Out of Process” for more information on out of process conversions. This program runs on both Windows and UNIX platforms.

To run cnv2xmloop, type the following command line:

cnv2xmloop [options] inputfile outputfile

where

options is one or more of the options listed in the following table.

inputfile is the full path and filename of the source document.

outputfile is the full path and filename of the XML output file.

The following options are available:

Table 15  cnv2xmloop Sample Program

Option    Description

-c KVCFG_SUPPRESSIMAGES    Specifies that XML output includes verbose markup, but no images. If this option is not set, then embedded images in a document are regenerated as separate files and stored in the output directory. See “KVXMLConfig()”.

-c KVCFG_ENABLEPOSITIONINFO    Specifies that a position element is included in the markup for PDF documents. The position element defines the absolute position of the text relative to the bottom left corner of the page, and includes additional information such as font and color. See “KVXMLConfig()”.

metadata

The metadata sample program converts a source document into a single XML file that only contains the document metadata (Author, Subject, Title, and so on). This program runs on both Windows and UNIX platforms.

To run metadata, type the following command line:

metadata inputfile outputfile

where

inputfile is the full path and filename of the source document.

outputfile is the full path and filename of the first XML output file.

xmlindex

The xmlindex sample program produces stripped-down XML output suitable for use with indexing engines. It converts a source document into a single, largely unformatted XML file. This program runs on both Windows and UNIX platforms.

To run xmlindex, type the following command line:

xmlindex inputfile outputfile

where

inputfile is the full path and filename of the source document.

outputfile is the full path and filename of the first XML output file.

xmlini

The xmlini sample program is used in conjunction with template files to produce well-formed XML documents. For more information, see “Set Conversion Options Using the Template Files”. Sample template files are in the directory programs\ini. This program runs on both Windows and UNIX platforms.

To run xmlini, type the following command line:

xmlini [options] inifile inputfile outputfile

where

options is one or more of the options listed in the following table.

inifile is the full path and filename of a template file.

inputfile is the full path and filename of the source document.

outputfile is the full path and filename of the first XML output file.

The following options are available:

Table 16  xmlini Sample Program

Option    Description

-s stylesheetfile    Reads style sheet information from an existing style sheet file, or writes the information to an external CSS file. See “Use Style Sheets with xmlini”.

-x xmlconfig_filename    Converts an XML file using customized element extraction settings defined in the kvxconfig.ini file. If you do not enter the full path to the template file, the program looks for the file in the current working directory (install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system). See “Convert XML Files”.

-rm    If this is set, text and graphics that were deleted from a document with a revision tracking feature enabled are converted and revision tracking information is included in the XML output. See “Convert Revision Tracking Information”.

-oop    Runs the conversion out of process.

-fl    Prints a list of converted files in the console.

If the XML file is output to a directory other than the directory programs\tempout, you must update the XML markup so that the browser can find the style sheet and the images used by the template (such as backgrounds or corporate logos). The markup contains relative references to the image files (..\images).

Use Style Sheets with xmlini

The xmlini sample program provides an option that allows XML Export to read Cascading Style Sheet (CSS) or Extensible Stylesheet Language (XSL) style sheet information from an existing style sheet file, or to write CSS information to an external CSS file. If the CSS file does not exist, it is created. The style sheet name is referenced in the output XML, for example:

<?xml-stylesheet href="c:\mystyle.css" type="text/css"?>

This type of conversion makes the XML output document significantly smaller and allows you to use the same style sheet for many conversions.
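As an illustration of the style sheet reference shown above, the following minimal C sketch builds an xml-stylesheet processing instruction for a given style sheet path. It is not taken from the xmlini source: the function name build_stylesheet_pi and the StyleKind enumeration are hypothetical, and the text/xsl type used for XSL style sheets is an assumption based on common browser practice, not documented XML Export behavior.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the template file's eStyleSheetType setting. */
typedef enum { STYLE_CSS, STYLE_XSL } StyleKind;

/*
 * Write an xml-stylesheet processing instruction into buf.
 * Returns 0 on success, or -1 if buf is too small to hold the result.
 */
int build_stylesheet_pi(char *buf, size_t size, const char *href, StyleKind kind)
{
    /* Assumption: "text/xsl" for XSL; "text/css" matches the example above. */
    const char *type = (kind == STYLE_XSL) ? "text/xsl" : "text/css";
    int n;

    assert(buf != NULL && href != NULL);
    n = snprintf(buf, size, "<?xml-stylesheet href=\"%s\" type=\"%s\"?>",
                 href, type);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Because every converted document carries only this one-line reference, the same external style sheet can be reused across many conversions.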

To apply an existing style sheet to a conversion using the xmlini sample program

1. In the template file, set eStyleSheetType to either XML_CSS or XML_XSL. This specifies that the formatting data is stored in either a CSS or an XSL style sheet.

2. At the command prompt, type:

xmlini -s stylesheetname inifile inputfile outputfile

where stylesheetname is the path and filename of the CSS or XSL file.

xmlcallback

The xmlcallback sample program demonstrates how you can control the conversion to generate specialized output while it is in progress. The program employs developer-defined callbacks and memory management functions during conversion. This program runs on Windows platforms only.

To run xmlcallback, type the following command line:

xmlcallback inputfile outputfile

where

inputfile is the full path and filename of the source document.

outputfile is the full path and filename of the first XML output file.

xmlonefile

The xmlonefile sample program converts a source document into a single, formatted XML file. This program runs on Windows platforms only.

To run xmlonefile, type the following command line:

xmlonefile inputfile outputfile

where

inputfile is the full path and filename of the source document.

outputfile is the full path and filename of the first XML output file.

xmlmulti

The xmlmulti sample program creates multiple XML files from a source document. The main file contains the table of contents. Each H1 heading is contained within its own file. The main file contains hyperlinks to each H1 block; each H1 file contains navigation to the table of contents, as well as to the previous and next blocks. This program runs on Windows platforms only.

To run xmlmulti, type the following command line:

xmlmulti inputfile outputfile

where

inputfile is the full path and filename of the source document.

outputfile is the full path and filename of the first XML output file.

Export Demo

Export Demo is a Visual Basic program that provides an easy-to-use graphical user interface to the KeyView Export technology. It allows you to select files, convert them to XML, and view the result in a browser object. The output options that control the look of the output files are pre-defined in Export Demo and cannot be changed in the user interface.

Export Demo accesses the Export functionality by calling the operating system to run a C program named cnv2xml. To adapt the sample program to your needs, modify the GUI using Visual Basic, and the cnv2xml program using C. See “cnv2xml”.

To launch Export Demo, select Export Demo from Start | Programs | Autonomy | Export SDK | XML Export.

The source code for the program is in the directory install\xmlexport\programs\ExportDemo, where install is the pathname of the Export installation directory. Export Demo is for Windows only.

See “Use the Export Demo Program” for more information.

PART 3

C API Reference

This section provides detailed reference information for the C-language implementation of the File Extraction and Export APIs. It includes the following chapters:

File Extraction API Functions

File Extraction API Structures

XML Export API Functions

XML Export API Callback Functions

XML Export API Structures

Enumerated Types

CHAPTER 6

File Extraction API Functions

This section describes the functions in the File Extraction API. The File Extraction functions open a container file, and extract the container’s sub files so that the sub files are exposed and available for conversion. Sub files may be files within a Zip archive, messages in a mail store, attachments in a mail message, or OLE objects embedded in a compound document. See “Sub File Extraction” for more information.

Each function appears as a function prototype followed by a description of its arguments, its return value, and a discussion of its use. This section contains the following topics.

KVGetExtractInterface()

fpCloseFile()

fpExtractSubFile()

fpFreeStruct()

fpGetMainFileInfo()

fpGetSubFileInfo()

fpGetSubFileMetaData()

fpOpenFile()


KVGetExtractInterface()

This function is the entry point to obtain the file extraction functions. It supplies pointers to the file extraction functions, and in the case of out-of-process mode starts the kvoop.exe server and initializes out-of-process extraction services.

When KVGetExtractInterface() is called, it assigns the function pointers in the structure KVExtractInterface to the functions described in this section.

Syntax

int pascal KVGetExtractInterface (
    void *pContext,
    KVExtractInterface pIextract);

Arguments

pContext    Pointer returned from fpInit().

pIextract    Pointer to the structure KVExtractInterface, which contains function pointers that KVGetExtractInterface() assigns to all other file extraction functions. See “KVExtractInterface”.

    Before initializing the KVExtractInterface structure, use the macro KVStructInit to initialize the KVStructHead structure. See “KVStructHead”.

Returns

If the call is successful, the return value is KVERR_Success.

If the call is not successful, the return value is an error code.

Example

fpKVGetExtractInterface =
    (int (pascal *)(void *, KVExtractInterface))myGetProcAddress(hKVExport,
    (char *)"KVGetExtractInterface");

/* Initialize file extraction interface structure using KVStructInit */
KVStructInit(&extractInterface);

/* Retrieve file extraction interface */
error = (*fpKVGetExtractInterface)(pExport, &extractInterface);

fpCloseFile()

This function frees the memory allocated by fpOpenFile() and closes the file. See “fpOpenFile()”.

Syntax

int (pascal *fpCloseFile) (void *pFile);

Arguments

pFile    Identifier of the file. This is a file handle returned from fpOpenFile(). See “fpOpenFile()”.

Returns

If the file is closed, the return value is KVERR_Success.

If the file is not closed, the return value is an error code.

Example

extractInterface->fpCloseFile(pFile);
pFile = NULL;

fpExtractSubFile()

This function extracts a sub file from a container file to a user-defined path or output stream. This call returns file format information when the file is extracted to a path.

Syntax

int (pascal *fpExtractSubFile) (
    void *pFile,
    KVExtractSubFileArg extractArg,
    KVSubFileExtractInfo *extractInfo);

Arguments

pFile    Identifier of the file. This is a file handle returned from fpOpenFile(). See “fpOpenFile()”.

extractArg    Pointer to the structure KVExtractSubFileArg, which defines the sub file to be extracted. See “KVExtractSubFileArg”.

    Before initializing the KVExtractSubFileArg structure, use the macro KVStructInit to initialize the KVStructHead structure. See “KVStructHead”.

extractInfo    Pointer to the structure KVSubFileExtractInfo, which defines information about the extracted sub file. See “KVSubFileExtractInfo”.

Returns

• If the sub file is extracted from the container file, the return value is KVERR_Success.

• If the sub file is not extracted from the container file, the return value is an error code.

Discussion

• After the file is extracted, call fpFreeStruct() to free the memory allocated by this function. See “fpFreeStruct()”.

• If the sub file is embedded in the main file as a link and is stored externally, extractInfo->infoFlag is set to KVSubFileExtractInfoFlag_External. For example, the sub file may be an object that was embedded in a Word document using “Link to File,” or an attachment that is referenced in an MBX message. This type of sub file cannot be extracted. You must write code to access the sub file based on the path in the member extractInfo->filePath or extractInfo->fileName. See “KVSubFileExtractInfo”.

Example

KVSubFileExtractInfo extractInfo = NULL;

KVStructInit(&extractArg);
extractArg.index = index;
extractArg.extractionFlag = KVExtractionFlag_CreateDir |
    KVExtractionFlag_Overwrite;
extractArg.filePath = subFileInfo->subFileName;

/* Extract this sub file */
error = extractInterface->fpExtractSubFile(pFile, &extractArg, &extractInfo);
if (error)
{
    extractInterface->fpFreeStruct(pFile, extractInfo);
    extractInfo = NULL;
}

fpFreeStruct()

This function frees the memory allocated by fpGetMainFileInfo(), fpGetSubFileInfo(), fpGetSubFileMetaData(), and fpExtractSubFile().

Syntax

int (pascal *fpFreeStruct) (
    void *pFile,
    void *obj);

Arguments

pFile    Identifier of the file. This is a file handle returned from fpOpenFile(). See “fpOpenFile()”.

obj    Pointer to the result object returned by fpGetMainFileInfo(), fpGetSubFileInfo(), fpGetSubFileMetaData(), or fpExtractSubFile().

Returns

If the allocated memory is freed, the return value is KVERR_Success.

Otherwise, the return value is an error code.

Example

The example below frees the memory allocated by fpGetSubFileInfo():

if (subFileInfo)
{
    extractInterface->fpFreeStruct(pFile, subFileInfo);
    subFileInfo = NULL;
}

fpGetMainFileInfo()

This function determines whether a file is a container file—that is, whether it contains sub files—and should be extracted further.

Syntax

int (pascal *fpGetMainFileInfo) (
    void *pFile,
    KVMainFileInfo *fileInfo);

Arguments

pFile    Identifier of the file. This is a file handle returned from fpOpenFile(). See “fpOpenFile()”.

fileInfo    Pointer to the structure KVMainFileInfo. This structure contains information about the file. See “KVMainFileInfo”.

Returns

If the file information is retrieved, the return value is KVERR_Success.

If the file information is not retrieved, the return value is an error code.

Discussion

• After the file information is retrieved, call fpFreeStruct() to free the memory allocated by this function. See “fpFreeStruct()”.

• If the file is a container (fileInfo->numSubFiles is non-zero), call fpGetSubFileInfo() and fpExtractSubFile() for each sub file. See “fpGetSubFileInfo()” and “fpExtractSubFile()”.

• If the file is not a container (fileInfo->numSubFiles is 0) and contains text (fileInfo->infoFlag is set to KVMainFileInfoFlag_HasContent), pass the file directly to the conversion functions. See “XML Export API Functions”.
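The decision described in this Discussion can be sketched as follows. This is a self-contained illustration that does not include the KeyView headers: MockMainFileInfo is a hypothetical stand-in for KVMainFileInfo that carries only the two members named above, MOCK_FLAG_HAS_CONTENT is a placeholder whose value is not the real value of KVMainFileInfoFlag_HasContent, and the NEXT_NOTHING case is an assumption for files that have neither sub files nor text.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bit; the real KVMainFileInfoFlag_HasContent value comes from the SDK headers. */
#define MOCK_FLAG_HAS_CONTENT 0x1u

/* Hypothetical stand-in for KVMainFileInfo with only the members discussed above. */
typedef struct {
    int numSubFiles;
    unsigned int infoFlag;
} MockMainFileInfo;

typedef enum {
    NEXT_EXTRACT_SUBFILES, /* container: fpGetSubFileInfo() and fpExtractSubFile() per sub file */
    NEXT_CONVERT,          /* no sub files, has text: pass the file to the conversion functions */
    NEXT_NOTHING           /* assumed case: no sub files and no text content */
} NextStep;

/* Choose the next processing step from the main file information. */
NextStep choose_next_step(const MockMainFileInfo *info)
{
    assert(info != NULL);
    if (info->numSubFiles != 0)
        return NEXT_EXTRACT_SUBFILES;
    if (info->infoFlag & MOCK_FLAG_HAS_CONTENT)
        return NEXT_CONVERT;
    return NEXT_NOTHING;
}
```

In a real application the structure would be filled in by fpGetMainFileInfo() and released with fpFreeStruct() once the decision has been made.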

Example

KVMainFileInfo fileInfo = NULL;

if ((error = extractInterface->fpGetMainFileInfo(pFile, &fileInfo)))
{
    /* Free result object allocated in fileInfo */
    extractInterface->fpFreeStruct(pFile, fileInfo);
    fileInfo = NULL;
}

fpGetSubFileInfo()

This function gets information about a sub file in a container file.

Syntax

int (pascal *fpGetSubFileInfo) (
    void *pFile,
    int index,
    KVSubFileInfo *subFileInfo);

Arguments

pFile    Identifier of the main file. This is a file handle returned from fpOpenFile(). See “fpOpenFile()”.

index    The index number of the sub file for which information will be retrieved.

subFileInfo    Pointer to the structure KVSubFileInfo, which defines information about the sub file. See “KVSubFileInfo”.

Returns

If the file information is retrieved, the return value is KVERR_Success.

If the file information is not retrieved, the return value is an error code.

Discussion

• After the sub file information is retrieved, call fpFreeStruct() to free the memory allocated by this function. See “fpFreeStruct()”.

• If the root node is not enabled, the first sub file is index 0. If the root node is enabled, the first sub file is index 1. The root node is required to recreate a file’s hierarchy. See “Create a Root Node”.

• The members subFileInfo->parentIndex and subFileInfo->childArray enable you to recreate a file’s hierarchy. Since childArray only retrieves the first-level children in the sub file, you must call fpGetSubFileInfo() repeatedly until information for the leaf-node children is extracted. See “Recreate a File’s Hierarchy”.

• If the sub file is embedded in the main file as a link and is stored externally, subFileInfo->infoFlag is set to KVSubFileInfoFlag_External. For example, the sub file may be an object that was embedded in a Word document using “Link to File,” or an attachment that is referenced in an MBX message. This type of sub file cannot be extracted. You must write code to access the sub file based on the path in the member subFileInfo->subFileName. See “KVSubFileInfo”.

The KVSubFileInfoFlag_External flag will not be set for an OLE object that is embedded as a link in a Microsoft PowerPoint file. KeyView can only detect linked objects in a Microsoft PowerPoint file when the object is extracted. See “fpExtractSubFile()”.
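The repeated descent through childArray described in this Discussion can be sketched as a depth-first walk. The sketch is self-contained and does not call the SDK: MockSubFileInfo is a hypothetical stand-in for KVSubFileInfo, childCount is an assumed member name, the value -1 for the root's parentIndex is an assumed convention, and looking up table[index] stands in for a call to fpGetSubFileInfo().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for KVSubFileInfo; childCount is an assumed member name. */
typedef struct {
    int parentIndex;       /* index of the parent; -1 for the root (assumed convention) */
    int childCount;        /* number of first-level children */
    const int *childArray; /* indexes of the first-level children */
} MockSubFileInfo;

/* Callback invoked once per sub file with its depth below the starting node. */
typedef void (*VisitFn)(int index, int depth, void *user);

/*
 * Walk the hierarchy depth-first, starting at index.
 * Reading table[index] stands in for calling fpGetSubFileInfo() on that index.
 * Returns the number of nodes visited.
 */
int walk_hierarchy(const MockSubFileInfo *table, int index, int depth,
                   VisitFn visit, void *user)
{
    const MockSubFileInfo *info;
    int i, visited = 1;

    assert(table != NULL);
    info = &table[index];
    if (visit != NULL)
        visit(index, depth, user);
    /* childArray lists only first-level children, so recurse into each one. */
    for (i = 0; i < info->childCount; i++)
        visited += walk_hierarchy(table, info->childArray[i], depth + 1, visit, user);
    return visited;
}

/* Example visitor: records the deepest level seen in *(int *)user. */
void record_max_depth(int index, int depth, void *user)
{
    int *max = (int *)user;
    (void)index;
    if (depth > *max)
        *max = depth;
}
```

With the real API, each visit would call fpGetSubFileInfo() for the child index and release the result with fpFreeStruct() after its childArray has been processed.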

Example

KVSubFileInfo subFileInfo = NULL;

for (index = 0; index < fileInfo->numSubFiles; index++)
{
    error = extractInterface->fpGetSubFileInfo(pFile, index, &subFileInfo);
    if (error)
    {
        extractInterface->fpFreeStruct(pFile, subFileInfo);
        subFileInfo = NULL;
    }
}

fpGetSubFileMetaData()

This function extracts metadata from mail stores, mail messages, and non-mail items in an NSF file. See “Extract Mail Metadata”.

Syntax

int (pascal *fpGetSubFileMetaData) (
    void *pFile,
    KVGetSubFileMetaArg metaArg,
    KVSubFileMetaData *metaData);

Arguments

pFile    Identifier of the file. This is a file handle returned from fpOpenFile(). See “fpOpenFile()”.

metaArg    Pointer to the structure KVGetSubFileMetaArg, which defines metadata tags whose values are retrieved. See “KVGetSubFileMetaArg”.

    Before initializing the KVGetSubFileMetaArg structure, use the macro KVStructInit to initialize the KVStructHead structure. See “KVStructHead”.

metaData    Pointer to the structure KVSubFileMetaData, which contains the retrieved metadata values. See “KVSubFileMetaData”.

Returns

If the metadata is retrieved, the return value is KVERR_Success.

If the metadata is not retrieved, the return value is an error code.

Discussion

• After the metadata is retrieved, call fpFreeStruct() to free the memory allocated by this function. See “fpFreeStruct()”.

• When you pass in 0 for metaArg->metaNameCount, and NULL for metaArg->metaNameArray, a set of default metadata is retrieved. See “Extract Mail Metadata”.

• If a field is repeated in an EML or MBX mail header, the values in each instance of the field are concatenated and returned as one field. The values are separated by a delimiter of five pound signs (#####).
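An application that needs the individual values of a repeated header field has to split the concatenated value on the five-pound-sign delimiter. The following sketch shows one way to do that with the standard C library only. The function name split_repeated_field and the macro REPEAT_DELIM are hypothetical and are not part of the File Extraction API, and the sketch assumes the value has already been copied into a writable, null-terminated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Delimiter between repeated header values, as described above. */
#define REPEAT_DELIM "#####"

/*
 * Split value in place on the five-pound-sign delimiter.
 * Stores up to max_parts pointers in parts and returns the number stored.
 * The buffer is modified: each delimiter found is overwritten with '\0'.
 * Values beyond max_parts are dropped.
 */
size_t split_repeated_field(char *value, char **parts, size_t max_parts)
{
    const size_t delim_len = strlen(REPEAT_DELIM);
    size_t count = 0;
    char *cursor = value;

    assert(value != NULL && parts != NULL);
    while (count < max_parts) {
        char *hit = strstr(cursor, REPEAT_DELIM);
        parts[count++] = cursor;
        if (hit == NULL)
            break;              /* last value reached */
        *hit = '\0';            /* terminate the current value */
        cursor = hit + delim_len;
    }
    return count;
}
```

Splitting in place avoids extra allocations, at the cost of modifying the buffer, so copy the metadata value first if the original string is still needed or is owned by the SDK.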

Example

KVSubFileMetaData metaData = NULL;

KVStructInit(&metaArg);

/* retrieve all the default metadata elements */
metaArg.metaNameCount = 0;
metaArg.metaNameArray = NULL;
metaArg.index = Index;
error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &metaData);
...
extractInterface->fpFreeStruct(pFile, metaData);
metaData = NULL;

/* retrieve specific metadata fields */
KVMetaName pName[2];
KVMetaNameRec names[2];

names[0].type = KVMetaNameType_Integer;
names[0].name.iname = KVPR_SUBJECT;
names[1].type = KVMetaNameType_Integer;
names[1].name.iname = KVPR_DISPLAY_TO;
pName[0] = &names[0];
pName[1] = &names[1];
metaArg.metaNameCount = 2;
metaArg.metaNameArray = pName;
metaArg.index = Index;
error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &metaData);
...
extractInterface->fpFreeStruct(pFile, metaData);
metaData = NULL;

fpOpenFile()

This function opens a file to make the file accessible for sub file extraction or conversion.

Syntax

int (pascal *fpOpenFile) (
    void *pContext,
    KVOpenFileArg openArg,
    void **pFile);

Arguments

pContext    Pointer returned from fpInit().

openArg    Pointer to the structure KVOpenFileArg. This structure defines the input parameters necessary to open a file for extraction, such as credentials, and the default extraction directory. See “KVOpenFileArg”.

    Before initializing the KVOpenFileArg structure, use the macro KVStructInit to initialize the KVStructHead structure. See “KVStructHead”.

pFile    Handle for the opened file. This handle is used in subsequent file extraction calls to identify the source file.

Returns

If the file is opened, the return value is KVERR_Success.

If the file is not opened, the return value is an error code and pFile is NULL.

Discussion

Call fpCloseFile() to free the memory allocated by this function. See “fpCloseFile()”.

Example

KVOpenFileArgRec openArg;

/* Initialize the structure using KVStructInit */
KVStructInit(&openArg);

openArg.extractDir = destDir;
openArg.filePath = srcFile;

/* Open the main file */
if ((error = extractInterface->fpOpenFile(pExport, &openArg, &pFile)))
{
    extractInterface->fpCloseFile(pFile);
    pFile = NULL;
}

CHAPTER 7

File Extraction API Structures

This section provides information on the structures used by the File Extraction API. These structures define the input and output parameters required to extract sub files from a container file, and are defined in kvxtract.h. This section contains the following topics.

KVCredential

KVCredentialComponent

KVExtractInterface

KVExtractSubFileArg

KVGetSubFileMetaArg

KVMainFileInfo

KVMetadataElem

KVMetaName

KVOpenFileArg

KVOutputStream

KVSubFileExtractInfo

KVSubFileInfo

KVSubFileMetaData


KVCredential

This structure contains a count of the number of credential elements, and a pointer to the first element of the array of individual elements. It is initialized by calling fpOpenFile(). See “fpOpenFile()”. It is defined in kvxtract.h.

typedef struct tag_KVCredential
{
    int itemCount;
    KVCredentialComponent *items;
}
KVCredentialRec, *KVCredential;

Member Descriptions

itemCount    The number of credentials defined for this file.

items    Pointer to the structure KVCredentialComponent. This structure contains the individual credential elements used to open a protected file. See “KVCredentialComponent”.

KVCredentialComponent

This structure contains the value of a credential item. It is defined in kvxtract.h.

typedef struct tag_KVCredentialComponent
{
    KVCredKeyType keytype;
    union
    {
        void *pkey;
        char *skey;
        unsigned int ikey;
    }
    keyobj;
}
KVCredentialComponentRec, *KVCredentialComponent;

Member Descriptions

keytype    The type of credential (such as a user name or password). The types are defined by the enumerated type KVCredKeyType. See “KVCredKeyType”.

pkey    Pointer to a structure defining credentials. Reserved for future use.

skey    Pointer to a string credential key.

ikey    An integer credential key.

KVExtractInterface

The members of this structure are pointers to the file extraction functions described in “File Extraction API Functions”. When the function KVGetExtractInterface() is called, it assigns the function pointers in this structure to those functions. The structure is defined in kvxtract.h. See “KVGetExtractInterface()”.

typedef struct tag_KVExtractInterface
{
    KVStructHeader;
    int (pascal *fpOpenFile) (void *pContext, KVOpenFileArg openArg,
        void **pFileHandle);
    int (pascal *fpCloseFile) (void *pFileHandle);
    int (pascal *fpGetMainFileInfo) (void *pFile, KVMainFileInfo
        *MainFileInfo);
    int (pascal *fpGetSubFileInfo) (void *pFile, int index,
        KVSubFileInfo *subFileInfo);
    int (pascal *fpGetSubFileMetaData) (void *pFile,
        KVGetSubFileMetaArg metaArg, KVSubFileMetaData *metaData);
    int (pascal *fpExtractSubFile) (void *pFile,
        KVExtractSubFileArg extractArg, KVSubFileExtractInfo
        *extractInfo);
    int (pascal *fpFreeStruct) (void *pFile, void *obj);
}
KVExtractInterfaceRec, *KVExtractInterface;

Member Descriptions

The member functions are described in “File Extraction API Functions”.

Discussion

Before initializing a File Extraction structure, use the macro KVStructInit to initialize the KVStructHead structure. This sets the revision number of the File Extraction API and supports binary compatibility with future releases. See “KVStructHead”.

KVExtractSubFileArg

This structure defines the input parameters required to extract a sub file. See “fpExtractSubFile()”. It is defined in kvxtract.h.

typedef struct tag_KVExtractSubFileArg
{
    KVStructHeader;
    int index;
    KVCharSet srcCharset;
    KVCharSet trgCharset;
    int isMSBLSB;
    DWORD extractionFlag;
    char *filePath;
    char *extractDir;
    KVOutputStream *stream;
}
KVExtractContainerSubFileArgRec, *KVExtractContainerSubFileArg;

Member Descriptions

KVStructHeader The KeyView version of the structure. See

“KVStructHead” on page

.

index The index number of the sub file to be extracted.

srcCharset trgCharset isMSBLSB

Specifies the source character set of the sub file when the file format’s reader cannot determine the character set. The character sets are enumerated in KVCharSet of kvtypes.h. See

“Discussion” below.

If the file type is KVFileType_Main, this is the target character set of the extracted file. Otherwise, this is ignored. The character sets are enumerated in KVCharSet in kvtypes.h. See

“Discussion” below.

This flag indicates whether the byte order for Unicode text is Big

Endian (MSBLSB) or Little Endian (LSBMSB).

XML Export SDK C Programming Guide

163

164

Chapter 7 File Extraction API Structures extractionFlag A bitwise flag defining additional parameters for file extraction.

The following flags are available:

KVExtractionFlag_CreateDir

Indicates whether the directory structure of a sub file should be created. If this is set, the path defined in filePath is created if it does not already exist. If this is not set, the path is not created, and the function returns FALSE.

KVExtractionFlag_Overwrite

If this is set, and the file being extracted has the same name as a file in the target path, the file in the target path is overwritten without warning. If this is not set, and a sub file has the same name as a file in the target path, the error

KVError_OutputFileExists is generated.

KVExtractionFlag_ExcludeMailHeader

If this is set, header information (To, From, Sent, and so on) in a mail file is not included in the extracted data. If this is not set, the extracted data contains header information and the message’s body text. See

“Exclude Metadata from the

Extracted Text File” on page .

KVExtractionFlag_GetFormattedBody

If this is set, the formatted version of the message body (HTML or RTF) is extracted from mail files when possible. If neither an

HTML nor RTF version of the message body exists in the mail file, then it is extracted as plain text. If this flag is not set, the message body is extracted as plain text when possible.

Note: When an HTML or RTF message body is extracted, the message’s mail headers (such as “From,” “To,” and “Subject,”) are extracted, saved in the same format, and added to the beginning of the sub file. This applies to PST (MAPI-based reader), MSG and NSF files only.

KVExtractionFlag_SaveAsMSG

If this is set, the mail message is extracted as an MSG file, including all of its attachments. If this flag is not set, the mail message is extracted as text. This applies to PST files on Windows only.

Note: In file mode, when the application sets this flag in fpExtractSubFile(), it must also check the KVSubFileExtractInfo structure’s filePath parameter to verify the filename used for extraction. See “fpExtractSubFile()” and “KVSubFileExtractInfo”.

filePath Pointer to the suggested path or filename to which the sub file is extracted. This can be a filename, partial path, or full path. This can be used in conjunction with extractDir to create the full output path. See “Discussion” below.

extractDir Pointer to the directory to which sub files are extracted. This directory must exist. If this is set, the path specified in KVOpenFileArg->extractDir is ignored. This is used in conjunction with filePath to create the full output path.

stream Pointer to an output stream defined by KVOutputStream. See “KVOutputStream”. See “Discussion” below.

Discussion

• The KVSubFileExtractInfoFlag_CharsetConverted flag in the KVSubFileExtractInfo structure indicates whether the character set of the sub file was converted during extraction. See “KVSubFileExtractInfo”.

• If the document character set is detected and is also specified in srcCharset, the detected character set is overridden by the specified character set. If the source character set is not detected and is not specified, character set conversion does not occur. The section “Supported Formats” lists the formats for which the source character set can be determined.

The following applies when the output is to a file:

• If filePath is a valid full path, filePath is the output path, and the path in extractDir is ignored.

• If filePath is a filename or partial path, the target directory specified in either KVExtractSubFileArg->extractDir or KVOpenFileArg->extractDir is used to create the full path. See “KVOpenFileArg”.

• If filePath is a full path or partial path, and KVExtractionFlag_CreateDir is set, the directory is created if it does not already exist.

• If filePath is not specified, a default name and the target directory specified in either KVExtractSubFileArg->extractDir or KVOpenFileArg->extractDir are used to create a full path.

• If neither filePath nor extractDir is specified, or both are invalid, an error is returned.

• If filePath is valid, but extractDir is not valid, an error is returned.

The following applies when the output is to a stream:

• Set filePath and extractDir to NULL.

• The file format (docInfo) and extraction file path (filePath) are not returned in KVSubFileExtractInfo. See “KVSubFileExtractInfo”.

• The flags KVExtractionFlag_CreateDir and KVExtractionFlag_Overwrite are ignored.
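The file-output rules above can be modelled in a few lines of C. This is only a sketch of the precedence described in this section, not the SDK's implementation: resolve_output_path(), is_full_path(), is_set(), and the defaultName parameter are hypothetical names, and the test for a "full path" (leading separator or drive prefix) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check, not part of the SDK: a leading '/' or '\\', or a
   drive prefix such as "C:", is treated as a full path. */
static int is_full_path(const char *p)
{
    if (p[0] == '/' || p[0] == '\\')
        return 1;
    return p[0] != '\0' && p[1] == ':';
}

static int is_set(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* Returns 0 and writes the resolved path to out, or -1 if no path can be built. */
static int resolve_output_path(const char *filePath,
                               const char *argExtractDir,   /* KVExtractSubFileArg->extractDir */
                               const char *openExtractDir,  /* KVOpenFileArg->extractDir */
                               const char *defaultName,     /* assumed default sub file name */
                               char *out, size_t outSize)
{
    const char *dir, *name, *sep;
    size_t len;
    int n;

    assert(out != NULL && outSize > 0);

    /* A full path wins outright; both extractDir values are ignored. */
    if (is_set(filePath) && is_full_path(filePath)) {
        n = snprintf(out, outSize, "%s", filePath);
        return (n >= 0 && (size_t)n < outSize) ? 0 : -1;
    }

    /* The per-call directory takes precedence over the one given at open time. */
    dir = is_set(argExtractDir) ? argExtractDir : openExtractDir;
    if (!is_set(dir))
        return -1;

    /* Fall back to a default name when filePath is not specified. */
    name = is_set(filePath) ? filePath : defaultName;
    if (!is_set(name))
        return -1;

    len = strlen(dir);
    sep = (dir[len - 1] == '/' || dir[len - 1] == '\\') ? "" : "/";
    n = snprintf(out, outSize, "%s%s%s", dir, sep, name);
    return (n >= 0 && (size_t)n < outSize) ? 0 : -1;
}
```

For example, a partial path "mail/a.txt" with only the open-time directory "base" resolves to "base/mail/a.txt", while a call with no directory at all returns -1, matching the error case in the list.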


KVGetSubFileMetaArg

This structure defines the metadata tags whose values are retrieved by fpGetSubFileMetaData(). See “fpGetSubFileMetaData()”. It is defined in kvxtract.h.

typedef struct tag_KVGetSubFileMetaArg
{
KVStructHeader;
int index;
int metaNameCount;
KVMetaName *metaNameArray;
KVCharSet srcCharset;
KVCharSet trgCharset;
int isMSBLSB;
}
KVGetSubFileMetaArgRec, *KVGetSubFileMetaArg;

Member Descriptions

KVStructHeader The KeyView version of the structure. See “KVStructHead”.

index The index number of the sub file for which metadata is extracted.

metaNameCount The number of metadata fields to be extracted.

metaNameArray Pointer to the structure KVMetaName containing an array of metadata tags whose values are retrieved. See “KVMetaName”.

srcCharset Specifies the source character set of the metadata when the format’s reader cannot determine the character set. The character sets are enumerated in KVCharSet in kvtypes.h. See “Discussion” below.

trgCharset The target character set of the extracted metadata. The character sets are enumerated in KVCharSet in kvtypes.h.

isMSBLSB This flag indicates whether the byte order for Unicode text is Big Endian (MSBLSB) or Little Endian (LSBMSB).

Discussion

• If the character set is detected and is also specified in srcCharset, the detected character set is overridden by the specified character set. If the source character set is not detected and is not specified, character set conversion does not occur. The section “Supported Formats” lists the formats for which the source character set can be determined.

• To retrieve a pre-defined list of metadata, pass 0 for metaNameCount and NULL for metaNameArray. The metadata in Table 73 is extracted.


KVMainFileInfo

This structure contains information about a main file that is open for extraction. It is initialized by calling fpGetMainFileInfo(). See “fpGetMainFileInfo()”. It is defined in kvxtract.h.

typedef struct tag_KVMainFileInfo
{
KVStructHeader;
int numSubFiles;
ADDOCINFO docInfo;
KVCharSet charset;
int isMSBLSB;
unsigned long infoFlag;
}
KVMainFileInfoRec, *KVMainFileInfo;

Member Descriptions

KVStructHeader The KeyView version of the structure. See “KVStructHead”.

numSubFiles The number of sub files in the main file.

docInfo The file’s major format (such as Microsoft Word or Corel Presentation) as defined by the structure ADDOCINFO. See “ADDOCINFO”.

charset The character set of the main file.

isMSBLSB This flag indicates whether the byte order for Unicode text is Big Endian (MSBLSB) or Little Endian (LSBMSB).

infoFlag A bitwise flag providing additional information about the main file. The following flag is available:

KVMainFileInfoFlag_HasContent: The main file contains text that can be converted. Below are some examples of how this flag is used:

• For an MSG file without attachments, numSubFiles is 1 (message body text), and this flag is FALSE because the MSG file itself does not contain text.

• For a Zip file with three files, numSubFiles is 3, and this flag is FALSE because a Zip file does not contain text.

• For a Microsoft Word file with an embedded OLE object, numSubFiles is 1 (OLE object), and this flag is TRUE (Word file contains text to be converted).


Discussion

• If numSubFiles is 0, the file does not contain sub files and does not need to be extracted further. If the KVMainFileInfoFlag_HasContent flag is set, the file contains body text and can be passed directly to the conversion functions. See “XML Export API Functions”.

• If numSubFiles is non-zero, get information on each sub file by calling fpGetSubFileInfo(), and then extract the sub files using fpExtractSubFile(). See “fpGetSubFileInfo()” and “fpExtractSubFile()”.

• If openFlag is set to KVOpenFileFlag_CreateRootNode in the call to fpOpenFile(), numSubFiles also includes the root object (index 0), which is created by KeyView for reconstructing the file’s hierarchy. See “KVOpenFileArg”.
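The decision described above depends on only two members, numSubFiles and infoFlag, and can be sketched as follows. The flag value DEMO_HAS_CONTENT, the DemoAction names, and plan_main_file() are placeholders for illustration; the real constant KVMainFileInfoFlag_HasContent comes from kvxtract.h. Treating a file with no sub files and no content as something to skip is an assumption, not a rule stated in this guide.

```c
#include <assert.h>

/* Stand-in bit value; the real KVMainFileInfoFlag_HasContent is defined by the SDK. */
#define DEMO_HAS_CONTENT 0x1UL

typedef enum {
    DEMO_SKIP,                /* assumption: nothing to extract or convert */
    DEMO_CONVERT_ONLY,        /* pass the main file to the conversion functions */
    DEMO_EXTRACT_ONLY,        /* container without text of its own */
    DEMO_EXTRACT_AND_CONVERT  /* text plus sub files */
} DemoAction;

static DemoAction plan_main_file(int numSubFiles, unsigned long infoFlag)
{
    /* infoFlag is bitwise, so test the bit rather than comparing for equality. */
    int hasContent = (infoFlag & DEMO_HAS_CONTENT) != 0;

    assert(numSubFiles >= 0);
    if (numSubFiles == 0)
        return hasContent ? DEMO_CONVERT_ONLY : DEMO_SKIP;
    return hasContent ? DEMO_EXTRACT_AND_CONVERT : DEMO_EXTRACT_ONLY;
}
```

With these stand-ins, the three examples listed for the flag map to DEMO_EXTRACT_ONLY (MSG without attachments, Zip with three files) and DEMO_EXTRACT_AND_CONVERT (Word file with an embedded OLE object).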


KVMetadataElem

This structure contains metadata field values extracted from a mail file. It is defined in kvtypes.h.

typedef struct tag_KVMetadataElem
{
int isDataValid;
int dataID;
KVMetadataType dataType;
char* strType;
void* data;
int dataSize;
}
KVMetadataElem;

Member Descriptions

isDataValid Specifies whether the metadata returned from the API is valid data.

dataID The integer name of the extracted metadata field.

dataType The data type of the metadata field. The types are defined in KVMetadataType in kvtypes.h. See “KVMetadataType”.

strType Pointer to the string name of the metadata field.

data The contents of the metadata field.

If dataType is KVMetadata_Int4 or KVMetadata_Bool, this member contains the actual value. Otherwise, this member is a pointer to the actual value.

KVMetadata_DateTime points to an 8-byte value.

KVMetadata_String and KVMetadata_Unicode point to the beginning of the string containing the text. The strings are null-terminated.

KVMetadata_Binary points to the first element of a byte array.

dataSize The byte count of data when the type is KVMetadata_Binary, KVMetadata_Unicode, or KVMetadata_String.
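Because data holds either the value itself or a pointer to it, code that reads a metadata element has to branch on the type first. The sketch below illustrates that rule with local stand-ins: DemoMetadataType, DemoMetadataElem, and describe_elem() are hypothetical and only mirror the member names in this section; the real definitions are in kvtypes.h. It assumes the integer and Boolean values are stored in the pointer itself, prints the 8-byte date value as raw hexadecimal because its encoding is not described here, and does not decode Unicode strings.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins mirroring the names in this section, not the SDK definitions. */
typedef enum {
    DEMO_Int4, DEMO_Bool, DEMO_DateTime, DEMO_String, DEMO_Unicode, DEMO_Binary
} DemoMetadataType;

typedef struct {
    int isDataValid;
    int dataID;
    DemoMetadataType dataType;
    char *strType;
    void *data;
    int dataSize;
} DemoMetadataElem;

/* Writes a printable form of the element to out. Returns 0, or -1 if the
   element is invalid or of a type this sketch does not handle. */
static int describe_elem(const DemoMetadataElem *e, char *out, size_t outSize)
{
    uint64_t raw;

    assert(e != NULL && out != NULL && outSize > 0);
    if (!e->isDataValid)
        return -1;

    switch (e->dataType) {
    case DEMO_Int4:      /* value is stored in the member itself */
        snprintf(out, outSize, "%d", (int)(intptr_t)e->data);
        return 0;
    case DEMO_Bool:      /* value is stored in the member itself */
        snprintf(out, outSize, "%s", (intptr_t)e->data != 0 ? "true" : "false");
        return 0;
    case DEMO_DateTime:  /* pointer to an 8-byte value, shown uninterpreted */
        memcpy(&raw, e->data, sizeof raw);
        snprintf(out, outSize, "0x%016llx", (unsigned long long)raw);
        return 0;
    case DEMO_String:    /* pointer to a null-terminated string */
        snprintf(out, outSize, "%s", (const char *)e->data);
        return 0;
    case DEMO_Binary:    /* pointer to a byte array of dataSize bytes */
        snprintf(out, outSize, "<%d bytes>", e->dataSize);
        return 0;
    default:             /* DEMO_Unicode is left out of this sketch */
        return -1;
    }
}
```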


KVMetaName

This structure defines the names of the metadata fields to be extracted from a mail file. It is defined in kvxtract.h.

typedef struct tag_KVMetaName
{
KVMetaNameType type;
union
{
void *pname;
int iname;
char *sname;
}name;
}
KVMetaNameRec, *KVMetaName;

Member Descriptions

type The type of metadata name (such as integer or string). The types are defined by the enumerated type KVMetaNameType. See “KVMetaNameType”. Note that MAPI property names are of type integer.

pname Pointer to a structure defining the metadata fields to be retrieved.

iname The name of a metadata field of type integer.

sname Pointer to the name of a metadata field of type string.

Discussion

If you specify the MAPI tag name (for example, PR_CONVERSATION_TOPIC), you must include the Windows header files mapitags.h and mapidefs.h, in which PR_CONVERSATION_TOPIC is defined as 0x0070001e.


KVOpenFileArg

This structure defines the input arguments necessary to open a file for extraction. It is initialized by calling fpOpenFile(). See “fpOpenFile()”. It is defined in kvxtract.h.

typedef struct tag_KVOpenFileArg
{
KVStructHeader;
KVCredential cred;
KVInputStream *stream;
char *filePath;
char *extractDir;
DWORD openFlag;
DWORD reserved;
void *pReserved;
}
KVOpenFileArgRec, *KVOpenFileArg;

Member Descriptions

KVStructHeader The KeyView version of the structure. See “KVStructHead”.

cred The credentials required to open a protected PST or NSF file. This is a pointer to the KVCredential structure. Your application can assign multiple credentials to this member for multiple formats. See “KVCredential”.

stream Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source. See “KVInputStream”. If you are using a file as input, this is NULL.

filePath Pointer to the full file path to the source file. If you are using a stream as input, this is NULL.

extractDir Pointer to the default directory to which sub files are extracted. This directory must exist. This is used in conjunction with KVExtractSubFileArg->filePath to create the full output path. See “KVExtractSubFileArg”.

openFlag A bitwise flag defining additional parameters for opening the file. The following flag is available:

KVOpenFileFlag_CreateRootNode: If this flag is set, KeyView creates a root object when extracting this file’s sub files.

This root node does not have a parent and is at the highest level of the file’s tree structure. It is used internally to provide a reference point from which all other child nodes are determined, and the file’s hierarchy is created.

If you want to maintain the file’s hierarchy when you extract sub files from a container, you must set this flag. See “Recreate a File’s Hierarchy” for more information.

The root node has an index of zero. Although not all container formats require an artificial root node, the root is created for all container formats regardless of whether the file itself contains a root directory or file.

reserved Reserved for future use. It must be NULL.

pReserved Reserved for future use. It must be NULL.


KVOutputStream

This structure defines an output stream for the extracted sub file.

typedef struct tag_OutputStream

{

void *pOutputStreamPrivateData;

BOOL (pascal *fpCreate)(struct tag_OutputStream *,TCHAR *);

UINT (pascal *fpWrite) (struct tag_OutputStream *, BYTE *, UINT);

BOOL (pascal *fpSeek) (struct tag_OutputStream *, long, int);

long (pascal *fpTell) (struct tag_OutputStream *);

BOOL (pascal *fpClose) (struct tag_OutputStream *);

}

KVOutputStream;

Member Descriptions

All member functions are equivalent to their counterparts in the ANSI standard library.
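As an illustration of how the five members fit together, the following sketch backs a stream with a growable memory buffer instead of a file. It is not SDK code: DemoOutputStream is a local mirror of the layout shown above that uses plain C types in place of BOOL, UINT, BYTE, and TCHAR and omits the pascal keyword, and MemBuf and the mem_* functions are hypothetical. It assumes that the BOOL-returning members report success as a non-zero value, and it simplifies fpSeek by refusing to seek past the end of the written data.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the KVOutputStream layout so the sketch compiles
   without the SDK headers. */
typedef struct DemoOutputStream {
    void *pOutputStreamPrivateData;
    int          (*fpCreate)(struct DemoOutputStream *, char *);
    unsigned int (*fpWrite)(struct DemoOutputStream *, unsigned char *, unsigned int);
    int          (*fpSeek)(struct DemoOutputStream *, long, int);
    long         (*fpTell)(struct DemoOutputStream *);
    int          (*fpClose)(struct DemoOutputStream *);
} DemoOutputStream;

typedef struct { unsigned char *buf; size_t size, pos, cap; } MemBuf;

static int mem_create(DemoOutputStream *s, char *name)
{
    (void)name;                       /* no file is opened */
    s->pOutputStreamPrivateData = calloc(1, sizeof(MemBuf));
    return s->pOutputStreamPrivateData != NULL;
}

static unsigned int mem_write(DemoOutputStream *s, unsigned char *data, unsigned int n)
{
    MemBuf *m = s->pOutputStreamPrivateData;
    size_t end = m->pos + n;
    if (n == 0)
        return 0;
    if (end > m->cap) {               /* grow by doubling */
        size_t cap = m->cap ? m->cap : 256;
        unsigned char *p;
        while (cap < end)
            cap *= 2;
        p = realloc(m->buf, cap);
        if (p == NULL)
            return 0;
        m->buf = p;
        m->cap = cap;
    }
    memcpy(m->buf + m->pos, data, n);
    m->pos = end;
    if (end > m->size)
        m->size = end;
    return n;
}

static int mem_seek(DemoOutputStream *s, long offset, int whence)
{
    MemBuf *m = s->pOutputStreamPrivateData;
    long base, target;
    switch (whence) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = (long)m->pos; break;
    case SEEK_END: base = (long)m->size; break;
    default: return 0;
    }
    target = base + offset;
    if (target < 0 || (size_t)target > m->size)
        return 0;                     /* simplification: no seeking past the end */
    m->pos = (size_t)target;
    return 1;
}

static long mem_tell(DemoOutputStream *s)
{
    return (long)((MemBuf *)s->pOutputStreamPrivateData)->pos;
}

static int mem_close(DemoOutputStream *s)
{
    (void)s;                          /* buffer stays readable until mem_stream_free() */
    return 1;
}

static void mem_stream_init(DemoOutputStream *s)
{
    assert(s != NULL);
    s->pOutputStreamPrivateData = NULL;
    s->fpCreate = mem_create;
    s->fpWrite = mem_write;
    s->fpSeek = mem_seek;
    s->fpTell = mem_tell;
    s->fpClose = mem_close;
}

static void mem_stream_free(DemoOutputStream *s)
{
    MemBuf *m = s->pOutputStreamPrivateData;
    if (m != NULL) {
        free(m->buf);
        free(m);
    }
    s->pOutputStreamPrivateData = NULL;
}
```

Keeping the buffer alive after fpClose lets the caller read the generated output before releasing it with mem_stream_free(); a file-backed stream would instead close its handle in fpClose.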


KVSubFileExtractInfo

This structure contains information about an extracted sub file. It is initialized by calling fpExtractSubFile(). See “fpExtractSubFile()”. It is defined in kvxtract.h.

typedef struct tag_KVSubFileExtractInfo
{
KVStructHeader;
char *filePath;
char *fileName;
unsigned long infoFlag;
ADDOCINFO docInfo;
}
KVSubFileExtractInfoRec, *KVSubFileExtractInfo;

Member Descriptions

KVStructHeader The KeyView version of the structure. See “KVStructHead”.

filePath The full path to which the sub file was extracted. If the sub file is embedded in the main file as a link, this is the external path to the sub file. If you output the data to a stream, the extraction path is not returned.

fileName The original path and/or filename of the sub file. If the sub file is embedded in the main file as a link, this is the external path to the sub file.

infoFlag A bitwise flag providing additional information about the extracted sub file. The following flags are available:

KVSubFileExtractInfoFlag_NeedsExtraction: The file may contain sub files and should be extracted further.

KVSubFileExtractInfoFlag_FileCreated: The file was created on disk.

KVSubFileExtractInfoFlag_CharsetConverted: The sub file’s character set was converted.

KVSubFileExtractInfoFlag_External: The sub file is embedded in the main file as a link and is stored externally. For example, the sub file may be an object that was embedded in a Word document using “Link to File,” or an attachment that is referenced in an MBX message. This type of file cannot be extracted. You must write code to access the sub file based on the path in the member filePath or fileName.

KVSubFileExtractInfoFlag_FolderCreated: A folder was created.

KVSubFileExtractInfoFlag_NonFormattedBodyExtracted: Indicates that a plain text version of the message was extracted due to an error extracting the formatted version of the message.

docInfo The file’s major format (such as Microsoft Word or Corel Presentation) as defined by the structure ADDOCINFO. See “ADDOCINFO”. If you output the data to a stream, the file format is not returned.


KVSubFileInfo

This structure contains information about a sub file in a container file. It is initialized by calling fpGetSubFileInfo(). See “fpGetSubFileInfo()”. It is defined in kvxtract.h.

typedef struct tag_KVSubFileInfo

{

KVStructHeader;

char *subFileName;

int subFileType;

long subFileSize;

unsigned long infoFlag;

KVCharSet charset;

int isMSBLSB;

BYTE fileTime[8];

int parentIndex;

int childCount;

int *childArray;

}

KVContainerSubFileInfoRec, *KVSubFileInfo;

Member Descriptions

KVStructHeader The KeyView version of the structure. See “KVStructHead”.

subFileName The path and/or file name of the sub file. If the sub file is the body text of a mail file or is an embedded OLE object, KeyView provides a default filename. See “Default Filenames for Extracted Sub Files”.

subFileType The sub file’s position in the container file’s hierarchy. The following options are available:

KVSubFileType_Main: The sub file is at the top level of the main file. This is the default sub file type. See “Discussion” below.

KVSubFileType_Attachment: The sub file is an attachment in a file.

KVSubFileType_OLE: The sub file is an embedded OLE object in a compound document.

KVSubFileType_Folder: The sub file is a folder or the artificial root node (see “Create a Root Node”).

subFileSize The size of the sub file in bytes. This information may be useful if you do not want to extract very large files. This value is approximate and is the maximum size of the sub file. The sub file is usually smaller than this value when it is extracted.

infoFlag A bitwise flag providing additional information about the sub file. The following flags are available:

KVSubFileInfoFlag_NeedsExtraction: The sub file may contain sub files. It must be extracted further to conclusively determine whether it contains sub files.

KVSubFileInfoFlag_Secure: The sub file is secured and credentials (such as user name and password) are required to extract it. This flag applies to ZIP, RAR, and PDF files only.

KVSubFileInfoFlag_SMIME: The sub file is S/MIME-encrypted and credentials are required to extract it. This applies to .eml and .pst files only.

KVSubFileInfoFlag_External: The sub file is embedded in the main file as a link and is stored externally. For example, the sub file may be an object that was embedded in a Word document using “Link to File,” or an attachment that is referenced in an MBX message. This type of file cannot be extracted. You must write code to access the sub file based on the path in the member subFileName.

KVSubFileInfoFlag_MailItem: When the sub file type is KVSubFileType_Attachment, this indicates the attachment is a mail item. This flag applies to PST, MSG, and NSF files only.

charset If the sub file is not an attachment, this is the character set of the sub file. If the sub file is an attachment, the character set is KVCS_UNKNOWN.

isMSBLSB This flag indicates whether the byte order for Unicode text is Big Endian (MSBLSB) or Little Endian (LSBMSB).

fileTime When the sub file is a mail message, this is the file’s Sent time. Otherwise, it is the last modified time. The file time is not available for the following file types:

• EML attachments

• OLE objects in a Microsoft Office document

parentIndex The index number of this file’s parent. For example, this may be the index of a folder in which the sub file is stored, or of a file to which the sub file is attached. If a file does not have a parent, the parentIndex is -1.

childCount The number of first-level children in the sub file.

childArray Pointer to an array of first-level children in the sub file.

Discussion

• The KVSubFileType_Main type applies to the following for each file format:

File format      KVSubFileType_Main applies to...
MSG and EML      the message body.
Zip files        a file inside the archive.
PST files        an item that is not an attachment, an OLE object, or a root node.
MBX files        a message in the MBX file.
NSF files        an item that is not an attachment, an OLE object, or a root node.
PDF files        an item that is not an attachment or a root node.

• If the flag KVSubFileInfoFlag_NeedsExtraction is set, open the sub file and extract its children. See “fpOpenFile()” and “fpExtractSubFile()”.

• The members parentIndex and childArray provide information about the sub file’s parent and children. This information can be used to recreate the file hierarchy on extraction. Since childArray only retrieves the first-level children in the sub file, you must call fpGetSubFileInfo() repeatedly until information for the leaf-node children is extracted. See “Recreate a File’s Hierarchy”.
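One way to picture how parentIndex supports rebuilding the hierarchy is to walk the parent links from a sub file up to the node whose parentIndex is -1, then join the names in reverse. The sketch below does this over an in-memory table; DemoSubFile and build_path() are hypothetical, and the table is assumed to have been filled beforehand from one fpGetSubFileInfo() call per index. The depth limit of 64 is an arbitrary guard against malformed parent links.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the two KVSubFileInfo members used here. */
typedef struct {
    const char *subFileName;
    int parentIndex;          /* -1 when the sub file has no parent */
} DemoSubFile;

/* Builds "root/folder/file" for files[index]. Returns 0, or -1 on a bad
   index, an over-deep chain, or a buffer that is too small. */
static int build_path(const DemoSubFile *files, int count, int index,
                      char *out, size_t outSize)
{
    int chain[64];
    int depth = 0, i;
    size_t used = 0;

    assert(files != NULL && out != NULL && outSize > 0);
    if (index < 0 || index >= count)
        return -1;

    /* Collect the chain from the sub file up to its top-level ancestor. */
    while (index != -1) {
        if (index < 0 || index >= count || depth == 64)
            return -1;
        chain[depth++] = index;
        index = files[index].parentIndex;
    }

    /* Emit the names from the top of the chain down to the sub file. */
    for (i = depth - 1; i >= 0; i--) {
        const char *name = files[chain[i]].subFileName;
        size_t len = strlen(name);
        size_t sep = (used > 0) ? 1 : 0;
        if (used + sep + len + 1 > outSize)
            return -1;
        if (sep)
            out[used++] = '/';
        memcpy(out + used, name, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

With a table such as {"root", -1}, {"Inbox", 0}, {"msg.mail", 1}, {"a.doc", 2}, the path built for index 3 is "root/Inbox/msg.mail/a.doc".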


KVSubFileMetaData

This structure contains a count of the number of metadata elements extracted from a mail file, and a pointer to the first element of the array of elements. It is initialized by calling fpGetSubFileMetaData(). See “fpGetSubFileMetaData()”. It is defined in kvxtract.h.

typedef struct tag_KVSubFileMetaData
{
KVStructHeader;
int nElem;
KVMetadataElem** ppElem;
unsigned long infoFlag;
}
KVSubFileMetaDataRec, *KVSubFileMetaData;

Member Descriptions

KVStructHeader The KeyView version of the structure. See “KVStructHead”.

nElem The number of metadata fields contained in the array.

ppElem Pointer to an array of pointers that are the memory addresses of metadata field values in the structure KVMetadataElem. See “KVMetadataElem”.

infoFlag A bitwise flag defining additional properties of the extracted metadata. The following flag is available:

KVSubFileMetaInfoFlag_CharsetConverted: Indicates the metadata’s character set was converted.


CHAPTER 8

XML Export API Functions

This section describes the functions in the XML Export API. These functions manage the input and output streams, and perform the document conversion.

Each function appears as a function prototype followed by a description of its arguments, return value, and discussion of its use. This section contains the following topics:

KVXMLGetInterface()

fpConvertStream()

fpFileToInputStreamCreate()

fpFileToInputStreamFree()

fpFileToOutputStreamCreate()

fpFileToOutputStreamFree()

fpGetAnchor()

fpGetConvertFileList()

fpGetStreamInfo()

fpGetSummaryInfo()

fpInit()

fpSetStyleMapping()

fpShutDown()

fpValidateTemplate()


KVXMLConfig()

KVXMLConvertFile()

KVXMLEndOOPSession()

KVXMLSetStyleSheet()

KVXMLStartOOPSession()


KVXMLGetInterface()

This function is exported by the Export definition file. It supplies function pointers to other Export functions. When KVXMLGetInterface() is called, it assigns the function pointers in the structure KVXMLInterface to other functions described in this chapter. For example, KVXMLInterface.fpInit is assigned to point to KVXMLInit().

Syntax

void pascal KVXMLGetInterface (KVXMLInterface *pInterface);

Arguments

pInterface Pointer to the structure KVXMLInterface. See “KVXMLInterface”.

Returns

None.

Discussion

• One of the initial steps in using the XML Export API is to create an instance of a KVXMLInterface structure and use this function to gain access to other functions.

• The functions can be called directly. For example, you can call KVXMLGetSummaryInfo() instead of using fpGetSummaryInfo() in KVXMLInterface. However, for efficiency, it is recommended that you call the functions through the function pointers assigned in KVXMLInterface.


fpConvertStream()

This function converts either a source stream or file to an output stream.

Syntax

BOOL pascal fpConvertStream(
void *pContext,
void *pCallingContext,
KVInputStream *pInput,
KVOutputStream *pOutput,
KVXMLTemplate *pTemplates,
KVXMLOptions *pOptions,
KVXMLTOCOptions *pTOCCreateOptions,
KVXMLCallbacks *pCallbacks,
BOOL bIndex,
KVErrorCode *pError );

Arguments

pContext Pointer returned from fpInit().

pCallingContext Pointer passed back to the callback functions.

pInput Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source for the conversion. See “KVInputStream”.

pOutput Pointer to the developer-assigned instance of KVOutputStream. The structure KVOutputStream defines the output stream to which Export writes the generated XML. See “KVOutputStream”.

pTemplates Pointer to the data structure KVXMLTemplate. It defines the overall structure of the output. Individual elements within the structure define the markup written at specific points in the output stream. See “KVXMLTemplate”. If this pointer is NULL, the default values for the structure are used.

pOptions Pointer to the data structure KVXMLOptions. It defines the options that control the markup written in response to the general style and attributes (font, color, and so on) of the document. See “KVXMLOptions”. If this pointer is NULL, the default values for the structure are used.

pTOCCreateOptions Pointer to the data structure KVXMLTOCOptions. It specifies whether a heading is included in the table of contents. See “KVXMLTOCOptions”. If this pointer is NULL, the default values for the structure are used.

pCallbacks Pointer to the data structure KVXMLCallbacks. It is a structure of functions that Export calls for specific, user-defined purposes. See “KVXMLCallbacks”. If callbacks are not used, this can be NULL.

bIndex Set this to TRUE to generate output with minimal markup and without images. Since the generated output is minimized to textual content, it is suitable for an indexing engine. If bIndex is set to FALSE, embedded images in a document are regenerated as separate files and stored in the output directory.

This can also be set through the bIndexOnly member of the structure KVXMLOptions. See “KVXMLOptions”.

To generate output with verbose markup and without images, set the nType argument of the function KVXMLConfig() to KVCFG_SUPPRESSIMAGES. See “KVXMLConfig()”.

pError Pointer to an error code if the call to fpConvertStream() fails.

Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE.

Discussion

• Only pContext, pInput, pOutput, and bIndex are required. All other pointers should be NULL when they are not set.

• If pCallbacks is NULL, pOptions->pszDefaultOutputDirectory must be valid, except when bIndex is set to TRUE.

• This function runs in-process or out of process. See “Convert Files Out of Process”.

• When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.

• When converting out of process, the values for the KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions structures should be set to NULL. These structures are already passed in the call to KVXMLStartOOPSession(). See “KVXMLStartOOPSession()”.

Example

The following sample code is from the cnv2xml sample program:

if(!(*KVXMLInt.fpConvertStream)(
    pKVXML,       /* Pointer returned by fpInit()     */
    NULL,         /* Pointer for callback functions   */
    &Input,       /* Input stream                     */
    &Output,      /* Output stream                    */
    NULL,         /* Mark-up and related variables    */
    &XMLOptions,  /* Options                          */
    NULL,         /* TOC options                      */
    NULL,         /* Pointer to callback functions    */
    FALSE,        /* Index mode                       */
    &error))      /* Error return value               */
{
    printf("Error converting %s to XML %d\n", argv[i - 1], error);
}
else
{
    printf("Conversion of %s to XML completed.\n\n", argv[i - 1]);
}


fpFileToInputStreamCreate()

This function creates an input stream from an input file.

Syntax

BOOL pascal _export fpFileToInputStreamCreate(

void *pContext,

char *pszFileName,

KVInputStream *pInput);

Arguments

pContext Pointer returned from fpInit().

pszFileName Pointer to the name of the input file to be converted.

pInput Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source for the conversion. See “KVInputStream”.

Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE. Processing is halted.

Discussion

After the conversion is complete, call fpFileToInputStreamFree() to free the memory allocated by this function.

Example

The following sample code is from the cnv2xml sample program:

if(!(*KVXMLInt.fpFileToInputStreamCreate)(pKVXML, argv[i++], &Input))
{
    printf("Error creating input stream\n");
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return (5);
}


fpFileToInputStreamFree()

This function frees the memory used to create an input stream.

Syntax

BOOL pascal _export fpFileToInputStreamFree(

void *pContext,

KVInputStream *pInput);

Arguments

pContext Pointer returned from fpInit().

pInput Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source for the conversion. See “KVInputStream”.

Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE. Processing is halted.

Discussion

After the conversion is complete, call this function to free the memory allocated by fpFileToInputStreamCreate() .


fpFileToOutputStreamCreate()

This function creates an output stream from an output file.

Syntax

BOOL pascal _export fpFileToOutputStreamCreate(

void *pContext,

char *pszFileName,

KVOutputStream *pOutput );

Arguments

pContext Pointer returned from fpInit().

pszFileName Pointer to the name of the output file to be created.

pOutput Pointer to the developer-assigned instance of KVOutputStream. The structure KVOutputStream defines the output stream to which Export writes the generated XML. See “KVOutputStream”.

Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE. Processing is halted.

Discussion

After the conversion is complete, call fpFileToOutputStreamFree() to free the memory allocated by this function.

Example

The following sample code is from the cnv2xml sample program:

if (!(*KVXMLInt.fpFileToOutputStreamCreate)(pKVXML, argv[i], &Output))
{
    printf("Error creating output stream\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 6;
}


fpFileToOutputStreamFree()

This function frees the memory used to create the output stream.

Syntax

BOOL pascal _export fpFileToOutputStreamFree(

void *pContext,

KVOutputStream *pOutput );

Arguments

pContext Pointer returned from fpInit().

pOutput Pointer to the developer-assigned instance of KVOutputStream. The structure KVOutputStream defines the output stream to which Export writes the generated XML. See “KVOutputStream”.

Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE. Processing is halted.

Discussion

After the conversion is complete, call this function to free the memory allocated by fpFileToOutputStreamCreate() .
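The create and free calls for the input and output streams pair up, and the sample fragments in this chapter release whatever was already created when a later step fails. The sketch below shows one way to structure that in a single function. It does not use the SDK: DemoStream, DemoInterface, and the demo_* functions are mock stand-ins whose argument lists only loosely follow the prototypes in this chapter, and g_openStreams exists solely so the cleanup can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins; the real stream types and function pointers come from the SDK. */
typedef struct { int open; } DemoStream;

typedef struct {
    int (*fpFileToInputStreamCreate)(void *ctx, char *name, DemoStream *in);
    int (*fpFileToInputStreamFree)(void *ctx, DemoStream *in);
    int (*fpFileToOutputStreamCreate)(void *ctx, char *name, DemoStream *out);
    int (*fpFileToOutputStreamFree)(void *ctx, DemoStream *out);
    int (*fpConvert)(void *ctx, DemoStream *in, DemoStream *out);
} DemoInterface;

static int g_openStreams;     /* streams created but not yet freed */

static int demo_create(void *ctx, char *name, DemoStream *s)
{
    (void)ctx;
    if (name == NULL)
        return 0;             /* simulate a failed create */
    s->open = 1;
    g_openStreams++;
    return 1;
}

static int demo_free(void *ctx, DemoStream *s)
{
    (void)ctx;
    if (!s->open)
        return 0;
    s->open = 0;
    g_openStreams--;
    return 1;
}

static int demo_convert(void *ctx, DemoStream *in, DemoStream *out)
{
    (void)ctx;
    return in->open && out->open;
}

static const DemoInterface demo_api = {
    demo_create, demo_free, demo_create, demo_free, demo_convert
};

/* Converts one file. Every stream that was created is freed on every path. */
static int convert_one(const DemoInterface *api, void *ctx,
                       char *inName, char *outName)
{
    DemoStream in = {0}, out = {0};
    int ok = 0;

    assert(api != NULL);
    if (!api->fpFileToInputStreamCreate(ctx, inName, &in))
        return 0;
    if (!api->fpFileToOutputStreamCreate(ctx, outName, &out))
        goto free_input;      /* only the input stream exists at this point */

    ok = api->fpConvert(ctx, &in, &out);

    api->fpFileToOutputStreamFree(ctx, &out);
free_input:
    api->fpFileToInputStreamFree(ctx, &in);
    return ok;
}
```

Funnelling all exits through the same labels keeps the free calls in reverse order of creation and avoids repeating them at each failure point.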


fpGetAnchor()

This function gets the filename automatically generated by Export and used for external graphics referenced with <a xmlns:xlink= xlink href=> tags and for heading-level table of contents entries.

Syntax

BOOL pascal fpGetAnchor(
void *pCallingContext,
KVXMLAnchorType eAnchorType,
char *pszAnchor,
int cbAnchorMax,
BYTE *pcHTML,
UINT cbHTML);

Arguments

pCallingContext Pointer passed back to the callback functions.

eAnchorType Graphic or block anchor type for the output stream. It must be one of the enumerated types defined in KVXMLAnchorType. See “KVXMLAnchorType”.

pszAnchor Pointer to the location in which the new anchor is stored.

cbAnchorMax Maximum number of bytes to place in pszAnchor.

pcHTML Pointer to either the markup defining the contents of the table of contents entry, a pointer to the external graphic name, or NULL.

cbHTML Number of valid bytes in pcHTML.

Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE. Processing is halted.

Discussion

• pszAnchor must be assigned. It may be derived from the cbAnchorMax, pcHTML, and cbHTML values that are also provided.

• pcHTML may be NULL if the graphic is an internal part of the document.

• This function is exposed so that it may be called from the GetAnchor() callback function to obtain default behavior for anchor types the callback is not set to handle.
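The delegation pattern described above can be sketched as follows. This is a minimal illustration only, not SDK code: the type definitions, the DemoAnchorType enumerators, and demo_default_anchor() are hypothetical stand-ins (the real types come from the KeyView headers, the real default behavior comes from fpGetAnchor(), and the pascal calling convention is omitted). It shows a callback-style function that names external graphics itself, respects cbAnchorMax and cbHTML, and defers every other anchor type to the default.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins so the sketch compiles without the SDK headers (assumptions). */
typedef int BOOL;
typedef unsigned char BYTE;
typedef unsigned int UINT;
#define TRUE 1
#define FALSE 0

/* Hypothetical enumerators; the real ones are defined in KVXMLAnchorType. */
typedef enum { DEMO_ANCHOR_GRAPHIC, DEMO_ANCHOR_BLOCK } DemoAnchorType;

/* Hypothetical stand-in for the default behavior provided by fpGetAnchor(). */
static BOOL demo_default_anchor(void *pCallingContext, DemoAnchorType eAnchorType,
                                char *pszAnchor, int cbAnchorMax,
                                BYTE *pcHTML, UINT cbHTML)
{
    int n;
    (void)pCallingContext; (void)eAnchorType; (void)pcHTML; (void)cbHTML;
    if (pszAnchor == NULL || cbAnchorMax <= 0)
        return FALSE;
    n = snprintf(pszAnchor, (size_t)cbAnchorMax, "default.xml");
    return (n >= 0 && n < cbAnchorMax) ? TRUE : FALSE;
}

/* Callback-style function: handles graphic anchors, defers the rest. */
static BOOL demo_get_anchor(void *pCallingContext, DemoAnchorType eAnchorType,
                            char *pszAnchor, int cbAnchorMax,
                            BYTE *pcHTML, UINT cbHTML)
{
    int n;
    if (pszAnchor == NULL || cbAnchorMax <= 0)
        return FALSE;
    if (eAnchorType == DEMO_ANCHOR_GRAPHIC && pcHTML != NULL && cbHTML > 0) {
        /* pcHTML carries cbHTML valid bytes and may lack a terminating NUL. */
        n = snprintf(pszAnchor, (size_t)cbAnchorMax, "img_%.*s",
                     (int)cbHTML, (const char *)pcHTML);
        return (n >= 0 && n < cbAnchorMax) ? TRUE : FALSE;
    }
    /* Anchor types this callback does not handle get the default behavior. */
    return demo_default_anchor(pCallingContext, eAnchorType, pszAnchor,
                               cbAnchorMax, pcHTML, cbHTML);
}
```

In a real application, the call to demo_default_anchor() would be replaced by a call through the fpGetAnchor function pointer. Returning FALSE when the name does not fit in cbAnchorMax bytes is a design choice of this sketch.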

fpGetConvertFileList()

This function gets the list of files automatically converted to XML during a call to fpConvertStream() or KVXMLConvertFile().

Syntax

char ** pascal _export fpGetConvertFileList(
    void *pContext,
    int  *pnSize );

Arguments

pContext    Pointer returned from fpInit().

pnSize      Pointer to the number of files generated by the conversion.

Returns

If no files are converted, the return value is a NULL pointer. Otherwise, the return value is a pointer to an array of strings that provides the available path information for each converted file.

Discussion

• The array of file path information includes all externally generated files, including graphic files. Note that the main output file is not included in the array, nor in the count of the number of files converted.

• The memory used by the array of file path information is freed by the API.

• The array is not valid after a call to fpShutDown().

• This function runs in-process or out of process. See “Convert Files Out of Process”.

• When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.
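As a rough sketch of how the returned array might be consumed, the following code walks the list using the count written to pnSize and does not free anything, since the API owns the memory. The two demo_ functions are hypothetical stand-ins with the documented parameter list (the pascal calling convention is omitted). In a real program the pointer would be the fpGetConvertFileList member obtained from the Export interface.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in data and functions; not part of the SDK. */
static char *demo_files[] = { "out/img1.jpg", "out/img2.jpg" };

static char **demo_get_convert_file_list(void *pContext, int *pnSize)
{
    (void)pContext;
    *pnSize = 2;
    return demo_files;
}

static char **demo_no_files(void *pContext, int *pnSize)
{
    (void)pContext;
    *pnSize = 0;
    return NULL;            /* documented result when no files are converted */
}

/* Prints each generated file path and returns how many were listed.
   The array is owned by the API, so it is neither modified nor freed here. */
static int list_converted_files(char **(*getList)(void *, int *),
                                void *pContext, FILE *out)
{
    int nSize = 0;
    int i;
    char **ppFiles = getList(pContext, &nSize);

    if (ppFiles == NULL)
        return 0;
    for (i = 0; i < nSize; i++)
        fprintf(out, "%s\n", ppFiles[i]);
    return nSize;
}
```

Because the array is not valid after fpShutDown(), any paths that are needed later should be copied before the session is terminated.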

fpGetStreamInfo()

This function extracts file format information and character set from the source document.

Syntax

BOOL pascal _export fpGetStreamInfo(
    void          *pContext,
    KVInputStream *pInput,
    KVStreamInfo  *pStreamInfo );

Arguments

pContext      Pointer returned from fpInit().

pInput        Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source for the conversion. See “KVInputStream”.

pStreamInfo   Pointer to the developer-assigned instance of KVStreamInfo. The structure KVStreamInfo defines the input stream document type and character set. See “KVStreamInfo”. You can examine the fields in the structure to determine the appropriate template to use based on the document type.

Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE.

fpGetSummaryInfo()

This function extracts all metadata from the input stream. See “Extract Metadata” for more information.

Syntax

BOOL pascal _export fpGetSummaryInfo(

void *pContext,

KVInputStream *pInput,

KVSummaryInfoEx *pSummary,

BOOL bFree );

Arguments

pContext    Pointer returned from fpInit().

pInput      Pointer to the developer-assigned instance of KVInputStream. The KVInputStream structure points to the input stream containing the source for the conversion. See “KVInputStream”.

pSummary    Pointer to the developer-assigned instance of KVSummaryInfoEx. See “KVSummaryInfoEx”. In this structure, nElem provides a count of the number of metadata elements, and pElem points to the first element of the array of individual elements as defined by the structure KVSumInfoElemEx. See “KVSumInfoElemEx”.

bFree       Flag to free or fill the memory allocated to the document metadata.

Returns

• If the call is successful, the return value is TRUE. When the document does not contain metadata, but the document reader can extract metadata from the specified format, then this function returns TRUE with nElem set to 0.

• If this call is unsuccessful, the return value is FALSE. This function returns FALSE when the document reader does not support metadata extraction for the specified format, or there is an error in extraction. The section “Supported Formats” lists the file formats for which metadata can be determined.

Discussion

• For metadata to be extracted by Export, metadata must be defined in the source document, and the document reader must be able to extract metadata for the file format. The section “Supported Formats” lists the file formats for which metadata can be determined. Export does not generate metadata automatically from the document contents.

• This function runs in-process or out of process. See “Convert Files Out of Process”.

• This function may be called any time after the call to KVXMLInit().

• When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.

• Call this function with bFree set to FALSE to return an array of KVSummaryInfoEx structures, each containing an element of available document metadata.

• After processing the information in the structure, call this function with bFree set to TRUE to free the memory allocated to the document metadata.
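The fill-then-free calling sequence described above might look like the following sketch. Everything here is a stand-in so that the code compiles without the SDK: DemoSummaryInfo contains only the two members named in this section (nElem and pElem), and demo_get_summary_info() is a hypothetical stub that imitates the documented bFree behavior. Real code would call fpGetSummaryInfo() with a KVSummaryInfoEx instance.

```c
#include <stdlib.h>

typedef int BOOL;
#define TRUE 1
#define FALSE 0

/* Stand-in with only the members named in the text (assumption). */
typedef struct {
    int   nElem;   /* number of metadata elements */
    void *pElem;   /* first element of the element array */
} DemoSummaryInfo;

/* Hypothetical stub: fills the structure when bFree is FALSE and
   releases it when bFree is TRUE. */
static BOOL demo_get_summary_info(void *pContext, void *pInput,
                                  DemoSummaryInfo *pSummary, BOOL bFree)
{
    (void)pContext; (void)pInput;
    if (bFree) {
        free(pSummary->pElem);
        pSummary->pElem = NULL;
        pSummary->nElem = 0;
        return TRUE;
    }
    pSummary->pElem = calloc(3, sizeof(int));
    if (pSummary->pElem == NULL)
        return FALSE;
    pSummary->nElem = 3;
    return TRUE;
}

/* Returns the number of metadata elements, or -1 if extraction failed. */
static int count_metadata(void *pContext, void *pInput)
{
    DemoSummaryInfo summary = { 0, NULL };
    int n;

    /* First call: bFree = FALSE fills the structure. */
    if (!demo_get_summary_info(pContext, pInput, &summary, FALSE))
        return -1;      /* unsupported format or extraction error */

    n = summary.nElem;  /* 0 is a valid result: no metadata in the document */
    /* ... process the n elements starting at summary.pElem here ... */

    /* Second call: bFree = TRUE releases the metadata memory. */
    demo_get_summary_info(pContext, pInput, &summary, TRUE);
    return n;
}
```

Pairing the two calls inside one helper keeps the release step from being skipped on the success path.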

fpInit()

This function initializes an Export session. Its return value, pContext, is passed as the first parameter to the File Extraction interface and all other Export functions.

Syntax

void* pascal _export fpInit(
    KVMemoryStream *pMemAllocator,
    char           *pszKeyViewDir,
    char           *pszDataFile,
    KVErrorCode    *pError,
    DWORD           dWord);

Arguments

pMemAllocator   Pointer to a developer-defined memory allocator. If NULL is passed, then the default C run-time memory allocation is used.

pszKeyViewDir   Pointer to the directory where the Export components are located. This is normally the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.

pszDataFile     Pointer to the directory and filename of the Export data file, formats_e.ini. This file determines whether a format is supported. If a format does not exist in this file, the conversion fails. The formats_e.ini file is normally stored in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system. See “File Format Detection” for more information.

pError          Pointer to an error code defined in KVErrorCode or KVErrorCodeEx in kvtypes.h. See “KVErrorCode” and “KVErrorCodeEx”.

dWord           Reserved. Must be 0.

Returns

• If the call is successful, the return value is a pointer passed to all other functions.

• If the call is unsuccessful, the return value is a NULL pointer.

Discussion

• If pszKeyViewDir is NULL, the required components cannot be found. Ensure it is valid.

• If this function returns NULL, check stderr for the KeyView installation error messages, “KeyView Export SDK License Key has Expired” and “KeyView Export SDK License Key is Invalid”, and pass them to your application. See the Export SDK Installation Instructions for more information on the KeyView license feature.

• To ensure multi-threaded conversions are thread-safe, you must create a unique context pointer for every thread by calling fpInit(). In addition, threads must not share context pointers, and the same context pointer must be used for all API calls in the same thread. Creating a context pointer for every thread does not affect performance because the context pointer uses minimal resources.

• When the conversion context is no longer required, it should be terminated by calling fpShutDown(). See “fpShutDown()”.

Example

The following sample code is from the cnv2xml sample program:

pKVXML = (*KVXMLInt.fpInit)(NULL, ".", NULL, &error, 0);
if(!pKVXML)
{
    printf("Error initializing KVXML: %d\n", error);
    mpFreeLibrary(hKVXML);
    return 4;
}

fpSetStyleMapping()

This function is used to set the mapping for user-defined styles. Export does not make a distinction between paragraph styles or character styles, but operates under the assumption that each style has a unique name.

Syntax

BOOL pascal _export fpSetStyleMapping(

void *pContext,

KVStyle *pStyles,

int iStyles,

BOOL bCopy);

Arguments

pContext    Pointer returned from fpInit().

pStyles     Pointer to the developer-assigned instance of KVStyle. See “KVStyle”. The KVStyle structure defines the elements of a custom style.

iStyles     Number of elements in the pStyles array.

bCopy       If Export is to allocate memory to copy the pStyles array, set this to TRUE. If pStyles remains valid throughout the conversion process, set this to FALSE.

Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE.

Discussion

• Paragraph styles are presently implemented only for documents in Microsoft Word, RTF, Folio Flat files, WordPro, and WordPerfect 6.x.

• This function runs in-process or out of process. See “Convert Files Out of Process”.

• When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.

• Once this API function is called, the styles are valid until fpShutDown() is called, or until this function is called again with a new style or NULL.

fpShutDown()

This function terminates an Export session that was initialized by fpInit(), and frees allocated system resources. It is called when the conversion context is no longer required.

Syntax

void pascal _export fpShutDown(KVXMLContext *pContext);

Arguments

pContext    Pointer returned from fpInit().

Returns

None.

Discussion

After this function is called, the pContext pointer must not be passed to any XML Export API function.

fpValidateTemplate()

This function is used to ensure that the markup is well-formed and valid according to the DTD. It is currently not implemented.

KVXMLConfig()

This function is called directly and provides a way to configure options prior to the document conversion. Currently, the function is used for the following configurations:

• Generate output without images
  Generate output with verbose markup and without images. To generate output with minimal markup (ID and style paragraph attributes) and without images, set the bIndexOnly member of the structure KVXMLOptions to TRUE. See “KVXMLOptions”.

• Enable PDF position information
  Include position information in the markup generated for a PDF document.

• Configure PDF bookmarks
  Specify whether bookmarks in a PDF file are converted to simple XLinks in the XML output.

• Configure Word bookmarks
  Disable the conversion of Microsoft Word bookmarks to zone elements.

• Designate temporary directory
  Specify a directory in which temporary files created during XML conversion processes are stored.

  NOTE On Windows systems, there is a 64K size limit to the temp directory. Once the limit is reached, you must either create a new directory or delete the contents of the existing directory; otherwise, you may receive an error message.

• Configure XML conversion
  Specify the elements and attributes extracted from an XML document based on the file's document type.

• Enable PDF logical reading order
  Convert paragraphs in PDF files in the order in which they appear on the page and with left-to-right or right-to-left paragraph direction. See “Convert PDF Files to a Logical Reading Order”.

• Configure PDF soft hyphens
  Specify whether soft hyphens are removed from the XML output. See “Control Hyphenation”.

• Enable Revision Marks
  Convert text and graphics that were deleted from a document with revision tracking enabled, and include revision tracking information in the XML output. See “Convert Revision Tracking Information”.

• Protected file password
  Specify the password to use to open a password-protected file for export.

Syntax

KVErrorCode pascal KVXMLConfig(
    void *pContext,
    int   nType,
    int   nValue,
    void *p );

Arguments

pContext    Pointer returned from fpInit().

nType       The configuration flag. This is a symbolic constant defined in kvtypes.h. The available options are described in “Configuration Flags”.

nValue      Integer value defined for the flags above.
            This is TRUE or FALSE for all flags except KVCFG_LOGICALPDF, KVCFG_SETTEMPDIRECTORY, and KVCFG_SETXMLCONFIGINFO.
            For KVCFG_LOGICALPDF, this is one of the paragraph direction options defined in the LPDF_DIRECTION enumerated type in kvtypes.h. See “LPDF_DIRECTION”.
            For KVCFG_SETTEMPDIRECTORY and KVCFG_SETXMLCONFIGINFO, this is not set.

p           The data for the configuration flag.
            This is NULL for all flags except KVCFG_SETTEMPDIRECTORY, KVCFG_SETXMLCONFIGINFO, and KVCFG_SETPASSWORD.
            For KVCFG_SETTEMPDIRECTORY, this is the path to the directory where temporary files are stored.
            For KVCFG_SETXMLCONFIGINFO, this is a pointer to the KVXConfigInfo structure. See “KVXConfigInfo”.
            For KVCFG_SETPASSWORD, this is the source file password.

Configuration Flags

The following flags are available for the nType argument in KVXMLConfig(). These flags are defined in kvtypes.h.

Flag / Description

KVCFG_SUPPRESSIMAGES
    If KVCFG_SUPPRESSIMAGES is set, the XML output includes verbose markup, but no images. If this option is not set, then embedded images in a document are regenerated as separate files and stored in the output directory. To generate output with minimal markup (ID and style paragraph attributes) and without images, set the bIndexOnly member of the structure KVXMLOptions to TRUE. See “KVXMLOptions”.

KVCFG_ENABLEPOSITIONINFO
    If KVCFG_ENABLEPOSITIONINFO is set, then a position element is included in the markup for PDF documents. The position element defines the absolute position of the text relative to the bottom left corner of the page, and includes additional information such as font and color.

KVCFG_SUPPRESSTOCPRINTIMAGE
    If the flag KVCFG_SUPPRESSTOCPRINTIMAGE is set, then bookmarks in a PDF file are not converted to simple XLinks in the XML output. By default, PDF bookmarks are converted to source and destination anchors. For example,

    <a xmlns:xlink="http://www.w3.org/TR/xlink" xlink:href="#bmk1">Highlight File Format</a>
    <a xmlns:xlink="http://www.w3.org/TR/xlink" name="bmk1"/><img src="pdf14640.jpg"/>

KVCFG_DISABLEZONE
    If the flag KVCFG_DISABLEZONE is set, the conversion of Microsoft Word bookmarks to zone elements (<zone name="xxx">) in the output XML is disabled.

    A bookmark in Microsoft Word documents is a name given to a selected area of the document. The bookmark may enclose words, paragraphs, tables, table cells, lists, list items or the entire document. In XML Export, bookmarks are converted to zone elements (<Zone name="xxx">) using the KeyView KVT_ZONE token.

    Depending on how bookmarks are defined in the original document, the creation of zone elements may result in malformed XML. In this case, you can disable zone creation to avoid these validity errors. Zone element creation is enabled by default.

KVCFG_SETTEMPDIRECTORY
    The flag KVCFG_SETTEMPDIRECTORY enables you to specify the directory in which temporary files created during conversion processes are stored. By default, the system temporary directory is used.

    To define a directory for temporary files generated during an out-of-process conversion, set the tempfilepath parameter in the formats_e.ini file. See “Convert Files Out of Process”.

    NOTE: On Windows systems, there is a 64K size limit to the temp directory. Once the limit is reached, you must either create a new directory or delete the contents of the existing directory; otherwise, you may receive an error message.

KVCFG_SETXMLCONFIGINFO
    The flag KVCFG_SETXMLCONFIGINFO enables you to define which elements and attributes are extracted from XML documents with a specified format ID or root element. This can be used to override the default settings for the supported XML formats (see “Convert XML Files”), or to define settings for custom XML document types.

    The settings are defined in the KVXConfigInfo structure (see “KVXConfigInfo”). To set custom settings for more than one document type, call the KVXMLConfig() function once for each type.

    Element extraction settings can also be modified using the kvxconfig.ini file. See “Configure Element Extraction for XML Documents”.

KVCFG_LOGICALPDF
    The flag KVCFG_LOGICALPDF converts paragraphs in a PDF file in the order in which they appear on the page (logical reading order). The nValue argument specifies the paragraph direction. See “Convert PDF Files to a Logical Reading Order”.

KVCFG_DELSOFTHYPHEN
    If the flag KVCFG_DELSOFTHYPHEN is set, soft hyphens in the source document are removed, and the hyphenated words are joined in the XML output. By default, soft hyphens are maintained. See “Control Hyphenation”.

    It is recommended you remove soft hyphens if you use Export to generate text output for an indexing engine or are not concerned with maintaining the document’s layout. See “fpConvertStream()” or “KVXMLConvertFile()” for more information on running Export in index mode.

KVCFG_INCLREVISIONMARK
    If this flag is set to TRUE, text and graphics that were deleted from a document with a revision tracking feature enabled are converted, and revision tracking information is included in the XML output.

    To reset the flag and exclude deleted content and revision tracking information from the XML output, set the flag to FALSE. See “Convert Revision Tracking Information”. The default is FALSE.

KVCFG_WP_NOCOMMENTS
    Set to TRUE to exclude comment text from Microsoft Word documents. Comment text is exported by default from Microsoft Word 97 to 2003 files.

    Comment output can also be toggled by modifying the formats_e.ini file. See “Show Hidden Data”.

KVCFG_WP_SHOWHIDDENTEXT
    Set to TRUE to export hidden text from Microsoft Word documents.

KVCFG_WP_SHOWDATEFIELDCODE
    Set to TRUE to export date field codes from Microsoft Word documents.

KVCFG_WP_SHOWFILENAMEFIELDCODE
    Set to TRUE to export the file name field code from Microsoft Word documents.

KVCFG_SS_SHOWHIDDENINFOR
    Set to TRUE to export hidden information from Microsoft Excel files.

KVCFG_SS_SHOWCOMMENTS
    Set to TRUE to export comments from Microsoft Excel files.

KVCFG_SS_SHOWFORMULA
    Set to TRUE to export formulas from Microsoft Excel files.

KVCFG_PG_HIDEHIDDENSLIDE
    Set to TRUE to exclude hidden slides from Microsoft PowerPoint files.

KVCFG_PG_HIDECOMMENT
    Set to TRUE to exclude comments from Microsoft PowerPoint files. Comments are exported by default from PowerPoint 97 to 2000 files.

KVCFG_PG_SHOWCOMMENTSSLIDE
    Set to TRUE to export comments slides from Microsoft PowerPoint 2003 and 2007 files.

KVCFG_PG_SHOWSLIDNOTES
    Set to TRUE to export slide notes from Microsoft PowerPoint files.

    Slide note output can also be toggled by modifying the formats_e.ini file. See “Show Hidden Data”.

KVCFG_SETPASSWORD
    This flag enables you to define a password used to open a password-protected file for export. See “Export Password Protected Files”.

    nValue is TRUE.

    p is the source file password, which can have a maximum length of 255 characters (the final byte is null).

Returns

The return value is one of the error codes defined in KVErrorCode in kvtypes.h.

Discussion

• This function must be called after the call to fpInit() and before the call to fpConvertStream() or KVXMLConvertFile().

• This function runs in-process or out of process. See “Convert Files Out of Process”.

• When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.

Examples

• To generate verbose markup, but no images:

    (*fpXMLConfig)(pKVXML, KVCFG_SUPPRESSIMAGES, TRUE, NULL);

• To specify bookmarks in a PDF file are not converted to XLinks in the XML output:

    (*fpXMLConfig)(pKVXML, KVCFG_SUPPRESSTOCPRINTIMAGE, TRUE, NULL);

• To disable the conversion of zone elements:

    (*fpXMLConfig)(pKVXML, KVCFG_DISABLEZONE, TRUE, NULL);

• To set a directory for temporary files:

    char tmpDir[250];
    strcpy(tmpDir, "c:\\temp\\xmlexport");
    (*fpXMLConfig)(pKVXML, KVCFG_SETTEMPDIRECTORY, 0, tmpDir);

• To specify custom extraction settings for conversion of an XML file:

    KVXConfigInfo xinfo; /* populate xinfo */
    (*fpXMLConfig)(pKVXML, KVCFG_SETXMLCONFIGINFO, 0, &xinfo);

• To specify PDF files are converted to a logical reading order, and the paragraph direction for the PDF output is left to right:

    (*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_LTR, NULL);

• To specify PDF files are converted to a logical reading order, and the paragraph direction for the PDF output is right to left:

    (*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_RTL, NULL);

• To specify PDF files are converted to a logical reading order, and the paragraph direction for the PDF output is determined on the fly for each page:

    (*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_AUTO, NULL);

• To specify soft hyphens are removed from the XML output:

    (*fpXMLConfig)(pKVXML, KVCFG_DELSOFTHYPHEN, TRUE, NULL);

• To convert text and graphics that are identified by revision marks:

    (*fpXMLConfig)(pKVXML, KVCFG_INCLREVISIONMARK, TRUE, NULL);

• To toggle hidden data output from Microsoft Word documents, use one of the KVCFG_WP flags:

    (*fpXMLConfig)(pKVXML, KVCFG_WP_NOCOMMENTS, TRUE, NULL);

• To toggle hidden data output from Microsoft Excel documents, use one of the KVCFG_SS flags:

    (*fpXMLConfig)(pKVXML, KVCFG_SS_SHOWHIDDENINFOR, TRUE, NULL);

• To toggle hidden data output from Microsoft PowerPoint documents, use one of the KVCFG_PG flags:

    (*fpXMLConfig)(pKVXML, KVCFG_PG_HIDEHIDDENSLIDE, TRUE, NULL);

• To specify a password to open a password-protected file for export:

    (*fpXMLConfig)(pKVXML, KVCFG_SETPASSWORD, TRUE, password);

  where password is a null-terminated string of 255 or fewer characters.
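Because the password passed with KVCFG_SETPASSWORD is limited to 255 characters plus the terminating null, an application may want to validate the string before calling KVXMLConfig(). The helper below is a hypothetical sketch of such a check. Neither password_fits() nor DEMO_MAX_PASSWORD_CHARS is part of the SDK.

```c
#include <stddef.h>

/* Limit stated in this guide for KVCFG_SETPASSWORD (characters before the NUL). */
#define DEMO_MAX_PASSWORD_CHARS 255

/* Returns 1 if the password is non-NULL and terminated within the first
   256 bytes (that is, it holds 255 or fewer characters); otherwise 0.
   The scan never reads past index 255. */
static int password_fits(const char *password)
{
    size_t i;

    if (password == NULL)
        return 0;
    for (i = 0; i <= DEMO_MAX_PASSWORD_CHARS; i++) {
        if (password[i] == '\0')
            return 1;
    }
    return 0;
}
```

A caller could reject the input and report an error when password_fits() returns 0, instead of passing an oversized string to the configuration call.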

• To include a position element in the markup for PDF documents:

    (*fpXMLConfig)(pKVXML, KVCFG_ENABLEPOSITIONINFO, TRUE, NULL);

  Using the PDF position element significantly changes the generated markup. For example, without the option, the XML output from a section of a PDF document looks like this:

    <?xml version="1.0" encoding="utf-8" ?>
    <!DOCTYPE VerityXMLExport (View Source for full doctype...)>
    - <VerityXMLExport>
    - <WP>
    - <p id="p1" font-size="33pt">
      <img src="ecpe.pdf38760.jpg" height="140px" width="292px" />
      Economic Fiscal Update
      <font size="18pt" color="#777777">Theand</font>
      <font size="14pt" color="#ffffff">October 30, 2002</font>
      <font size="29pt" color="#a4a4a4">Overview</font>
      </p>

  With the option enabled, the same section of the PDF document looks like this:

    <?xml version="1.0" encoding="utf-8" ?>
    <!DOCTYPE VerityXMLExport (View Source for full doctype...)>
    - <VerityXMLExport>
    - <WP>
      <Position style="position:absolute;top:534px;left:254px;font-family:'Times New Roman';font-size:33pt;white-space:nowrap;" />
      <Position style="position:absolute;top:393px;left:254px;white-space:nowrap;" />
      <img src="ecpe.pdf36000.jpg" height="140px" width="292px" />
      <Position style="position:absolute;top:308px;left:256px;font-family:'Times New Roman';font-size:33pt;white-space:nowrap;" />
      Economic
      <Position style="position:absolute;top:346px;left:256px;font-family:'Times New Roman';font-size:33pt;white-space:nowrap;" />
      Fiscal Update
      <Position style="position:absolute;top:298px;left:281px;font-family:'Times New Roman';font-size:18pt;color:#777777;background-color:#ffffff;white-space:nowrap;" />
      The
      <Position style="position:absolute;top:336px;left:299px;font-family:'Times New Roman';font-size:18pt;color:#777777;background-color:#ffffff;white-space:nowrap;" />
      and
      <Position style="position:absolute;top:543px;left:397px;font-family:'Times New Roman';font-size:14pt;color:#ffffff;background-color:#000000;white-space:nowrap;" />
      October 30, 2004
      <Position style="position:absolute;top:627px;left:382px;font-family:'Times New Roman';font-size:29pt;color:#a4a4a4;background-color:#ffffff;white-space:nowrap;" />
      Overview

KVXMLConvertFile()

This function is called directly and converts a source file to an output file.

Syntax

BOOL pascal KVXMLConvertFile(
    void            *pContext,
    void            *pCallingContext,
    char            *pInFileName,
    char            *pOutFileName,
    KVXMLTemplate   *pTemplates,
    KVXMLOptions    *pOptions,
    KVXMLTOCOptions *pTOCCreateOptions,
    KVXMLCallbacks  *pCallbacks,
    BOOL             bIndex,
    KVErrorCode     *pError);

Arguments

pContext            Pointer returned from fpInit().

pCallingContext     Pointer passed back to the callback functions.

pInFileName         Pointer to the input file.

pOutFileName        Pointer to the output file.

pTemplates          Pointer to the data structure KVXMLTemplate. It defines the overall structure of the output. Individual elements within the structure define the markup written at specific points in the output stream. See “KVXMLTemplate”.
                    If this pointer is NULL, the default values for the structure are used.

pOptions            Pointer to the data structure KVXMLOptions. It defines the options that control the markup written in response to the general style and attributes (font, color, and so on) of the document. See “KVXMLOptions”.
                    If this pointer is NULL, the default values for the structure are used.

pTOCCreateOptions   Pointer to the data structure KVXMLTOCOptions. It specifies whether a heading is included in the table of contents. See “KVXMLTOCOptions”.
                    If this pointer is NULL, the default values for the structure are used.

pCallbacks          Pointer to the data structure KVXMLCallbacks. It is a structure of functions that Export calls for specific, user-defined purposes. See “KVXMLCallbacks”.
                    If callbacks are not used, then this can be NULL.

bIndex              Set this to TRUE to generate output with minimal markup and without images. Since the generated output is minimized to textual content, it is suitable for an indexing engine. If bIndex is set to FALSE, embedded images in a document are regenerated as separate files and stored in the output directory.
                    This can also be set through the bNoPictures member in the template files.

pError              Pointer to an error code if the call to KVXMLConvertFile() fails.

Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE.

Discussion

• Only pContext, pInFileName, pOutFileName, and bIndex are required. All other pointers should be NULL when they are not set.

• If pCallbacks is NULL, pOptions->pszDefaultOutputDirectory must be valid, except when bIndex is set to TRUE.

• This function runs in-process or out of process. See “Convert Files Out of Process”.

• When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.

• When converting out of process, the values for the KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions structures should be set to NULL. These structures are already passed in the call to KVXMLStartOOPSession(). See “KVXMLStartOOPSession()”.
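The argument requirements listed above can be expressed as a pre-flight check that an application runs before calling KVXMLConvertFile(). The sketch below is illustrative only: DemoOptions is a stand-in that contains just the pszDefaultOutputDirectory member named in this section, convert_args_ok() is a hypothetical helper, and the interpretation of a "valid" directory as a non-NULL, non-empty string is an assumption.

```c
#include <stddef.h>

/* Stand-in for KVXMLOptions with only the member named in the text. */
typedef struct {
    const char *pszDefaultOutputDirectory;
} DemoOptions;

/* Returns 1 if the arguments satisfy the rules in the Discussion, else 0.
   Assumption: "valid" output directory means non-NULL and non-empty. */
static int convert_args_ok(const void *pContext,
                           const char *pInFileName,
                           const char *pOutFileName,
                           const DemoOptions *pOptions,
                           const void *pCallbacks,
                           int bIndex)
{
    /* pContext, pInFileName, and pOutFileName are always required. */
    if (pContext == NULL || pInFileName == NULL || pOutFileName == NULL)
        return 0;

    /* Without callbacks, and outside index mode, an output directory
       must be supplied through the options structure. */
    if (pCallbacks == NULL && !bIndex) {
        if (pOptions == NULL ||
            pOptions->pszDefaultOutputDirectory == NULL ||
            pOptions->pszDefaultOutputDirectory[0] == '\0')
            return 0;
    }
    return 1;
}
```

This check applies to in-process conversions. For out-of-process conversions the structures are passed to KVXMLStartOOPSession() instead, so a different check would be needed.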

Example

if(!(*KVXMLInt.KVXMLConvertFile)(
        pKVXML,         /* Pointer returned by fpInit() */
        NULL,
        &InputFile,     /* Input file */
        &OutputFile,    /* Output file */
        &XMLTemplates,  /* Mark-up and related variables */
        &XMLOptions,    /* Options */
        NULL,           /* TOC options */
        NULL,           /* Pointer to callback functions */
        FALSE,          /* Index mode */
        &error))        /* Error return value */
{
    printf("Error converting %s to XML %d\n", argv[i - 1], error);
}
else
{
    printf("Conversion of %s to XML completed.\n\n", argv[i - 1]);
}

KVXMLEndOOPSession()

This function terminates the current out-of-process conversion session, and releases the source data and resources related to the session.

Syntax

BOOL pascal KVXMLEndOOPSession(
    void          *pContext,
    BOOL           bKeepServantAlive,
    KVErrorCodeEx *pError,
    DWORD          dwOptions,
    void          *pReserved1,
    void          *pReserved2 );

Arguments

pContext            Pointer returned from fpInit().

bKeepServantAlive   Set this to TRUE to keep a Servant process active after the Export out-of-process session is terminated. If the Servant remains active, subsequent conversion requests are processed more quickly because the Servant is already prepared to receive data.
                    Set this to FALSE to terminate the Export out-of-process session and the associated Servant process.

pError              Pointer to an error code defined in KVErrorCodeEx in kvtypes.h.

dwOptions           Reserved for future use.

pReserved1          Reserved for future use.

pReserved2          Reserved for future use.

Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE.

Example

The following sample code is from the cnv2xmloop sample program:

/* declare endsession function pointer */
BOOL (pascal *fpKVXMLEndOOPSession)( void *,
                                     BOOL,
                                     KVErrorCode *,
                                     DWORD,
                                     void *,
                                     void *);

/* assign OOP endsession function pointer */
fpKVXMLEndOOPSession = (BOOL (pascal *)( void *,
                                         BOOL,
                                         KVErrorCode *,
                                         DWORD,
                                         void *,
                                         void * ))mpGetProcAddress(hKVXML, "KVXMLEndOOPSession");
if(!fpKVXMLEndOOPSession)
{
    printf("Error assigning KVXMLEndOOPSession() pointer\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    mpFreeLibrary(hKVXML);
    return 8;
}

/********END OOP SESSION, DO NOT KEEP SERVANT ALIVE *********/
if(!(*fpKVXMLEndOOPSession)(pKVXML,
                            FALSE,
                            &error,
                            0,
                            NULL,
                            NULL))
{
    printf("Error calling fpKVXMLEndOOPSession \n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 10;
}


KVXMLSetStyleSheet()

This function is called directly and is used to specify the full path and filename of an external Style Sheet (XSL or CSS).

Syntax

BOOL pascal KVXMLSetStyleSheet(
    void *pContext,
    char *pszStyleSheetName,
    char *pszRef);

Arguments

pContext Pointer returned from fpInit().

pszStyleSheetName Pointer to the full path and filename of the style sheet.

pszRef Pointer to the URL or filename of the style sheet.

Returns

If the call is successful, the return value is TRUE.

If this call is unsuccessful, the return value is FALSE.

Discussion

• When the value for eStyleSheetType in KVXMLOptions is set to XML_XSL or XML_CSS, an external style sheet is referenced by a processing instruction of the form:

<?xml-stylesheet href="pszRef" type="text/xsl"?>

or

<?xml-stylesheet href="pszRef" type="text/css"?>

If the value for pszStyleSheetName includes the output directory, the href only consists of the filename since the XML output resides in the same directory as the style sheet file.

• If the value for pszStyleSheetName points to a directory other than the output directory, the href consists of the full path and filename.

• Style sheet information cannot be written to an external XSL file. XML Export can only reference an existing XSL style sheet.

• When XML_CSS is specified, a CSS file can be created based on pszStyleSheetName.

• If the name of the CSS is not specified by using this function, a CSS style file is created with an automatically-generated filename.

If this function is used to specify the name of the style file, that file is referenced in the processing instruction.

If the CSS file does not exist in the specified location, it is created.

If it exists, but is empty, CSS styles are written to it.

If the CSS file exists and is not empty, the file is not altered. There is no attempt made to validate the file.

If there are multiple calls made to fpConvertStream() or KVXMLConvertFile(), and the name of the style sheet has been set using KVXMLSetStyleSheet, the filename can be disabled by calling KVXMLSetStyleSheet again with the pszStyleSheetName and pszRef set to NULL. The filename can then be set to a different value by calling KVXMLSetStyleSheet with the new filename prior to the next call to fpConvertStream() or KVXMLConvertFile().

• This function runs in-process or out of process. See “Convert Files Out of Process” on page .

When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” on page and “KVXMLEndOOPSession()” on page .
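The href rule in this Discussion (filename only when the style sheet sits in the output directory, full path otherwise) can be illustrated with a small helper. This is a sketch only: stylesheet_href() is a hypothetical function, not part of the XML Export API, and it assumes '/' path separators and an output directory given without a trailing separator.

```c
#include <string.h>

/* Hypothetical helper, not part of the XML Export API.
 * Returns the text expected in the href attribute under the rule
 * described above. Assumes '/' separators and that outDir has no
 * trailing separator. */
const char *stylesheet_href(const char *sheetPath, const char *outDir)
{
    size_t dirLen = strlen(outDir);
    const char *lastSep = strrchr(sheetPath, '/');

    /* The sheet is in outDir when its directory part equals outDir. */
    if (lastSep != NULL &&
        (size_t)(lastSep - sheetPath) == dirLen &&
        strncmp(sheetPath, outDir, dirLen) == 0)
    {
        return lastSep + 1;   /* filename only */
    }
    return sheetPath;         /* full path and filename */
}
```

For example, stylesheet_href("/out/style.css", "/out") yields "style.css", while a sheet stored under a different directory is returned unchanged.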


KVXMLStartOOPSession()

This function performs the following:

• Initializes the out-of-process session.

• Specifies the input stream or file.

• Sets conversion options in the KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions data structures.

• Creates a Servant process.

• Establishes a communication channel between the application thread and the Servant.

• Sends the data to the Servant.

Syntax

BOOL pascal KVXMLStartOOPSession(
    void *pContext,
    KVInputStream *pInputStream,
    char *pFileName,
    KVXMLTemplate *pTemplates,
    KVXMLOptions *pOptions,
    KVXMLTOCOptions *pTOCCreateOptions,
    DWORD *pPID,
    KVErrorCode *pError,
    DWORD dwOptions,
    void *pReserved1,
    void *pReserved2 );


Arguments

pContext Pointer returned from fpInit().

pInputStream Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source for the conversion.

If pInputStream is defined, then pFileName must be NULL. The input data can be defined as a data stream or file, but not both.

pFileName Pointer to the file to be converted. The file must exist on the same file system as the Servant.

If pFileName is defined, then pInputStream must be NULL. The input data can be defined as a data stream or file, but not both.

pTemplates Pointer to the data structure KVXMLTemplate. It defines the overall structure of the output. Individual elements within the structure define the markup written at specific points in the output stream. See “KVXMLTemplate” on page .

If this pointer is NULL, the default values for the structure are used.

pOptions Pointer to the data structure KVXMLOptions. It defines the options that control the markup written in response to the general style and attributes (font, color, and so on) of the document. See “KVXMLOptions” on page .

If this pointer is NULL, the default values for the structure are used.

pTOCCreateOptions Pointer to the data structure KVXMLTOCOptions. It specifies whether a heading is included in the table of contents. See “KVXMLTOCOptions” on page .

If this pointer is NULL, the default values for the structure are used.

pPID Address of a DWORD into which the Servant process ID is returned.

pError Pointer to an error code defined in KVErrorCode in kvtypes.h.

dwOptions Reserved for future use.

pReserved1 Reserved for future use.

pReserved2 Reserved for future use.
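The stream-or-file rule above can be checked before the session is started. The following is a minimal sketch, not SDK code: classify_oop_input() and the OopInputKind values are hypothetical, and treating "neither argument supplied" as an error is an assumption, since the guide only states that both must not be set.

```c
#include <stddef.h>

/* Hypothetical result codes used by this sketch only. */
typedef enum {
    OOP_INPUT_STREAM,   /* stream pointer set, file name NULL        */
    OOP_INPUT_FILE,     /* file name set, stream pointer NULL        */
    OOP_INPUT_BOTH,     /* invalid: the guide forbids supplying both */
    OOP_INPUT_NONE      /* assumed invalid: no source supplied       */
} OopInputKind;

/* Classifies the two input arguments intended for the session call. */
OopInputKind classify_oop_input(const void *pInputStream,
                                const char *pFileName)
{
    if (pInputStream != NULL && pFileName != NULL)
        return OOP_INPUT_BOTH;
    if (pInputStream != NULL)
        return OOP_INPUT_STREAM;
    if (pFileName != NULL)
        return OOP_INPUT_FILE;
    return OOP_INPUT_NONE;
}
```

An application could refuse to start the session unless the result is OOP_INPUT_STREAM or OOP_INPUT_FILE.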


Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE.

Discussion

• After the out-of-process session is started successfully, all conversion functions can be called. The data is then processed on the Servant until the session is terminated by a call to KVXMLEndOOPSession().

• All functions that can run out of process must be called within the out-of-process session, that is, after the call to KVXMLStartOOPSession(), and before the call to KVXMLEndOOPSession().

• The KVXMLConvertFile() and fpGetSummary() functions can only be called once in a single out-of-process session.

• Since the KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions data structures are passed by this function, the same pointers in the call to KVXMLConvertFile() are ignored.

Example

The following sample code is from the cnv2xmloop sample program:

/* declare OOP startsession function pointer */
BOOL (pascal *fpKVXMLStartOOPSession)( void *,
                                       KVInputStream *,
                                       char *,
                                       KVXMLTemplate *,
                                       KVXMLOptions *,
                                       KVXMLTOCOptions *,
                                       DWORD *,
                                       KVErrorCode *,
                                       DWORD,
                                       void *,
                                       void * );

/* assign OOP startsession function pointer */
fpKVXMLStartOOPSession = (BOOL (pascal *)( void *,
                                           KVInputStream *,
                                           char *,
                                           KVXMLTemplate *,
                                           KVXMLOptions *,
                                           KVXMLTOCOptions *,
                                           DWORD *,
                                           KVErrorCode *,
                                           DWORD,
                                           void *,
                                           void * ))mpGetProcAddress(hKVXML,
                                           "KVXMLStartOOPSession");
if(!fpKVXMLStartOOPSession)
{
    printf("Error assigning KVXMLStartOOPSession() pointer\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    mpFreeLibrary(hKVXML);
    return 7;
}

/********START OOP SESSION *****************/
if(!(*fpKVXMLStartOOPSession)(pKVXML,
                              &Input,
                              NULL,
                              &XMLTemplates,   /* Mark-up and related variables */
                              &XMLOptions,     /* Options */
                              NULL,            /* TOC options */
                              &oopServantPID,
                              &error,
                              0,
                              NULL,
                              NULL))
{
    printf("Error calling fpKVXMLStartOOPSession \n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 9;
}


CHAPTER 9

XML Export API Callback Functions

This section describes the XML Export API callback functions. It contains the following topics:

Introduction

Continue()

GetAnchor()

GetAuxOutput()

UserCB()


Introduction

The fpConvertStream() and KVXMLConvertFile() functions enable you to specify a callback function. A callback function controls the conversion while it is in progress. For example, you can specify a callback function to report progress during the conversion.

To use the API callback functions, declare one or more instances of the KVXMLCallbacks structure (see “KVXMLCallbacks” on page ). Each member of this instance may then be initialized by assigning a function pointer to the application-defined callback functions, cast to the appropriate function prototype. Each instance of KVXMLCallbacks may define unique callback functions. Alternatively, the functions may be common to all instances of KVXMLCallbacks; these functions will take appropriate action, depending on the value of the pointer pCallingContext.

The second parameter (pCallingContext) of the call to fpConvertStream() and KVXMLConvertFile() provides a void pointer used to identify the context of this call. If more than one call to fpConvertStream() or KVXMLConvertFile() is made within a single application, any resulting callbacks are identified by the first parameter of the callback function. This allows the callback function to take any appropriate action, depending on which calling context is returned.

The seventh parameter (pCallbacks) of the call to fpConvertStream() and KVXMLConvertFile() must be set to the address of the KVXMLCallbacks structure to be used for this call.

For sample code, see the sample program xmlcallback.c. It creates an XML stream and demonstrates the use of the callback functions.
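As an illustration of one callback shared by several conversions, the sketch below keeps per-conversion state in a structure whose address is passed as pCallingContext. It is not taken from xmlcallback.c: ConversionContext and SharedContinue() are hypothetical names, and BOOL, TRUE, FALSE, and pascal are defined locally as stand-ins so the fragment compiles without the SDK headers.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles without the SDK headers. */
typedef int BOOL;
#define TRUE 1
#define FALSE 0
#define pascal

/* Hypothetical per-conversion state passed as pCallingContext. */
typedef struct {
    const char *label;   /* identifies which conversion is running */
    int lastPercent;     /* most recent progress value reported    */
} ConversionContext;

/* One callback with the Continue() prototype, shared by all conversions. */
BOOL pascal SharedContinue(void *pCallingContext, int nPercentComplete)
{
    ConversionContext *ctx = (ConversionContext *)pCallingContext;

    if (ctx != NULL)     /* the guide allows a NULL calling context */
        ctx->lastPercent = nPercentComplete;
    return TRUE;         /* keep converting */
}
```

Because each conversion passes its own ConversionContext, the shared function can tell the calls apart without any global state.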


Continue()

When fpConvertStream() or KVXMLConvertFile() is called, control is not returned to the application until the entire document is processed. This callback function provides a means of monitoring progress and terminating the conversion process before the conversion is completed.

Syntax

BOOL (pascal *Continue) (

void *pCallingContext,

int nPercentComplete);

Arguments

pCallingContext Pointer passed back to the caller-provided callback functions. This pointer, which may be NULL, is specified as the second parameter of the call to fpConvertStream() and KVXMLConvertFile().

nPercentComplete Approximate percentage of the current conversion that is completed. You can monitor the progress of the conversion by checking the value of nPercentComplete, which indicates how many blocks out of the total number of blocks have been processed.

Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE. Processing is halted.

Discussion

• There is a callback to this function for every entry that appears in the generated table of contents.

• The application is free to execute any required code in the callback function, with the exception of fpShutDown().
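A callback with the Continue() prototype that lets the application stop a conversion early might look like the following. This is a sketch under stated assumptions: ProgressState and CancelableContinue() are hypothetical, and the BOOL and pascal definitions are local stand-ins for the SDK headers.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles without the SDK headers. */
typedef int BOOL;
#define TRUE 1
#define FALSE 0
#define pascal

/* Hypothetical state shared between the application and the callback. */
typedef struct {
    int cancelRequested;   /* set by the application to stop early */
    int percentSeen;       /* last nPercentComplete reported       */
} ProgressState;

BOOL pascal CancelableContinue(void *pCallingContext, int nPercentComplete)
{
    ProgressState *state = (ProgressState *)pCallingContext;

    if (state == NULL)
        return TRUE;                      /* nothing to track; carry on */
    state->percentSeen = nPercentComplete;
    /* Returning FALSE halts processing, as described under Returns. */
    return state->cancelRequested ? FALSE : TRUE;
}
```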


GetAnchor()

This function gets the filename automatically generated by Export and used for external graphics referenced with <a xmlns:xlink= xlink href=> tags, heading-level table of contents entries, and external files (such as CSS files and revision summary files).

Syntax

BOOL (pascal *GetAnchor) (
    void *pCallingContext,
    KVXMLAnchorType eAnchorType,
    char *pszAnchor,
    int cbAnchorMax,
    BYTE *pcHTML,
    UINT cbHTML);

Arguments

pCallingContext Pointer that gets passed back to the caller-provided callback functions. This pointer, which may be NULL, is specified as the second parameter of the call to fpConvertStream().

eAnchorType The anchor type for the output stream. It must be one of the enumerated types defined in KVXMLAnchorType. See “KVXMLAnchorType” on page .

pszAnchor Pointer to the location where the new anchor is stored.

cbAnchorMax Maximum number of bytes to place in pszAnchor.

pcHTML This is either NULL or a pointer to one of the following:

• markup defining the contents of a table of contents entry

• the external graphic filename

• the external filename

cbHTML Number of valid bytes in pcHTML.

Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE. Processing is halted.


Discussion

• If this callback is NULL, default anchor names are generated. The generated names are unique across the document.

• This function is called once per block, block chunk, graphic anchor, or extra file. Any required code may be executed here as long as a unique value for pszAnchor is assigned. If this string is not unique, an existing file may be overwritten, producing undesirable results. The callback function should contain the functionality to verify whether files already exist.

• If you want to specify graphic anchor names, but use default anchor names for all other anchors, provide the graphic names when eAnchorType is VectorPictureAnchor or RasterPictureAnchor. For all other anchor types, call with the same parameters you were passed.

• pszAnchor must be assigned. It may be derived from the cbAnchorMax, pcHTML, and cbHTML values, which are also provided.

• pcHTML may be null if the graphic is an internal part of the document.
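The requirement to assign a unique pszAnchor without exceeding cbAnchorMax can be sketched with a counter-based name generator. This is an illustration only: AnchorCounter and make_unique_anchor() are hypothetical, the sketch assumes cbAnchorMax includes room for the terminating NUL, and it does not check whether a file with that name already exists, which a real callback should also do.

```c
#include <stdio.h>

typedef int BOOL;   /* stand-in for the SDK definition */
#define TRUE 1
#define FALSE 0

/* Hypothetical counter that a GetAnchor() callback could reach
 * through pCallingContext. */
typedef struct { unsigned next; } AnchorCounter;

/* Writes "<prefix><nnnn>" into pszAnchor. Returns FALSE when the name
 * would not fit in cbAnchorMax bytes (terminator included, by assumption). */
BOOL make_unique_anchor(AnchorCounter *counter, const char *prefix,
                        char *pszAnchor, int cbAnchorMax)
{
    int written;

    if (counter == NULL || prefix == NULL ||
        pszAnchor == NULL || cbAnchorMax <= 0)
        return FALSE;
    written = snprintf(pszAnchor, (size_t)cbAnchorMax, "%s%04u",
                       prefix, counter->next);
    if (written < 0 || written >= cbAnchorMax)
        return FALSE;        /* truncated: do not use this name */
    counter->next++;         /* the next call yields a different name */
    return TRUE;
}
```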


GetAuxOutput()

This callback function allows the calling application to specify an auxiliary output stream for a block or graphic.

Syntax

BOOL (pascal *GetAuxOutput) (

void *pCallingContext,

KVXMLAnchorType eAnchorType,

char *pszAnchor,

KVOutputStream *pNewOutput);

Arguments

pCallingContext Pointer passed back to the caller-provided callback functions. This pointer, which may be NULL, is specified as the second parameter of the call to fpConvertStream().

eAnchorType Graphic or block anchor as defined by the enumerated types in KVXMLAnchorType. See “KVXMLAnchorType” on page .

pszAnchor Pointer to location where a new anchor is stored. pszAnchor is based on the call to GetAnchor().

pNewOutput Pointer to a KVOutputStream structure that may be used to write data to the current block.


Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE. Processing is halted.

Discussion

• If GetAuxOutput() is NULL, the pszDefaultOutputDirectory member of the instance of KVXMLOptions is used as the base storage location for auxiliary output files. If pszDefaultOutputDirectory is also NULL, auxiliary files are placed in the current working directory. See “KVXMLOptions”.

• For each pszAnchor provided, create (malloc) an appropriate I/O structure. Assign pNewOutput->pOutputStreamPrivateData to point to that structure. Each remaining member of the KVOutputStream should then be initialized by assigning a function pointer to the additional application-defined functions, cast to the appropriate function prototype for Create(), Write(), Seek(), Tell(), and Close(). Memory allocated to the I/O structure must be tracked and may be freed up within the call to Close(). See the callback.c sample program.
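The steps in the last bullet can be sketched with a FILE-backed stream. This is not the callback.c sample: AuxFile, the Aux*() functions, and init_aux_output() are hypothetical, the KVOutputStream definition is a local stand-in that mirrors the listing in this guide (with char * in place of TCHAR *), and the sketch assumes the BOOL-returning members report TRUE on success.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins so the sketch compiles without the SDK headers. */
typedef int BOOL;
typedef unsigned int UINT;
typedef unsigned char BYTE;
#define TRUE 1
#define FALSE 0
#define pascal

typedef struct tag_OutputStream {
    void *pOutputStreamPrivateData;
    BOOL (pascal *fpCreate)(struct tag_OutputStream *, char *);
    UINT (pascal *fpWrite)(struct tag_OutputStream *, BYTE *, UINT);
    BOOL (pascal *fpSeek)(struct tag_OutputStream *, long, int);
    long (pascal *fpTell)(struct tag_OutputStream *);
    BOOL (pascal *fpClose)(struct tag_OutputStream *);
} KVOutputStream;

/* Hypothetical private I/O structure, one per auxiliary output. */
typedef struct { FILE *fp; } AuxFile;

static BOOL pascal AuxCreate(KVOutputStream *s, char *name)
{
    AuxFile *aux = (AuxFile *)s->pOutputStreamPrivateData;
    aux->fp = fopen(name, "wb");
    return aux->fp != NULL;
}

static UINT pascal AuxWrite(KVOutputStream *s, BYTE *buf, UINT cb)
{
    AuxFile *aux = (AuxFile *)s->pOutputStreamPrivateData;
    return (UINT)fwrite(buf, 1, cb, aux->fp);
}

static BOOL pascal AuxSeek(KVOutputStream *s, long off, int whence)
{
    AuxFile *aux = (AuxFile *)s->pOutputStreamPrivateData;
    return fseek(aux->fp, off, whence) == 0;
}

static long pascal AuxTell(KVOutputStream *s)
{
    AuxFile *aux = (AuxFile *)s->pOutputStreamPrivateData;
    return ftell(aux->fp);
}

static BOOL pascal AuxClose(KVOutputStream *s)
{
    AuxFile *aux = (AuxFile *)s->pOutputStreamPrivateData;
    BOOL ok = (aux->fp == NULL) || (fclose(aux->fp) == 0);
    free(aux);                            /* release the tracked allocation */
    s->pOutputStreamPrivateData = NULL;
    return ok;
}

/* Fills pNewOutput the way a GetAuxOutput() callback might. */
BOOL init_aux_output(KVOutputStream *pNewOutput)
{
    AuxFile *aux = (AuxFile *)calloc(1, sizeof *aux);
    if (aux == NULL)
        return FALSE;
    pNewOutput->pOutputStreamPrivateData = aux;
    pNewOutput->fpCreate = AuxCreate;
    pNewOutput->fpWrite = AuxWrite;
    pNewOutput->fpSeek = AuxSeek;
    pNewOutput->fpTell = AuxTell;
    pNewOutput->fpClose = AuxClose;
    return TRUE;
}
```

Freeing the private structure inside the Close function keeps the allocation and release paired, which is one way to satisfy the tracking requirement.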


UserCB()

This callback function is triggered by including the $USERCB token in a member of KVXMLTemplate. For example, placing “$USERCB=my_callback ” in pszFirstH1Start results in a callback at the point when pszFirstH1Start is processed. The user callback function is identified by the text assigned to $USERCB, which in this example is my_callback. This identifier is passed to the argument pszUserCBid.

Syntax

BOOL (pascal *UserCB) (
    void *pCallingContext,
    char *pszUserCBid,
    KVOutputStream *pNewOutput,
    void *pReserved);

Arguments

pCallingContext Pointer that gets passed back to the caller-provided callback function. This pointer, which may be NULL, is specified as the second parameter of the call to fpConvertStream().

pszUserCBid Pointer to a string that identifies the source of the callback. The identifier must be delimited by a trailing white space. For example, "my_callback ".

pNewOutput Pointer to a KVOutputStream structure that can be used to write data to the current block.

pReserved Reserved for future use.

Returns

If the call is successful, the return value is TRUE.

If the call is unsuccessful, the return value is FALSE. Processing is halted.


CHAPTER 10

XML Export API Structures

This section provides information on the structures used by the XML Export API. These structures are defined in kvxml.h, kvtypes.h, and adinfo.h. It contains the following topics:

ADDOCINFO

KVMemoryStream

KVSTR

KVStructHead

KVSumInfoElemEx

KVXConfigInfo

KVXMLHeadingInfo

KVXMLOptions

KVXMLTOCOptions

KVInputStream

KVOutputStream

KVStreamInfo

KVStyle

KVSummaryInfoEx

KVXMLCallbacks

KVXMLInterface

KVXMLTemplate


ADDOCINFO

This structure provides the format, file class, and version number of the source document. It is defined in adinfo.h, and is initialized by calling the function fpGetStreamInfo(). See “fpGetStreamInfo()” on page 196.

typedef struct
{
    ENdocClass eClass;
    ENdocFmt eFormat;
    long lVersion;
    unsigned long ulAttributes;
} ADDOCINFO, *ADDOCINFOPTR;

Member Descriptions

eClass Source document’s file class (for example, spreadsheet, word processor, or encapsulation format) as defined by the enumerated type ENdocClass.

eFormat Source document’s major format (for example, Microsoft Word XML format or Corel Presentation) as defined by the enumerated type ENdocFmt in adinfo.h. The ENdocFmt type provides a unique ID for each major format.

lVersion Version number of the file format. The number is multiplied by 1,000, so, for example, 1.02 is represented by 1020.

ulAttributes Other attributes of the document as defined by the enumerated type ENdocAttributes.

Discussion

As format detection is enhanced in future releases, new format IDs may be added to the ENdocFmt enumerated type. When using this type, your code should ensure binary compatibility with future releases. For example, if you use an array to access format information based on a format ID, your code should check that the format ID is less than Max_Fmt before accessing the data. This ensures new format codes are detected when you add KeyView binary files from new releases to your existing installation.
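The Max_Fmt check and the lVersion scaling can be sketched as follows. The ENdocFmt definition here is a local stand-in: only Max_Fmt is named in this guide, and Fmt_A, Fmt_B, Fmt_C, format_name(), and split_version() are hypothetical placeholders for illustration.

```c
#include <stddef.h>

/* Stand-in for ENdocFmt from adinfo.h. Only Max_Fmt appears in the
 * guide; the other members are hypothetical placeholders. */
typedef enum { Fmt_A = 0, Fmt_B, Fmt_C, Max_Fmt } ENdocFmt;

/* Lookup table sized against the Max_Fmt this code was compiled with. */
static const char *const kFormatNames[Max_Fmt] = {
    "format A", "format B", "format C"
};

/* Guards the array access so IDs introduced by newer binaries are
 * reported as unknown instead of indexing past the table. */
const char *format_name(int eFormat)
{
    if (eFormat < 0 || eFormat >= Max_Fmt)
        return "unknown";
    return kFormatNames[eFormat];
}

/* lVersion holds the version multiplied by 1,000 (1020 means 1.02). */
void split_version(long lVersion, long *whole, long *thousandths)
{
    *whole = lVersion / 1000;
    *thousandths = lVersion % 1000;
}
```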


KVInputStream

This structure defines an input stream for the XML conversion.

typedef struct tag_InputStream
{
    void *pInputStreamPrivateData;
    long lcbFilesize;
    BOOL (pascal *fpOpen) (struct tag_InputStream *);
    UINT (pascal *fpRead) (struct tag_InputStream *, BYTE *, UINT);
    BOOL (pascal *fpSeek) (struct tag_InputStream *, long, int);
    long (pascal *fpTell) (struct tag_InputStream *);
    BOOL (pascal *fpClose)(struct tag_InputStream *);
}
KVInputStream;

Member Descriptions

All member functions are equivalent to their counterparts in the ANSI standard library, except fpOpen(), which returns FALSE on failure. On fpOpen(), if the size of the stream is known, assign that value to lcbFilesize. Otherwise, set lcbFilesize to 0.
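As an illustration, an input stream backed by a memory buffer could be set up as below. This is a sketch, not SDK sample code: MemSource, the Mem*() functions, and init_memory_input() are hypothetical, the KVInputStream definition is a local stand-in that mirrors the listing above, and the sketch assumes the BOOL-returning members report TRUE on success.

```c
#include <stdio.h>    /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>

/* Stand-ins so the sketch compiles without the SDK headers. */
typedef int BOOL;
typedef unsigned int UINT;
typedef unsigned char BYTE;
#define TRUE 1
#define FALSE 0
#define pascal

typedef struct tag_InputStream {
    void *pInputStreamPrivateData;
    long lcbFilesize;
    BOOL (pascal *fpOpen)(struct tag_InputStream *);
    UINT (pascal *fpRead)(struct tag_InputStream *, BYTE *, UINT);
    BOOL (pascal *fpSeek)(struct tag_InputStream *, long, int);
    long (pascal *fpTell)(struct tag_InputStream *);
    BOOL (pascal *fpClose)(struct tag_InputStream *);
} KVInputStream;

/* Hypothetical private data: a read-only buffer and a cursor. */
typedef struct { const BYTE *data; long size; long pos; } MemSource;

static BOOL pascal MemOpen(KVInputStream *s)
{
    MemSource *m = (MemSource *)s->pInputStreamPrivateData;
    if (m == NULL)
        return FALSE;
    m->pos = 0;
    s->lcbFilesize = m->size;   /* the size is known, so report it */
    return TRUE;
}

static UINT pascal MemRead(KVInputStream *s, BYTE *buf, UINT cb)
{
    MemSource *m = (MemSource *)s->pInputStreamPrivateData;
    unsigned long left = (unsigned long)(m->size - m->pos);
    UINT n = (cb < left) ? cb : (UINT)left;
    memcpy(buf, m->data + m->pos, n);
    m->pos += (long)n;
    return n;                   /* bytes actually copied */
}

static BOOL pascal MemSeek(KVInputStream *s, long off, int whence)
{
    MemSource *m = (MemSource *)s->pInputStreamPrivateData;
    long base = (whence == SEEK_SET) ? 0
              : (whence == SEEK_CUR) ? m->pos : m->size;
    long target = base + off;
    if (target < 0 || target > m->size)
        return FALSE;
    m->pos = target;
    return TRUE;
}

static long pascal MemTell(KVInputStream *s)
{
    return ((MemSource *)s->pInputStreamPrivateData)->pos;
}

static BOOL pascal MemClose(KVInputStream *s)
{
    (void)s;                    /* nothing to release for a borrowed buffer */
    return TRUE;
}

void init_memory_input(KVInputStream *s, MemSource *m)
{
    s->pInputStreamPrivateData = m;
    s->lcbFilesize = 0;         /* updated by MemOpen() */
    s->fpOpen = MemOpen;
    s->fpRead = MemRead;
    s->fpSeek = MemSeek;
    s->fpTell = MemTell;
    s->fpClose = MemClose;
}
```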


KVMemoryStream

This structure defines an optional memory allocator to be used by XML Export. It is initialized by calling fpInit(). See “fpInit()” on page .

typedef struct tag_MemoryStream

{

void *pMemoryStreamPrivateData;

void * (pascal *fpMalloc)(struct tag_MemoryStream*,size_t);

void (pascal *fpFree) (struct tag_MemoryStream*, void *);

void * (pascal *fpRealloc)(struct tag_MemoryStream*,void *, size_t);

void * (pascal *fpCalloc)(struct tag_MemoryStream*, size_t, size_t);

}

KVMemoryStream;

Member Descriptions

All member functions are equivalent to their counterparts in the ANSI standard library.

Discussion

• fpRealloc() must handle a NULL pointer.

• For systems that do not support fpRealloc(), refer to the xmlcallback sample program, which demonstrates how to use the memory management features.

• If KVMemoryStream is not provided, then the default C run-time memory allocation is used.
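A custom allocator that satisfies the NULL-pointer requirement for fpRealloc() could be sketched as follows. It is not the xmlcallback sample: AllocStats, the Stat*() functions, and init_counting_allocator() are hypothetical, and the KVMemoryStream definition is a local stand-in that mirrors the listing above. A requested size of 0 is not handled specially in this sketch.

```c
#include <stdlib.h>

#define pascal   /* stand-in for the SDK calling-convention macro */

typedef struct tag_MemoryStream {
    void *pMemoryStreamPrivateData;
    void *(pascal *fpMalloc)(struct tag_MemoryStream *, size_t);
    void (pascal *fpFree)(struct tag_MemoryStream *, void *);
    void *(pascal *fpRealloc)(struct tag_MemoryStream *, void *, size_t);
    void *(pascal *fpCalloc)(struct tag_MemoryStream *, size_t, size_t);
} KVMemoryStream;

/* Hypothetical private data: number of blocks currently allocated. */
typedef struct { long liveBlocks; } AllocStats;

static AllocStats *stats_of(KVMemoryStream *m)
{
    return (AllocStats *)m->pMemoryStreamPrivateData;
}

static void *pascal StatMalloc(KVMemoryStream *m, size_t cb)
{
    void *p = malloc(cb);
    if (p != NULL)
        stats_of(m)->liveBlocks++;
    return p;
}

static void pascal StatFree(KVMemoryStream *m, void *p)
{
    if (p != NULL)
        stats_of(m)->liveBlocks--;
    free(p);
}

static void *pascal StatRealloc(KVMemoryStream *m, void *p, size_t cb)
{
    if (p == NULL)                /* NULL input behaves like malloc */
        return StatMalloc(m, cb);
    return realloc(p, cb);        /* block count is unchanged */
}

static void *pascal StatCalloc(KVMemoryStream *m, size_t n, size_t cb)
{
    void *p = calloc(n, cb);
    if (p != NULL)
        stats_of(m)->liveBlocks++;
    return p;
}

void init_counting_allocator(KVMemoryStream *m, AllocStats *stats)
{
    m->pMemoryStreamPrivateData = stats;
    m->fpMalloc = StatMalloc;
    m->fpFree = StatFree;
    m->fpRealloc = StatRealloc;
    m->fpCalloc = StatCalloc;
}
```

A non-zero liveBlocks value after shutdown would point to memory that was requested through the stream but never released.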


KVOutputStream

This structure defines an output stream for the XML conversion.

typedef struct tag_OutputStream

{

void *pOutputStreamPrivateData;

BOOL (pascal *fpCreate)(struct tag_OutputStream *,TCHAR *);

UINT (pascal *fpWrite) (struct tag_OutputStream *, BYTE *, UINT);

BOOL (pascal *fpSeek) (struct tag_OutputStream *, long, int);

long (pascal *fpTell) (struct tag_OutputStream *);

BOOL (pascal *fpClose) (struct tag_OutputStream *);

}

KVOutputStream;

Member Descriptions

All member functions are equivalent to their counterparts in the ANSI standard library.


KVSTR

This structure is used to identify string types (string text and byte count) for the first three members of KVStyle. See “KVStyle” on page .

typedef struct tag_KVSTR

{

char *pcString;

int cbString;

}

KVSTR;

Member Descriptions

pcString Text string.

cbString Length of pcString, excluding the terminating NULL(s). This allows UNICODE or double bytes to be employed.


KVStreamInfo

This structure defines a document’s character set and format. The structure is initialized by calling the function fpGetStreamInfo(). See “fpGetStreamInfo()” on page .

typedef struct tag_KVStreamInfo
{
    KVCharSet charset;
    ADDOCINFO adInfo;
}
KVStreamInfo;

Member Descriptions

charset Character set of the source document, if that information is ascertainable. This member is an integer corresponding to the KVCharSet enumerated type in kvtypes.h.

adInfo File class, major format, and version of the source document. Pointer to the ADDOCINFO structure. The structure of ADDOCINFO is defined in adinfo.h. See “ADDOCINFO” on page .

• adInfo.eClass represents the source document’s class as defined by the enumerated type ENdocClass.

• adInfo.eFormat represents the source document’s format as defined by the enumerated type ENdocFmt.

• adInfo.lVersion represents the version number of the file format. The number is multiplied by 1,000, so, for example, 1.02 is represented by 1020.

• adInfo.ulAttributes represents other attributes of the document as defined by the enumerated type ENdocAttributes.

Discussion

As format detection is enhanced in future releases, new format IDs may be added to the ENdocFmt enumerated type. When using this type, your code should ensure binary compatibility with future releases. For example, if you use an array to access format information based on a format ID, your code should check that the format ID is less than Max_Fmt before accessing the data. This ensures new format codes are detected when you add KeyView binary files from new releases to your existing installation.


KVStructHead

This structure contains the current KeyView version number and is the first member of other structures. It enables Autonomy to modify the structures in future releases, but to maintain backward compatibility. Before initializing a structure that contains the KVStructHead structure, use the macro KVStructInit to initialize KVStructHead. The structure and macro are defined in kvtypes.h.

typedef struct _KVStructHead
{
    WORD version;
    WORD size;
    DWORD reserved;
    void *internal;
} KVStructHeadRec, *KVStructHead;

Member Descriptions

version The current KeyView version number. This is a symbolic constant (KeyviewVersion) defined in kvxtract.h. This constant will be updated for each KeyView release.

size The size of the KVStructHeadRec.

reserved Reserved for internal use.

internal Reserved for internal use.

Example

KVStructInit(&openArg);


KVStyle

This structure defines the style mapping support for KVSTR-defined styles. The first three members of KVStyle are KVSTR structures (see “KVSTR” on page ). Each KVSTR structure contains the text string and byte count for StyleName, MarkUpStart, and MarkUpEnd. The structure is initialized by calling the function fpSetStyleMapping(). See “fpSetStyleMapping()”. See “Map Styles” on page for more information on mapping styles.

XML Export supports both paragraph styles and character styles. It works on the assumption that each style has a unique name. Only one paragraph style may be active at one time; therefore, the opening of a new paragraph style automatically closes the previous paragraph style. By contrast, several character styles may be active at once. When XML Export receives an EndCharStyle token from the format parser, the most recent character style is terminated.

typedef struct tag_KVStyles
{
    KVSTR StyleName;
    KVSTR MarkUpStart;
    KVSTR MarkUpEnd;
    DWORD dwFlags;
}
KVStyle;


Member Descriptions

StyleName The name of the word processing style (for example, “Heading 1”) to which style mapping applies. A pointer to the KVSTR structure. See “KVSTR” on page . Style names are case sensitive.

MarkUpStart The markup added to the beginning of a paragraph or character style. A pointer to the KVSTR structure. See “KVSTR” on page .

MarkUpEnd The markup added to the end of a paragraph or character style. A pointer to the KVSTR structure. See “KVSTR” on page .

dwFlags Instructions on how to process the content associated with a paragraph or character style. The flag can be one of the types defined in kvtypes.h. They are described in Table on page .

The value associated with each flag is a hexadecimal number. You can set an option by either entering the converted decimal value or entering the flag’s text (for example, KVSTYLE_PRE).

The value of Flags in the template files is passed to this member of KVStyle.

Discussion

• This structure applies to word processing documents only.

• By default, XML Export maps the heading style “Heading 1” to <h1></h1>, and so on, for heading levels 1 through 6. If you use style mappings, the default mapping is overridden. Therefore, you must supply markup for all heading levels.

• When the user-defined markup in KVStyle conflicts with other markup generated by XML Export, the user-defined markup takes precedence.


KVSumInfoElemEx

This structure defines the individual metadata elements.

typedef struct tag_KVSumInfoElemEx
{
    int isValid;
    KVSumInfoType type;
    void *data;
    char *pcType;
}
KVSumInfoElemEx;

Member Descriptions

isValid Specifies whether the data value is present in the document. The setting 1 specifies the value is valid and exists.

type Data type of the metadata element. The types are defined in the structure KVSumInfoType in kvtypes.h. See “KVSumInfoType” on page .

data The content of the metadata field.

If the type member is KV_Int4 or KV_Bool, then this member contains the actual value. Otherwise, this member is a pointer to the actual value.

KV_DateTime and KV_IEEE8 point to an 8-byte value.

KV_String and KV_Unicode point to the beginning of the string containing the text.

pcType Pointer to the name of the metadata field.
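The difference between values stored directly in data and values that data points to can be sketched as below. The type definitions are local stand-ins: the enumerator names come from this guide, but their order and numeric values here are placeholders, and describe_element() is a hypothetical helper, not an SDK function.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-ins; real definitions come from kvtypes.h. Enumerator values
 * here are placeholders. */
typedef enum {
    KV_String, KV_Int4, KV_DateTime, KV_Bool, KV_Unicode, KV_IEEE8
} KVSumInfoType;

typedef struct {
    int isValid;
    KVSumInfoType type;
    void *data;
    char *pcType;
} KVSumInfoElemEx;

/* Formats one element as "name=value" into out. Returns 0 when the
 * element carries no valid value, otherwise 1. */
int describe_element(const KVSumInfoElemEx *e, char *out, size_t cbOut)
{
    if (e == NULL || !e->isValid)
        return 0;
    switch (e->type) {
    case KV_Int4:
    case KV_Bool:
        /* The value is held in the pointer itself, not pointed to. */
        snprintf(out, cbOut, "%s=%ld", e->pcType,
                 (long)(intptr_t)e->data);
        return 1;
    case KV_String:
        /* data points to the start of the text. */
        snprintf(out, cbOut, "%s=%s", e->pcType, (const char *)e->data);
        return 1;
    default:
        /* KV_DateTime, KV_IEEE8, and KV_Unicode need their own decoding. */
        snprintf(out, cbOut, "%s=<not decoded>", e->pcType);
        return 1;
    }
}
```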


KVSummaryInfoEx

This structure provides a count of the number of metadata elements in a document, and a pointer to the first element of the array of elements. The structure is initialized by calling the function fpGetSummaryInfo(). See “fpGetSummaryInfo()” on page .

typedef struct tag_KVSummaryInfoEx
{
    int nElem;
    KVSumInfoElemEx *pElem;
}
KVSummaryInfoEx;

Member Descriptions

nElem Number of metadata elements contained in the array. nElem may be zero. This indicates that the document did not contain metadata, such as an ASCII text document.

pElem Points to the first element of the array of document metadata elements defined by the structure KVSumInfoElemEx. See “KVSumInfoElemEx” on page .


KVXConfigInfo

This structure defines the document type of a source XML file, and the element extraction settings for that type. The settings can be applied based on the file format ID, or the file’s root element. This structure is in kvtypes.h and is initialized by calling the function KVXMLConfig(). See “Convert XML Files” on page and “KVXMLConfig()” on page 205.

typedef struct TAG_KVXConfigInfo
{
    ENdocFmt eKVFormat;
    char* pszRoot;
    char* pszInMeta;
    char* pszExMeta;
    char* pszInContent;
    char* pszExContent;
    char* pszInAttribute;
}
KVXConfigInfo;

Member Descriptions

eKVFormat The format ID as detected by the KeyView detection module. This determines the file type to which these extraction settings apply. The format ID is defined by the enumerated type ENdocFmt. See “File Format Detection” on page for more information on format ID values.

If you are adding configuration settings for a custom XML document type, this is not defined.

pszRoot The file’s root element. When the format ID is not defined, the root element is used to determine the file type to which these settings apply.

To further qualify the element, specify its namespace. See “Specify an Element’s Namespace and Attribute” on page .

pszInMeta The elements extracted from the file as metadata. All other elements are extracted as text. Multiple entries must be separated by commas.

To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute” on page .

pszExMeta The child elements in the included metadata elements that are not extracted from the file as metadata. For example, the default extraction settings for the Visio XML format extract the DocumentProperties element as metadata. This element includes child elements such as Title, Subject, Author, Description, and so on. However, the child element PreviewPicture is defined in pszExMeta because it is binary data and should not be extracted.

You cannot exclude any metadata elements from the output for StarOffice files. All metadata is extracted regardless of this setting.

To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute” on page .

pszInContent The elements extracted from the file as content text. An asterisk (*) extracts all elements including child elements.

To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute” on page .

pszExContent The child elements in the included content elements that are not extracted from the file as content text.

To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute” on page .

pszInAttribute The attribute values extracted from the file. If attributes are not defined, attribute values are not extracted. The namespace (if used), element name and attribute name must be defined in the following format:

namespace:elementname@attributename

For example:

Autonomy:division@name
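To show how the three parts of the pszInAttribute format relate, the sketch below splits such a string in place. It is an illustration only: split_attribute_spec() is a hypothetical helper, not an SDK function, and it assumes the namespace is a simple prefix that contains no ':' or '@' characters.

```c
#include <string.h>

/* Splits "namespace:elementname@attributename" in place by replacing
 * the separators with NUL bytes. *ns is set to NULL when no namespace
 * is present. Returns 1 on success, 0 when the element or attribute
 * part is missing. The buffer may be modified even on failure. */
int split_attribute_spec(char *spec, char **ns, char **element,
                         char **attribute)
{
    char *at = strchr(spec, '@');
    char *colon;

    if (at == NULL || at == spec || at[1] == '\0')
        return 0;
    *at = '\0';
    *attribute = at + 1;

    colon = strchr(spec, ':');
    if (colon != NULL) {
        *colon = '\0';
        *ns = spec;
        *element = colon + 1;
    } else {
        *ns = NULL;
        *element = spec;
    }
    return **element != '\0';
}
```

Applied to the example above, the helper yields the namespace Autonomy, the element division, and the attribute name.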


KVXMLCallbacks

This structure provides all callbacks that can result from a call to fpConvertStream() or KVXMLConvertFile(). See “fpConvertStream()” on page and “KVXMLConvertFile()” on page 214. Any and all of the function pointers can be NULL.

typedef BOOL (pascal *KVXMLCB_CONTINUE)(
    void *pCallingContext,
    int nPercentDone);

typedef BOOL (pascal *KVXMLCB_GETANCHOR)(
    void *pCallingContext,
    KVXMLAnchorType eAnchorType,
    char *pszAnchor,
    int cbAnchorMax,
    BYTE *pcHTML,
    UINT cbHTML);

typedef BOOL (pascal *KVXMLCB_GETAUXOUTPUT)(
    void *pCallingContext,
    KVXMLAnchorType eAnchorType,
    char *pszAnchor,
    KVOutputStream *pNewOutput);

typedef BOOL (pascal *KVXMLCB_USERCB) (
    void *pCallingContext,
    char *psUserCBid,
    KVOutputStream *pOutput,
    void *pReserved);

typedef struct tag_KVXMLCallbacks
{
    KVXMLCB_CONTINUE fpContinue;
    KVXMLCB_GETANCHOR fpGetAnchor;
    KVXMLCB_GETAUXOUTPUT fpGetAuxOutput;
    KVXMLCB_USERCB fpUserCB;
}
KVXMLCallbacks;

Member Descriptions

• The members of this structure are function pointers to the functions described in “XML Export API Callback Functions”.

• If fpGetAuxOutput() is NULL, the pszDefaultOutputDirectory member of the instance of KVXMLOptions is used as the base storage location for auxiliary output files. If pszDefaultOutputDirectory is also NULL, auxiliary files are placed in the current working directory. See “KVXMLOptions”.
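Because any of the callback pointers can be NULL, an application can supply only the callbacks it needs. The sketch below fills in fpContinue alone. The type definitions at the top are simplified stand-ins so that the fragment compiles without the SDK headers; ProgressState and on_continue are hypothetical names; and the meaning of the return value (TRUE to keep converting, FALSE to stop) is an assumption to check against “XML Export API Callback Functions”.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without the SDK headers.
 * In a real build these come from the KeyView headers. */
#ifndef pascal
#define pascal
#endif
typedef int BOOL;
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif
typedef BOOL (pascal *KVXMLCB_CONTINUE)(void *pCallingContext, int nPercentDone);
typedef void (*StandInFn)(void);          /* placeholder for the other callback types */
typedef struct tag_KVXMLCallbacks {
    KVXMLCB_CONTINUE fpContinue;
    StandInFn        fpGetAnchor;
    StandInFn        fpGetAuxOutput;
    StandInFn        fpUserCB;
} KVXMLCallbacks;

/* Hypothetical application state passed as pCallingContext. */
typedef struct {
    int  lastPercent;
    BOOL cancelRequested;
} ProgressState;

/* Records progress; returns FALSE to request a stop (assumed semantics). */
static BOOL pascal on_continue(void *pCallingContext, int nPercentDone)
{
    ProgressState *state = (ProgressState *)pCallingContext;

    assert(state != NULL);
    state->lastPercent = nPercentDone;
    return state->cancelRequested ? FALSE : TRUE;
}

/* Only fpContinue is set; the remaining pointers stay NULL. */
static KVXMLCallbacks make_callbacks(void)
{
    KVXMLCallbacks cb = { 0 };

    cb.fpContinue = on_continue;
    return cb;
}
```

With fpGetAuxOutput left NULL as shown, auxiliary files fall back to pszDefaultOutputDirectory or the current working directory, as described above.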


KVXMLHeadingInfo

This structure defines how XML Export creates heading information based on the source document’s content and attributes. Source text is converted to a heading and included in the table of contents if it meets all the criteria defined by this structure, and the headingCreateType member of KVXMLTOCOptions is set to allow automatic heading generation.

XML Export evaluates the text against each member in the order in which the members appear below.

See “KVXMLTOCOptions” for more information on automatic generation of headings.

typedef struct tag_KVXMLHeadingInfo
{
    int  minParaLen;
    int  maxParaLen;
    int  fontSizeMin;
    int  fontSizeMax;
    BOOL bMustBeBold;
    BOOL bMustBeItalic;
    BOOL bMustBeUnderlined;
    BOOL bNonZeroIndent;
    BOOL bNoTabs;
    BOOL bNoMultiSpaces;
    int  nSpaceBefore;
    int  nSpaceAfter;
}
KVXMLHeadingInfo;


Member Descriptions

minParaLen

The minimum number of characters that a paragraph in the source document can contain for the text to meet the criteria for heading conversion.

Applies to word processing documents only.

The default is 3 for heading levels 1 to 3.

maxParaLen

The maximum number of characters that a paragraph in the source document can contain for the text to meet the criteria for heading conversion.

Applies to word processing documents only.

The default is 80 for heading levels 1 to 3.

fontSizeMin

The minimum font size of text in the source document for the text to meet the criteria for heading conversion.

The default is 14 for heading level 1, and 12 for heading levels 2 and 3.

fontSizeMax

The maximum font size of text in the source document for the text to meet the criteria for heading conversion.

The default is 20 for heading level 1, and 14 for heading levels 2 and 3.

bMustBeBold

If this is set to TRUE, the text in the source document must be bold to meet the criteria for heading conversion.

The default is TRUE for heading levels 1 and 2, and FALSE for heading level 3.

bMustBeItalic

If this is set to TRUE, the text in the source document must be italic to meet the criteria for heading conversion.

The default is FALSE.

bMustBeUnderlined

If this is set to TRUE, the text in the source document must be underlined to meet the criteria for heading conversion.

The default is FALSE.

bNonZeroIndent

If this is set to TRUE, the text in the source document must be indented to meet the criteria for heading conversion. If set to FALSE, the text must be aligned left.

The default is FALSE.

bNoTabs

If this is set to TRUE, the text in the source document must not contain tabs to meet the criteria for heading conversion.

The default is FALSE.

bNoMultiSpaces

If this is set to TRUE, the text in the source document must not contain two or more contiguous white spaces to meet the criteria for heading conversion.

The default is FALSE.

nSpaceBefore

The amount of space in twips (one twentieth of a point) that must come before a paragraph in the source document for the text to meet the criteria for heading conversion. If -1 is used, the amount of space before the paragraph is not considered in the heading generation.

The default is 0.

nSpaceAfter

The amount of space in twips (one twentieth of a point) that must follow a paragraph in the source document for the text to meet the criteria for heading conversion. If -1 is used, the amount of space after the paragraph is not considered in the heading generation.

The default is 0.
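The documented defaults for heading levels 1 to 3 can be written out as code, which is a convenient starting point when only one or two criteria need to change. This is a sketch under stated assumptions: the structure is a stand-in copied from the definition above so that the fragment compiles without the SDK headers, and default_heading_info and points_to_twips are hypothetical helpers, not SDK functions.

```c
#include <assert.h>

/* Stand-ins copied from the structure definition above. */
typedef int BOOL;
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif
typedef struct tag_KVXMLHeadingInfo {
    int  minParaLen;
    int  maxParaLen;
    int  fontSizeMin;
    int  fontSizeMax;
    BOOL bMustBeBold;
    BOOL bMustBeItalic;
    BOOL bMustBeUnderlined;
    BOOL bNonZeroIndent;
    BOOL bNoTabs;
    BOOL bNoMultiSpaces;
    int  nSpaceBefore;
    int  nSpaceAfter;
} KVXMLHeadingInfo;

/* nSpaceBefore and nSpaceAfter are in twips: 20 twips per point. */
static int points_to_twips(int points)
{
    return points * 20;
}

/* Hypothetical helper: returns the defaults documented above
 * for heading level 1, 2, or 3. */
static KVXMLHeadingInfo default_heading_info(int level)
{
    KVXMLHeadingInfo h;

    assert(level >= 1 && level <= 3);
    h.minParaLen        = 3;
    h.maxParaLen        = 80;
    h.fontSizeMin       = (level == 1) ? 14 : 12;
    h.fontSizeMax       = (level == 1) ? 20 : 14;
    h.bMustBeBold       = (level <= 2) ? TRUE : FALSE;
    h.bMustBeItalic     = FALSE;
    h.bMustBeUnderlined = FALSE;
    h.bNonZeroIndent    = FALSE;
    h.bNoTabs           = FALSE;
    h.bNoMultiSpaces    = FALSE;
    h.nSpaceBefore      = 0;
    h.nSpaceAfter       = 0;
    return h;
}
```

For example, to require 12 points of space before a level 1 heading, start from default_heading_info(1) and set nSpaceBefore to points_to_twips(12).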


KVXMLInterface

The members of this structure are pointers to the API functions described in “XML Export API Functions”.

typedef void* (pascal *KVXML_INIT)(
    KVMemoryStream *pMemAllocator,
    char           *pszKeyViewDir,
    char           *pszDataFile,
    KVErrorCode    *pError,
    DWORD           dWord);

typedef void (pascal *KVXML_SHUTDOWN)(void*);

typedef BOOL (pascal *KVXML_CONVERT_STREAM)(
    void            *pContext,
    void            *pCallingContext,
    KVInputStream   *pInput,
    KVOutputStream  *pOutput,
    KVXMLTemplate   *pTemplates,
    KVXMLOptions    *pOptions,
    KVXMLTOCOptions *pTOCCreateOptions,
    KVXMLCallbacks  *pCallbacks,
    BOOL             bIndex,
    KVErrorCode     *pError);

typedef char** (pascal *KVXML_GET_FILE_LIST)(
    void *pContext,
    int  *pnSize);

typedef BOOL (pascal *KVXML_GET_STREAM_INFO)(
    void          *pContext,
    KVInputStream *pInput,
    KVStreamInfo  *pStreamInfo);

typedef BOOL (pascal *KVXML_GET_ANCHOR)(
    void            *pCallingContext,
    KVXMLAnchorType  eAnchorType,
    char            *pszAnchor,
    int              cbAnchorMax,
    BYTE            *pcHTML,
    UINT             cbHTML);

typedef BOOL (pascal *KVXML_INPUTSTREAM_CREATE)(
    void          *pContext,
    char          *pszFileName,
    KVInputStream *pInput);

typedef BOOL (pascal *KVXML_INPUTSTREAM_FREE)(
    void          *pContext,
    KVInputStream *pInput);

typedef BOOL (pascal *KVXML_OUTPUTSTREAM_CREATE)(
    void           *pContext,
    char           *pszFileName,
    KVOutputStream *pOutput);

typedef BOOL (pascal *KVXML_OUTPUTSTREAM_FREE)(
    void           *pContext,
    KVOutputStream *pOutput);

typedef KVLanguageID (pascal *KVXML_LANGUAGE_ID)(void *pContext);

typedef BOOL (pascal *KVXML_GET_SUMMARY_INFO)(
    void            *pContext,
    KVInputStream   *pInput,
    KVSummaryInfoEx *pSummary,
    BOOL             bFree);

typedef BOOL (pascal *KVXML_SET_STYLE_MAPPING)(
    void    *pContext,
    KVStyle *pStyles,
    int      iStyles,
    BOOL     bCopy);

typedef BOOL (pascal *KVXML_VALIDATE_TEMPLATE)(
    void            *pContext,
    KVOutputStream  *pOutput,
    KVXMLTemplate   *pTemplate,
    KVXMLOptions    *pOptions,
    KVXMLTOCOptions *pTOCOptions,
    KVXMLCallbacks  *pCallbacks,
    KVMemoryStream  *pMemStream);

typedef struct tag_KVXMLInterface
{
    KVXML_INIT                fpInit;
    KVXML_SHUTDOWN            fpShutDown;
    KVXML_CONVERT_STREAM      fpConvertStream;
    KVXML_GET_FILE_LIST       fpGetConvertFileList;
    KVXML_GET_STREAM_INFO     fpGetStreamInfo;
    KVXML_GET_ANCHOR          fpGetAnchor;
    KVXML_INPUTSTREAM_CREATE  fpFileToInputStreamCreate;
    KVXML_INPUTSTREAM_FREE    fpFileToInputStreamFree;
    KVXML_OUTPUTSTREAM_CREATE fpFileToOutputStreamCreate;
    KVXML_OUTPUTSTREAM_FREE   fpFileToOutputStreamFree;
    KVXML_GET_SUMMARY_INFO    fpGetSummaryInfo;
    KVXML_SET_STYLE_MAPPING   fpSetStyleMapping;
    KVXML_VALIDATE_TEMPLATE   fpValidateTemplate;
}
KVXMLInterface;

Member Descriptions

• The members of this structure are function pointers to the functions described in “XML Export API Functions”.

KVXML_VALIDATE_TEMPLATE is currently not implemented.


KVXMLOptions

This structure defines the options that control the XML markup written in response to the general style and attributes (font, color, and so on) of the document. The structure is initialized by calling the function fpConvertStream() or KVXMLConvertFile(). See “fpConvertStream()” or “KVXMLConvertFile()”.

typedef struct tag_KVXMLOptions
{
    BOOL                   bUseVerityDTD;
    char                  *pszVerityDTDPath;
    KVXMLStyleSheetType    eStyleSheetType;
    BOOL                   bUseExistingStyleSheet;
    char                  *pszStyleSheet;
    BOOL                   bIndexOnly;
    KVCharSet              eOutputCharSet;
    BOOL                   bForceOutputCharSet;
    KVCharSet              eSrcCharSet;
    BOOL                   bForceSrcCharSet;
    KVLanguageID           eOutputLanguageID;
    BOOL                   bUseDocumentColors;
    BOOL                   bUseDocumentFontInfo;
    BOOL                   bNbspEmptyCells;
    ENSATableBorder        eSATableBorder;
    int                    nTableBorderWidth;
    char                  *pszBaseURL;
    char                  *pszMainURL;
    char                  *pszDefaultOutputDirectory;
    char                  *pszPicPath;
    char                  *pszPicURL;
    char                  *pszJavaURL;
    BOOL                   bRemoveFileNameSpaces;
    BOOL                   bRasterizeFiles;
    KVXMLGraphicType       eOutputRasterGraphicType;
    KVXMLGraphicType       eOutputVectorGraphicType;
    int                    cxVectorToRasterXRes;
    int                    cyVectorToRasterYRes;
    int                    nCompressionQuality;
    BOOL                   bGenerateURLs;
    long                   lcbMaxMemUsage;
    BYTE                   cReplaceChar;
    BYTE                   cRedact;
    KVXMLEmptyParaType     eEmptyParaType;
    KVXMLHardPageBreakType eHardPageBreakType;
    BOOL                   bSupportColumnHeadings;
    BOOL                   bSupportRowHeadings;
    BOOL                   bSupportCellSpan;
    BOOL                   bSupportRowSpan;
    BOOL                   bSupportColumnWidth;
    BOOL                   bRemoveEmptyColumns;
    BOOL                   bRemoveEmptyRows;
    BOOL                   bEnableEmptyRows;
    int                    nRowsBeforeSplit;
}
KVXMLOptions;

Member Descriptions

bUseVerityDTD

Set to TRUE to generate XML based on the Verity DTD. This generates a valid XML document suitable as a general interchange format. If FALSE, the XML is based on the source document’s paragraph structure. For more information, see “Use the Verity Document Type Definition (DTD)”.

The default is TRUE.

pszVerityDTDPath

If you move the Verity DTD from the default tempout directory to another output directory, set the string value of pszVerityDTDPath to the new location. This path is added to the document type declaration in the XML file.

The default is no path. That is, the DTD is assumed to be in the same directory as the generated XML files.

eStyleSheetType

One of the enumerated options for processing style sheet information. The options are defined in KVXMLStyleSheetType in kvxml.h.

STYLESHEET_DISABLED: Disables style sheet formatting.

XML_CSS: Enables Cascading Style Sheet (CSS) formatting, and outputs the generated formatting data in an external CSS file referenced in the XML output as a tag.

XML_XSL: Enables Extensible Stylesheet Language (XSL) formatting, and uses an external XSL file referenced in a <?xml-stylesheet...?> processing instruction.

The default is STYLESHEET_DISABLED.

bUseExistingStyleSheet

Set to TRUE to apply an existing XSL style sheet or a CSS to an XML document. The style sheet filename is inserted into the type declaration at the beginning of the XML file. The location of the external style sheet file is set by pszStyleSheet. If pszStyleSheet is not specified and the style sheet type is XSL, then a default XSL style sheet, appropriate for the source document type, is used. The default XSL style sheets are:

• wp.xls (for word processing documents)
• ss.xls (for spreadsheets)
• pg.xls (for presentations)

If pszStyleSheet is not specified and the style sheet type is CSS, then a CSS file is created.

Existing style sheets are not validated.

The default is FALSE.

pszStyleSheet

The path and filename of an external style sheet.

The default is no path.

bIndexOnly

Set this to TRUE to generate output with minimal markup (ID and style paragraph attributes) and without images. Since the generated output is minimized to textual content, it is suitable for an indexing engine. If bIndexOnly is set to FALSE, embedded images in a document are regenerated as separate files and stored in the output directory.

The template file named xml_index.ini and the xmlindex sample program demonstrate the effect of setting bIndexOnly.

To generate output with verbose markup and without images, set the nType argument of the function KVXMLConfig() to KVCFG_SUPPRESSIMAGES. See “KVXMLConfig()”.

Applies to word processing documents and spreadsheets only.

The default is FALSE.

eOutputCharSet

The character set to use for textual output. To ensure the character set defined here is used, you must set bForceOutputCharSet to TRUE. The available character sets are enumerated in KVCharSet in kvtypes.h. See “Convert Character Sets”.

The section “Supported Formats” lists the file formats for which character set information can be determined.

The default is KVCS_UNKNOWN.

bForceOutputCharSet

Set to TRUE to use the output character set specified in eOutputCharSet, regardless of the internal document information or the source character set specified by eSrcCharSet. See “Convert Character Sets”.

Forcing a character set to KVCS_UNKNOWN is always ignored.

The default is FALSE.

eSrcCharSet

Specifies the character set of the document. To ensure the character set defined here is used, you must set bForceSrcCharSet to TRUE.

The available character sets are enumerated in KVCharSet in kvtypes.h. See “Convert Character Sets”. The section “Supported Formats” lists the file formats for which character set information can be determined.

The default is KVCS_UNKNOWN.

bForceSrcCharSet

Set to TRUE to use the source character set specified in eSrcCharSet, regardless of the internal document information. See “Convert Character Sets”.

Forcing a character set to KVCS_UNKNOWN is always ignored.

The default is FALSE.

eOutputLanguageID

The language for the textual output of language-specific data such as time and date. eOutputLanguageID must be in the system locale. If eOutputLanguageID is invalid or not supplied, the system default is used. Language IDs are defined in KVLanguageID in kvtypes.h.

The default is Language_UNKNOWN.

bUseDocumentColors

Set to TRUE to retain the color attribute information contained in the source document. If set to FALSE, no color attributes appear in the <font> tags of the output.

The default is FALSE.

bUseDocumentFontInfo

Set to TRUE to retain the font information contained in the source document. If set to FALSE, no font information appears in the <font> tags in the output.

The default is FALSE.

bNbspEmptyCells

Set to TRUE to include a non-breaking space (<td>&nbsp;</td>) in the markup for empty table cells in the source document. If this is set to FALSE, <td></td> is generated for empty table cells.

Applies to word processing documents and spreadsheets only.

The default is TRUE.

eSATableBorder

Specifies whether table borders are based on the setting in the source document, are always on, or are always off. The options are enumerated in ENSATableBorder in kvtypes.h. See “ENSATableBorder”.

Applies to word processing documents only.

The default is SA_BaseOnDocument.

nTableBorderWidth

Sets the width of the table border in pixels.

Applies to word processing documents only.

The default is 1.

pszBaseURL

The base URL that replaces the $BASE token in the XML output.

The default is NULL.

pszMainURL

The main URL that replaces the $MAIN token in the XML output.

The default is NULL.

pszDefaultOutputDirectory

The default output directory for auxiliary files created during the conversion.

The default is NULL, and the files are placed in the directory in which your application is running.

pszPicPath

The output directory for graphic files created during the conversion. If specified, this member can also be used by the callback functions KVXMLGetAnchor and KVXMLGetAuxOutput.

Applies to word processing documents only.

The default is NULL, and the files are placed in the directory in which your application is running.

pszPicURL

The URL of the graphic files created from embedded graphics in the source document. To specify a complete image source, this element must be combined with pszAnchor of the fpGetAnchor callback function. See “GetAnchor()”.

For example, setting pszPicURL to ../cgi-bin/ and setting pszAnchor to pic.jpg results in the following markup:

<a xmlns:xlink= xlink href="../cgi-bin/pic.jpg">

Applies to word processing documents only.

The default is NULL.

pszJavaURL

The URL where the Java rasterizer (kvvector.jar) is located.

The Java rasterizer is not currently enabled.

The default is NULL.
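The pszPicURL description states that the URL prefix and the anchor supplied by the fpGetAnchor callback are combined into one image source. A minimal sketch of that combination follows, assuming it is a plain concatenation as the ../cgi-bin/pic.jpg example suggests; build_image_href is a hypothetical helper for illustration, not an SDK function, and a NULL prefix is treated here as an empty string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: joins a pszPicURL-style prefix and an anchor.
 * A NULL prefix is treated as empty (assumption).
 * Returns the length written, or -1 if buf is too small. */
static int build_image_href(char *buf, size_t cap,
                            const char *pszPicURL,
                            const char *pszAnchor)
{
    int n;

    assert(buf != NULL && pszAnchor != NULL);
    n = snprintf(buf, cap, "%s%s",
                 pszPicURL != NULL ? pszPicURL : "", pszAnchor);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Note that the prefix in the documented example ends with a slash; the sketch does not insert one.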

bRemoveFileNameSpaces

Set to TRUE to remove spaces from generated output filenames.

The default is FALSE.

bRasterizeFiles

Set to TRUE to rasterize slides from presentations into single images.

Set to FALSE to only extract text from presentation files. When this member is set to FALSE, graphics do not appear in the output.

Since XML Export only extracts textual components from presentations, this member must be set to FALSE.

The default is FALSE.

eOutputRasterGraphicType

The output format of rasterized embedded graphics. There are six options enumerated in KVXMLGraphicType in kvxml.h. See “KVXMLGraphicType”.

The default is KVGFX_JPEG.

eOutputVectorGraphicType

The output format of vector graphics. The options are enumerated in KVXMLGraphicType in kvxml.h. See “KVXMLGraphicType”. For more information on converting vector graphics on UNIX or Linux, see “Display Vector Graphics on UNIX and Linux”.

The default is KVGFX_JPEG.

cxVectorToRasterXRes

Controls the X resolution (width in pixels) at which presentations and graphics are converted. This is set in conjunction with cyVectorToRasterYRes. To set this member, see “Setting the Resolution of Presentations and Graphics”.

The default is 0, which means the original resolution is retained.

cyVectorToRasterYRes

Controls the Y resolution (height in pixels) at which presentations and graphics are converted. This is set in conjunction with cxVectorToRasterXRes. To set this member, see “Setting the Resolution of Presentations and Graphics”.

The default is 0, which means the original resolution is retained.

nCompressionQuality

Controls the output quality of graphics that support compression quality (for example, JPEG). A value of 0 means default quality (85 compression); 1 is the lowest quality (highest compression and therefore the smallest file size); 100 is the highest quality (no compression and therefore the largest file size).

Applies to word processing documents only.

The default is 0.

bGenerateURLs

Set to TRUE to add anchor tags (<a xmlns:xlink= xlink href=> </a>) to text starting with “www”, “http:” or “file:”.

Applies to word processing documents only.

The default is FALSE.

lcbMaxMemUsage

The maximum memory allocated dynamically for token buffers during file processing. If this maximum is reached, Export performs a swap-to-disk operation internally, and then reuses the memory blocks.

Export maintains an internal minimum memory size.

Applies to word processing or text documents only.

The default is LONG_MAX. The unit is bytes.

cReplaceChar

The character used when a character in the source document’s character set cannot be mapped to the output character set.

The default replacement character is a question mark (?).

cRedact

The character that replaces tagged text that has been designated, through style mapping, to be omitted from the output. This functionality is useful when you need to hide confidential or sensitive information.

The specified character is used for all text that has been mapped to a style processed with the KVSTYLE_REDACT flag (defined in kvtypes.h). See “Map Styles”.

Applies to word processing documents only.

The default replacement character is “X”.

eEmptyParaType

Determines if paragraphs without content generate markup or ID attributes in the output file. There are three options enumerated in KVXMLEmptyParaType in kvxml.h. See “KVXMLEmptyParaType”.

Applies to word processing documents only.

The default is KVEPT_SUPPRESS.

eHardPageBreakType

Determines if hard page breaks generate markup or ID attributes in the output file. There are four options enumerated in KVXMLHardPageBreakType in kvxml.h. See “KVXMLHardPageBreakType”.

Applies to word processing documents only.

The default is KVHPBT_SUPPRESS.

bSupportColumnHeadings

Set to TRUE to include column headings from the source spreadsheet in the output.

Applies to spreadsheets only.

The default is FALSE.

bSupportRowHeadings

Set to TRUE to include row headings from the source spreadsheet in the output.

Applies to spreadsheets only.

The default is FALSE.

bSupportCellSpan

Set to TRUE to include colspan="n" markup in the output.

Applies to spreadsheets only.

The default is FALSE.

bSupportRowSpan

Set to TRUE to include row span data from the source spreadsheet in the output.

Applies to spreadsheets only.

The default is FALSE. Currently not supported.

bSupportColumnWidth

Set to TRUE to include column width data from the source spreadsheet in the output.

Applies to spreadsheets only.

The default is FALSE.

bRemoveEmptyColumns

Set to TRUE to remove spreadsheet columns that do not contain data and to disable cell merging.

Applies to spreadsheets only.

The default is FALSE.

bRemoveEmptyRows

Set to TRUE to remove spreadsheet rows that do not contain data or color, and to disable cell merging.

Applies to spreadsheets only.

The default is FALSE.

bEnableEmptyRows

Set to TRUE to display empty rows in a spreadsheet format. If set to FALSE, empty rows are not displayed. This only applies to 20 or more consecutive empty rows.

Applies to spreadsheets only.

The default is FALSE.

nRowsBeforeSplit

The approximate number of spreadsheet rows to be processed before splitting a table. This helps to prevent large spreadsheet tables from occurring in a single document, which can cause speed and processing problems for the browser.

Applies to spreadsheets only.

The default is 0.


Discussion

A pointer to this structure is passed as an argument to fpConvertStream() and KVXMLConvertFile(). If the pointer to the structure is not NULL, the values of the members specified in the structure are used. If the pointer to the structure is NULL, the default values are used.
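One consequence of the rule above is that a structure cleared to zero is presumably not the same as passing NULL: members whose documented default is not zero (for example bUseVerityDTD and bNbspEmptyCells, which default to TRUE) would be switched off. The sketch below sets such members explicitly. It is an illustration only: OptionsSubset is a hypothetical stand-in that holds a handful of KVXMLOptions members, not the real structure layout, and init_documented_defaults is not an SDK function.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

typedef int BOOL;
typedef unsigned char BYTE;
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* Hypothetical subset of KVXMLOptions, for illustration only.
 * The real structure has more members and a different layout. */
typedef struct {
    BOOL bUseVerityDTD;
    BOOL bIndexOnly;
    BOOL bNbspEmptyCells;
    int  nTableBorderWidth;
    int  nCompressionQuality;
    long lcbMaxMemUsage;
    BYTE cReplaceChar;
    BYTE cRedact;
} OptionsSubset;

/* Clears the structure, then restores the documented non-zero defaults. */
static void init_documented_defaults(OptionsSubset *opt)
{
    assert(opt != NULL);
    memset(opt, 0, sizeof *opt);
    opt->bUseVerityDTD     = TRUE;      /* documented default: TRUE */
    opt->bNbspEmptyCells   = TRUE;      /* documented default: TRUE */
    opt->nTableBorderWidth = 1;         /* documented default: 1 pixel */
    opt->lcbMaxMemUsage    = LONG_MAX;  /* documented default, in bytes */
    opt->cReplaceChar      = '?';       /* documented default */
    opt->cRedact           = 'X';       /* documented default */
}
```

After this call, the application changes only the members it cares about (for example, setting bIndexOnly to TRUE for indexing output) before passing the structure to the conversion function.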

Setting the Resolution of Presentations and Graphics

The members cxVectorToRasterXRes and cyVectorToRasterYRes are set in conjunction to specify the resolution (width and height in pixels) at which presentations and graphics are converted.

You can specify the resolution in one of two ways:

• as a proportion of the original resolution
• as a specified number of pixels

Setting the Resolution Proportionally

To set the resolution proportionally, set one of the members (cxVectorToRasterXRes or cyVectorToRasterYRes) to a percentage of the original resolution, written as a negative number, and set the other to zero. For example, the following setting converts the graphic at 50 percent of the original resolution:

cxVectorToRasterXRes=-50
cyVectorToRasterYRes=0

The following setting converts the graphic at 200 percent of the original resolution:

cxVectorToRasterXRes=0
cyVectorToRasterYRes=-200

The member that is set to zero is automatically adjusted to maintain the aspect ratio. If both cxVectorToRasterXRes and cyVectorToRasterYRes are set to a percentage, cyVectorToRasterYRes defaults to zero during the conversion.

Setting the Resolution in Pixels

To set the resolution in pixels, set one of the members (cxVectorToRasterXRes or cyVectorToRasterYRes) to the number of pixels, and set the other to zero. For example:

cxVectorToRasterXRes=0
cyVectorToRasterYRes=1500

The member that is set to zero is automatically adjusted to maintain the aspect ratio. The maximum resolution is 4,000 pixels.
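The rules in this section can be modeled in a few lines, which helps when predicting the output size for a given pair of settings. This is an illustrative model of the documented behavior, not SDK code: Size, resolve_axis, and resolve_resolution are hypothetical names, the reading of negative values as percentages is inferred from the examples in this section, the 4,000-pixel maximum is not modeled, and the case where both members are positive pixel values is not described by this guide and is simply passed through.

```c
#include <assert.h>

typedef struct {
    int width;
    int height;
} Size;

/* Negative setting: percentage of the original. Positive: pixels. */
static int resolve_axis(int setting, int original)
{
    if (setting < 0)
        return (int)((long)original * -setting / 100);
    return setting;
}

/* Models the documented rules for cxVectorToRasterXRes (cx)
 * and cyVectorToRasterYRes (cy). */
static Size resolve_resolution(Size orig, int cx, int cy)
{
    Size out = orig;

    assert(orig.width > 0 && orig.height > 0);

    if (cx < 0 && cy < 0)
        cy = 0;                 /* both percentages: Y defaults to zero */
    if (cx == 0 && cy == 0)
        return out;             /* original resolution is retained */

    if (cy == 0) {              /* X given, Y follows the aspect ratio */
        out.width  = resolve_axis(cx, orig.width);
        out.height = (int)((long)orig.height * out.width / orig.width);
    } else if (cx == 0) {       /* Y given, X follows the aspect ratio */
        out.height = resolve_axis(cy, orig.height);
        out.width  = (int)((long)orig.width * out.height / orig.height);
    } else {                    /* both given: not covered by this guide */
        out.width  = resolve_axis(cx, orig.width);
        out.height = resolve_axis(cy, orig.height);
    }
    return out;
}
```

For a 1000 by 500 pixel graphic, the three example settings in this section give 500 by 250, 2000 by 1000, and 3000 by 1500 pixels respectively under this model.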


KVXMLTemplate

This structure defines the overall framework of the XML output. Members in this structure define the XML markup written at specific points in the output stream.

The pointers contain XML markup that may include embedded KeyView-defined tokens. The XML markup contained in these strings should be well-formed. For the generated document to be valid, the markup must conform to the Verity DTD.

The structure is initialized by calling the function fpConvertStream() or KVXMLConvertFile(). See “fpConvertStream()” or “KVXMLConvertFile()”.

typedef struct tag_KVXMLTemplate

{

char *pszMainTop;

char *pszMainBottom;

char *pszFirstH1Start;

char *pszFirstH1End;

char *pszMiddleH1Start;

char *pszMiddleH1End;

char *pszLastH1Start;

char *pszLastH1End;

char *pszH[2..6]XML;

char *pszTOCH[1..6]Start;

char *pszTOC_H[1..6];

char *pszTOCH[1..6]End;

char *pszXFile;

char *pszXStartBlock;

char *pszXEndBlock;

char *pszStartBlock;

char *pszEndBlock;

BOOL bPutBlocksInSeparateFiles;

BOOL bHardPageMakesNewBlock;

long lcbBlockSize;

char *pszChunkTemplate;

char *pszUserSummary;

char *pszTOCH[1..6]LeafNode;

}

KVXMLTemplate;


Member Descriptions

pszMainTop

The markup and tokens inserted at the beginning of the main XML file.

Most of the sample template files feature <MetaData> tags with tokens that store the input document’s metadata. This member does not include the processing instructions or document type declarations that appear at the beginning of an XML document. The document type declaration <?xml version= is automatically generated by XML Export. If you are using style sheets or the Verity DTD, the processing instructions <?xml stylesheet= ...> are also automatically generated by XML Export.

The default is NULL.

pszMainBottom

The markup and tokens inserted at the end of the main XML file.

The default is NULL.

pszFirstH1Start

The markup and tokens inserted at the beginning of the first created H1 XML block (that is, the block associated with the first H1 table of contents entry).

The default is NULL.

pszFirstH1End

The markup and tokens inserted at the end of the first created H1 XML block (that is, the block associated with the first H1 table of contents entry).

The default is NULL.

pszMiddleH1Start

The markup and tokens inserted at the beginning of those H1 XML blocks that are neither the first nor the last H1 blocks created (that is, blocks associated with all but the first and last H1 table of contents entries).

The default is NULL.

pszMiddleH1End

The markup and tokens inserted at the end of those H1 XML blocks that are neither the first nor the last H1 blocks created (that is, blocks associated with all but the first and last H1 table of contents entries).

The default is NULL.

pszLastH1Start

The markup and tokens inserted at the beginning of the last created H1 XML block (that is, the block associated with the last H1 table of contents entry).

The default is NULL.

pszLastH1End

The markup and tokens inserted at the end of the last created H1 XML block (that is, the block associated with the last H1 table of contents entry).

The default is NULL.

pszH[2..6]XML

The markup and tokens inserted in an XML block for heading levels 2 through 6.

The default is NULL.

pszTOCH[1..6]Start

The markup and tokens inserted at the beginning of a table of contents block for heading levels 1 through 6 entries. For example:

<ol list-style-type="upper-roman">

The default is NULL.

pszTOC_H[1..6]

The markup and tokens required to process the table of contents entries for heading levels 1 through 6. For example:

<a xmlns:xlink="http://www.w3.org/TR/xlink" xlink href="#$ANCHOR"> $TOCTE</a>

If the table of contents heading contains special characters, such as an ampersand (&) or parentheses, you must use the $TOCPE token in the pszTOC_H[1..6] markup. This token retains character entities and prevents validity errors. See “Export Tokens” for more information on table of contents tokens.

The default is NULL.

pszTOCH[1..6]End

The markup and tokens inserted at the end of a table of contents block for heading levels 1 through 6 entries. For example:

</ol>

The default is NULL.

pszXFile

The markup and tokens generated and placed in an extra XML file. This file holds content from the source document. To process this file, you would use the $XANCHOR token. See “Export Tokens” for more information on Export tokens.

The default is NULL.

pszXStartBlock

The markup and tokens inserted at the beginning of each XML block generated by the $XANCHOR token. If either this member or pszXEndBlock is defined, both pszStartBlock and pszEndBlock are ignored. See “Export Tokens” for more information on Export tokens.

The default is NULL.

pszXEndBlock

The markup and tokens inserted at the end of each XML block generated by the $XANCHOR token. If either this member or pszXStartBlock is defined, both pszStartBlock and pszEndBlock are ignored. See “Export Tokens” for more information on Export tokens.

The default is NULL.

pszStartBlock

The markup and tokens inserted at the beginning of each block created as a result of lcbBlockSize or bHardPageMakesNewBlock.

The default is NULL.

pszEndBlock

The markup and tokens inserted at the end of each block created as a result of lcbBlockSize or bHardPageMakesNewBlock.

The default is NULL.

bPutBlocksInSeparateFiles

Set to TRUE to create a separate XML file for each heading level 1 block. Each new block uses the markup defined in pszStartBlock and pszEndBlock. If set to FALSE, then each heading level 1 block is placed sequentially in the same file, after the initial markup is written.

The default is FALSE.

bHardPageMakesNewBlock

Set to TRUE to have hard page breaks in the source document generate new XML files during the conversion process. The member pszChunkTemplate provides the appropriate table of contents entry for the new block.

Applies to word processing documents and spreadsheets only.

The default is FALSE.

lcbBlockSize

The maximum size (in bytes) of heading level 1 XML output files. This number is used as a guideline and may be exceeded to break content at a logical location (for example, a row boundary).

By default, the size is undefined and unlimited.

pszChunkTemplate

If an H1 XML block is subdivided into separate files as a result of the size limitations specified in lcbBlockSize, this member provides a template for creating a table of contents entry for the new file. The block number can be made a part of this template by inserting the token $SPLITBLOCKNUMBER. For example:

Page $SPLITBLOCKNUMBER

The default is NULL.

pszUserSummary

The markup and tokens generated when the tokens $USERSUMMARY or $SUMMARY are used. For example:

<MetaData name="$NAME" content="$CONTENT"/>

The default is NULL.

pszTOCH[1..6]LeafNode

The markup that replaces pszTOC_H[1..6] entries for leaf nodes in the table of contents. A leaf node is a node that has no children.

The default is NULL.


Discussion

A pointer to this structure is passed as an argument to fpConvertStream() and KVXMLConvertFile(). If the pointer to the structure is not NULL, the values of the members specified in the structure are used. If the pointer to the structure is NULL, the default values are used.
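As an illustration of how the template members fit together, the sketch below fills a few of them with the example strings given in the member descriptions. It rests on assumptions: TemplateSubset is a hypothetical stand-in that holds only a handful of KVXMLTemplate members, the names pszTOCH1Start and pszTOCH1End assume that the pszTOCH[1..6] notation expands to one member per heading level, and make_toc_template is not an SDK function.

```c
#include <assert.h>
#include <string.h>

typedef int BOOL;
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* Hypothetical subset of KVXMLTemplate, for illustration only. */
typedef struct {
    const char *pszMainTop;
    const char *pszMainBottom;
    const char *pszTOCH1Start;     /* assumed expansion of pszTOCH[1..6]Start */
    const char *pszTOCH1End;       /* assumed expansion of pszTOCH[1..6]End */
    const char *pszChunkTemplate;
    BOOL        bHardPageMakesNewBlock;
} TemplateSubset;

/* Every member starts at its documented default (NULL or FALSE);
 * only the members needed for a level 1 table of contents are set. */
static TemplateSubset make_toc_template(void)
{
    TemplateSubset t;

    memset(&t, 0, sizeof t);
    t.pszMainTop             = NULL;
    t.pszMainBottom          = NULL;
    t.pszTOCH1Start          = "<ol list-style-type=\"upper-roman\">";
    t.pszTOCH1End            = "</ol>";
    t.pszChunkTemplate       = "Page $SPLITBLOCKNUMBER";
    t.bHardPageMakesNewBlock = TRUE;
    return t;
}
```

Because a non-NULL template pointer causes the member values in the structure to be used, members that are not needed are left at NULL rather than uninitialized.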


KVXMLTOCOptions

This structure defines whether a heading is included in the table of contents.

Source text is converted to a heading in the XML output if:

• it meets all the criteria defined by the members of KVXMLHeadingInfo, and

• the headingCreateType member of KVXMLTOCOptions is set to allow automatic heading generation.

The structure is initialized by calling the function fpConvertStream() or KVXMLConvertFile(). See “fpConvertStream()” on page 186 or “KVXMLConvertFile()”.

See “KVXMLOptions” for more information on the criteria used to determine whether a heading is included in the table of contents.

typedef struct tag_KVXMLTOCOptions
{
    BOOL                   bAllowHeadingsInTables;
    KVHeadingCreateOptions headingCreateType;
    KVXMLHeadingInfo       *pH1;
    KVXMLHeadingInfo       *pH2;
    KVXMLHeadingInfo       *pH3;
    KVXMLHeadingInfo       *pH4;
    KVXMLHeadingInfo       *pH5;
    KVXMLHeadingInfo       *pH6;
}
KVXMLTOCOptions;


Member Descriptions

bAllowHeadingsInTables Determines if the text in tables is considered for automatic heading generation. If set to TRUE, the text in tables is included in the determination of headings and table of contents entries.

Applies to word processing documents and spreadsheets only.

The default is FALSE.

headingCreateType Determines how XML Export subdivides the source document into table of contents entries. This can be set to one of the two options enumerated in KVHeadingCreateOptions in kvxml.h. See “KVHeadingCreateOptions”.

The determination of table of contents entries is based on whether the source document contains heading styles or whether text attributes conform to the criteria defined in the structure KVXMLHeadingInfo. See “KVXMLHeadingInfo”.

Heading styles are predefined style tags, such as the “Heading 1” and “Heading 2” tags in a Microsoft Word document. Text attributes are bold, underlined, italic, and so on.

Applies to word processing documents only.

The default is KVHC_DocHeadingsOnly.

KVXMLHeadingInfo (pH1 to pH6) Pointer to the structure KVXMLHeadingInfo. See “KVXMLHeadingInfo”.

When the table of contents entries are not based on the source document’s heading styles, the table of contents entries are determined by whether text attributes (such as bold, underlined, and italic text) in the source document meet all the criteria defined in KVXMLHeadingInfo.

Discussion

A pointer to this structure is passed as an argument to fpConvertStream() and KVXMLConvertFile(). If the pointer to the structure is not NULL, the values of the members specified in the structure are used. If the pointer to the structure is NULL, the default values are used.
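As a minimal sketch of filling in this structure before a conversion call, the following fragment zeroes a KVXMLTOCOptions instance and then sets the two non-pointer members. The type declarations are local stand-ins that mirror the definition in this section so that the fragment compiles on its own; the struct tag tag_KVXMLHeadingInfo, the BOOL typedef, and the helper init_toc_options() are assumptions made for illustration. A real application would include kvxml.h instead, and the way the pointer is handed to fpConvertStream() or KVXMLConvertFile() is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in declarations (assumptions). Replace with #include "kvxml.h". */
typedef int BOOL;
#ifndef TRUE
#define TRUE 1
#endif

typedef enum tag_KVHeadingCreateOptions {
    KVHC_DocHeadingsOnly,
    KVHC_CreateHeadingsAlways
} KVHeadingCreateOptions;

/* Members of KVXMLHeadingInfo are documented elsewhere; opaque here. */
typedef struct tag_KVXMLHeadingInfo KVXMLHeadingInfo;

typedef struct tag_KVXMLTOCOptions {
    BOOL                   bAllowHeadingsInTables;
    KVHeadingCreateOptions headingCreateType;
    KVXMLHeadingInfo       *pH1;
    KVXMLHeadingInfo       *pH2;
    KVXMLHeadingInfo       *pH3;
    KVXMLHeadingInfo       *pH4;
    KVXMLHeadingInfo       *pH5;
    KVXMLHeadingInfo       *pH6;
} KVXMLTOCOptions;

/* Hypothetical helper: start from an all-zero structure, then request
 * automatic heading generation, including text found in tables. The
 * caller supplies the level 1 criteria; the other levels stay NULL. */
static void init_toc_options(KVXMLTOCOptions *toc, KVXMLHeadingInfo *h1Criteria)
{
    static const KVXMLTOCOptions zeroed = { 0 };

    assert(toc != NULL);
    *toc = zeroed;
    toc->bAllowHeadingsInTables = TRUE;
    toc->headingCreateType      = KVHC_CreateHeadingsAlways;
    toc->pH1                    = h1Criteria;
}
```

Starting from a zeroed structure keeps every member in a known state, so any member the application does not set explicitly is FALSE, KVHC_DocHeadingsOnly, or NULL rather than indeterminate.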


CHAPTER 11

Enumerated Types

This section provides information on some of the enumerated types used by the XML Export API. It contains the following topics:

Introduction

ENSATableBorder

KVCredKeyType

KVErrorCode

KVErrorCodeEx

KVXMLStyleSheetType

KVXMLAnchorType

KVXMLGraphicType

KVHeadingCreateOptions

KVXMLEmptyParaType

KVXMLHardPageBreakType

KVMetadataType

KVMetaNameType

KVSumInfoType

KVSumType

LPDF_DIRECTION


Introduction

The enumerated types are in adinfo.h, kvtypes.h, kvxml.h, and kvxtract.h. These header files are in the include directory. Unless a definition assigns explicit values, the first entry in an enumerated type is set to zero (0), and each subsequent entry is increased by 1. For example, the first five entries of KVCharSet in kvtypes.h are:

KVCS_UNKNOWN

KVCS_SJIS

KVCS_GB

KVCS_BIG5

KVCS_KSC

They would be set in the following way:

Enumerated Type Setting

KVCS_UNKNOWN 0

KVCS_SJIS 1

KVCS_GB 2

KVCS_BIG5 3

KVCS_KSC 4

Many enumerated types may also be set by entering the appropriate symbolic constant, or TRUE/FALSE.
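The numbering rule above can be checked with a short sketch. The enumeration below is a local stand-in that repeats only the five KVCharSet entries listed in this section; the type name DemoCharSet and both helper functions are assumptions made for illustration, and a real application would include kvtypes.h instead.

```c
#include <assert.h>

/* Local stand-in for the first five KVCharSet entries (assumption).
 * No explicit values are given, so the compiler numbers them 0, 1, 2... */
typedef enum {
    KVCS_UNKNOWN,
    KVCS_SJIS,
    KVCS_GB,
    KVCS_BIG5,
    KVCS_KSC
} DemoCharSet;

/* Hypothetical check: documents the implicit numbering with assertions. */
static void demo_check_numbering(void)
{
    assert(KVCS_UNKNOWN == 0);
    assert(KVCS_SJIS    == 1);
    assert(KVCS_GB      == 2);
    assert(KVCS_BIG5    == 3);
    assert(KVCS_KSC     == 4);
}

/* Hypothetical helper: returns 1 if a numeric setting corresponds to one
 * of the five entries listed above, and 0 otherwise. */
static int demo_is_listed_charset(int setting)
{
    return setting >= KVCS_UNKNOWN && setting <= KVCS_KSC;
}
```

Using the symbolic constant (for example, KVCS_GB) rather than the number 2 keeps application code readable and independent of the position of an entry in the header file.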

Programming Guidelines

As KeyView is enhanced in future releases, some enumerated types may be expanded. For example, new format IDs may be added to the ENdocFmt enumerated type, or new error codes may be added to the KVErrorCodeEx enumerated type. When using these expandable types, your code should ensure binary compatibility with future releases.

For example, if you use an array to access error messages based on an error code, your code should check that the error code is less than KVError_Last before accessing the data. This ensures that new error codes are detected when you add KeyView binary files from new releases to your existing installation.
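The guideline above can be sketched as a guarded table lookup. The enumeration, message table, and function names below are hypothetical stand-ins, not KeyView definitions; DemoError_Last plays the role that KVError_Last plays in KVErrorCodeEx, marking the number of codes that were known when the application was compiled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expandable enumerated type with a trailing sentinel. */
typedef enum {
    DemoError_None,
    DemoError_NotFound,
    DemoError_Denied,
    DemoError_Last      /* sentinel: one past the last known code */
} DemoErrorCode;

/* One message per code known at compile time. */
static const char *const kDemoMessages[] = {
    "No error",
    "Not found",
    "Access denied"
};

static const char kDemoUnknown[] = "Unrecognized error code";

/* Returns the message for a code, or a fallback for any code outside the
 * range known at compile time (for example, a code introduced by a newer
 * library binary). */
static const char *demo_error_message(int code)
{
    /* The table must have exactly one entry per known code. */
    assert(sizeof kDemoMessages / sizeof kDemoMessages[0]
           == (size_t)DemoError_Last);

    if (code < 0 || code >= (int)DemoError_Last) {
        return kDemoUnknown;
    }
    return kDemoMessages[code];
}
```

Because the sentinel is compiled into the application, a code returned by a newer binary falls outside the checked range and is reported as unrecognized instead of indexing past the end of the table.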

The following enumerated types are expandable:

KVErrorCodeEx

KVMetadataType

KVCharSet


KVLanguageID

KVSubfileType

ENdocFmt

ENSATableBorder

This enumerated type defines the type of border to display around tables. It is defined in kvtypes.h.

Definition

typedef enum tag_ENSATableBorder
{
    SA_BaseOnDocument,
    SA_NoBorder,
    SA_Border
}
ENSATableBorder;

Enumerators

SA_BaseOnDocument Border type is based on the document.

SA_NoBorder Table borders are always off.

SA_Border Table borders are always on.

KVCredKeyType

This enumerated type defines the type of credential used to open a protected file. See “KVCredentialComponent”. It is defined in kvxtract.h.

Definition

typedef enum tag_KVCredKeyType
{
    KVCredKeyType_UserName,
    KVCredKeyType_UserIdFile,
    KVCredKeyType_Password,
}
KVCredKeyType;


Enumerators

KVCredKeyType_UserName The credential in KVCredentialComponent is a user name.

KVCredKeyType_UserIdFile The credential in KVCredentialComponent is a path to a file containing user IDs.

KVCredKeyType_Password The credential in KVCredentialComponent is a password.

KVErrorCode

This enumerated type defines the type of error generated if Export fails. It is defined in kvtypes.h.

Definition

typedef enum tag_KVErrorCode
{
    KVERR_Success,                  /* 0  Success */
    KVERR_DLLNotFound,              /* 1  DLL or shared library not found */
    KVERR_OutOfCore,                /* 2  memory allocation failure */
    KVERR_processCancelled,         /* 3  fpContinue() returns FALSE */
    KVERR_badInputStream,           /* 4  Invalid/corrupt input stream */
    KVERR_badOutputType,            /* 5  Invalid output type requested */
    KVERR_General,                  /* 6  General error.... */
    KVERR_FormatNotSupported,       /* 7  Format not supported */
    KVERR_PasswordProtected,        /* 8  File is Password Protected */
    KVERR_ADSNotFound,              /* 9  Adobe Document Server not found */
    KVERR_AutoDetFail,              /* 10 Autodetect error */
    KVERR_AutoDetNoFormat,          /* 11 Unable to detect file format */
    KVERR_ReaderInitError,          /* 12 Error initializing the reader */
    KVERR_NoReader,                 /* 13 No reader available for this format */
    KVERR_CreateOutputFileFailed,   /* 14 Unable to create output file */
    KVERR_CreateTempFileFailed,     /* 15 Unable to create temp file */
    KVERR_ErrorWritingToOutputFile, /* 16 Error writing to output file */
    KVERR_CreateProcessFailed,      /* 17 Error creating a child process */
    KVERR_WaitForChildFailed,       /* 18 Wait for child process failed */
    KVERR_ChildTimeOut,             /* 19 Child process hung / timed out */
    KVERR_ArchiveFileNotFound,      /* 20 Attempt to extract nonexistent file */
    KVERR_ArchiveFatalError         /* 21 Fatal error processing archive - should abort */
}
KVErrorCode;

Enumerators

KVERR_Success Function completed successfully.

KVERR_DLLNotFound A DLL or shared library was not found.

KVERR_OutOfCore Memory allocation failure.

KVERR_processCancelled Callback function fpContinue() returns FALSE.

KVERR_badInputStream Invalid or corrupt input stream.

KVERR_badOutputType Invalid output is requested.

KVERR_General General error.

KVERR_FormatNotSupported File format is not supported.

KVERR_PasswordProtected File is encrypted or password-protected. KeyView only supports secure PST files.

KVERR_ADSNotFound Adobe Document Server not found. This error is obsolete.

KVERR_AutoDetFail Autodetect error.

KVERR_AutoDetNoFormat Unable to detect file format.

KVERR_ReaderInitError Error initializing the reader.

KVERR_NoReader No reader available for this format.

KVERR_CreateOutputFileFailed Unable to create output file.

If the overwrite flag in KVExtractSubFileArg is FALSE, and a sub file has the same name as a file in the target path, this error is generated. See “KVExtractSubFileArg”.

KVERR_CreateTempFileFailed Unable to create temporary file.

KVERR_ErrorWritingToOutputFile Error writing to output file.

KVERR_CreateProcessFailed Error creating a child process.

KVERR_WaitForChildFailed Wait for child process failed.

KVERR_ChildTimeOut Child process hung/timed out.

KVERR_ArchiveFileNotFound Attempt to extract nonexistent file.

KVERR_ArchiveFatalError Fatal error processing an archive file.

KVErrorCodeEx

This enumerated type defines extended error codes. It is defined in kvtypes.h.

Definition

typedef enum tag_KVErrorCodeEx
{
    KVError_OpenStreamFailure = KVERR_ArchiveFatalError + 1, /* 22 KVOpen stream failure */
    KVError_InterfaceFunctionNotFound,  /* 23 Interface function not found */
    KVError_InputFileNotFound,          /* 24 Cannot find input file */
    KVError_OpenOutputFileFailed,       /* 25 Cannot open output file */
    KVError_MemoryLeak,                 /* 26 Memory leak */
    KVError_MemoryOverwrite,            /* 27 Memory overwrite */
    KVError_GPF,                        /* 28 Exception during oop filtering */
    KVError_OopCore,                    /* 29 Core dump in child process */
    KVError_KVoopLogFailed,             /* 30 Creation of oop error log failed */
    KVError_OverNestedFileLimit,        /* 31 File exceeds nested file limit */
    KVError_PSTAccessFailed,            /* 32 Access failed on PST files */
    KVError_PasswordRequired,           /* 33 Password required to access file */
    KVError_InvalidArgs,                /* 34 Input argument/structure is invalid */
    KVError_ReaderUsageDenied,          /* 35 Reader requires a valid license */
    KVError_OopBadConfig,               /* 36 Config buffer data was incomplete */
    KVError_OopBrokenPipe,              /* 37 Read/write to/from pipe failed */
    KVError_OopPipeOEF,                 /* 38 Pipe was closed prior to read/write */
    KVError_IPCTimeOut,                 /* 39 Pipe/socket timed out on poll/select */
    KVError_InvalidOopDriverSignature,  /* 40 ... OOP server but context driver does not exist on the server */
    KVError_InvalidOopServiceSignature, /* 41 ... OOP service that does not exist */
    KVError_ZeroFile,                   /* 42 Input file is empty or zero bytes */
    KVError_CompressionNotSupported,    /* 43 File or subfile is compressed with unsupported method */
    KVError_Last
}
KVErrorCodeEx;

Enumerators

KVError_OpenStreamFailure (= KVERR_ArchiveFatalError + 1) Failed to open a stream during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.

KVError_InterfaceFunctionNotFound An interface function was not found during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.

KVError_InputFileNotFound Could not find the input file during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.

KVError_OpenOutputFileFailed Could not open the output file during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.

KVError_MemoryLeak Memory leak occurred during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.

KVError_MemoryOverwrite Memory overwrite occurred during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.

KVError_GPF Exception occurred during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.

KVError_OopCore Memory dump was generated in a child process during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.

KVError_KVoopLogFailed Creation of out-of-process error log failed. This is an extended error for the code KVERR_General. This is used by KeyView Filter.

KVError_OverNestedFileLimit The container file has more than the allowable number of child documents. One or more child documents were not converted. Currently, this is not used.

KVError_PSTAccessFailed The PST file could not be converted. This error may be returned when a call to fpOpenFile() returns NULL for one of the following reasons:

• Microsoft Outlook client is not installed

• Microsoft Outlook client is installed, but is not the default email client

• Microsoft Outlook client is installed, but is not configured correctly

• PST file is corrupt

• PST file is read-only (PST files must allow read and write access)

• MAPI call fails

KVError_PasswordRequired To open the file, credentials must be provided. This error may be returned when a call to fpOpenFile() returns NULL.

KVError_InvalidArgs The input argument or structure is invalid. This is generated by the File Extraction APIs.

KVError_ReaderUsageDenied The current license key does not enable the document reader required to convert the file. This error may be returned when a call to fpOpenFile() returns NULL.

Some document readers are considered advanced features and are licensed separately from the KeyView SDK (for example, the PST and MBX readers). Contact your Autonomy sales representative to get an updated license key.

KVError_OopBadConfig Information in the kvxconfig.ini file is incomplete and cannot be used for the XML file. This is used by KeyView Filter.

KVError_OopBrokenPipe Data was not transferred between the parent and child processes during out-of-process filtering because either the parent or child failed. This is used by KeyView Filter.

KVError_OopPipeOEF Data was not transferred between the parent and child processes during out-of-process filtering because the parent process was shut down. This is used by KeyView Filter.

KVError_IPCTimeOut Either the parent or child process is waiting for a reply or request during out-of-process filtering. This is used by KeyView Filter.

KVError_InvalidOopDriverSignature A client sent a request to an out-of-process server, but the context driver does not exist on the server. This is used by KeyView Filter.

KVError_InvalidOopServiceSignature A client sent a request to a File Extraction service that does not exist.

If this error is generated on the call to fpClose(), it can be ignored. This is used by KeyView Filter.

KVError_ZeroFile The input file is empty or zero bytes.

KVError_CompressionNotSupported The file or subfile is compressed with an unsupported compression method.

Discussion

• As error reporting is enhanced in future releases, new error messages may be added to this enumerated type. When using this type, your code should ensure binary compatibility with future releases. See “Programming Guidelines”.

• If an extended error code is called for a format to which the error does not apply, the code KVError_Last is returned.

KVXMLStyleSheetType

This enumerated type defines the options for processing style sheet information. It is defined in kvxml.h.

Definition

typedef enum tag_KVXMLStyleSheetType
{
    STYLESHEET_DISABLED = 0,
    XML_CSS,
    XML_XSL,
}
KVXMLStyleSheetType;


Enumerators

STYLESHEET_DISABLED Disables Cascading Style Sheet (CSS) formatting.

XML_CSS Enables cascading style sheet (CSS) formatting and generates an external file or uses an existing external file which is referenced in a <?xml-stylesheet...?> processing instruction.

XML_XSL Enables Extensible Stylesheet Language (XSL) formatting and uses an external XSL file which is referenced in a <?xml-stylesheet...?> processing instruction.

KVXMLAnchorType

This enumerated type defines the anchor types for the output stream. It is defined in kvxml.h.

Definition

typedef enum tag_KVXMLAnchorType
{
    VectorPictureAnchor = 0,
    RasterPictureAnchor,
    H1Anchor,
    H2Anchor,
    H3Anchor,
    H4Anchor,
    H5Anchor,
    H6Anchor,
    XAnchor,
    AnimatedGIFAnchor,
    CSSAnchor,
    XSLAnchor,
    GeneralAnchor,
    DBAnchor,
    JPEGAnchor
}
KVXMLAnchorType;


Enumerators

VectorPictureAnchor Anchor for embedded vector graphics.

RasterPictureAnchor Anchor for embedded raster graphics.

H1Anchor Anchor for heading level H1 blocks.

H2Anchor Anchor for heading level H2 blocks.

H3Anchor Anchor for heading level H3 blocks.

H4Anchor Anchor for heading level H4 blocks.

H5Anchor Anchor for heading level H5 blocks.

H6Anchor Anchor for heading level H6 blocks.

XAnchor Anchor for an external file.

AnimatedGIFAnchor Anchor for embedded animated GIF graphics.

CSSAnchor Anchor for external CSS file.

XSLAnchor Anchor for external XSL file.

GeneralAnchor Reserved for future use.

DBAnchor Used internally.

JPEGAnchor Anchor for embedded JPEG graphic.

KVXMLGraphicType

This enumerated type defines graphic formats to which embedded graphics and presentations are converted. It is defined in kvxml.h.

Definition

typedef enum tag_KVXMLGraphicType
{
    KVGFX_GIF,
    KVGFX_JPEG,
    KVGFX_PNG,
    KVGFX_CGM,
    KVGFX_WMF,
    KVGFX_JAVA
}
KVXMLGraphicType;


Enumerators

KVGFX_GIF Specifies GIF (Graphics Interchange Format) as the graphic type.

KVGFX_JPEG Specifies JPEG (Joint Photographic Experts Group) as the graphic type.

KVGFX_PNG Specifies PNG (Portable Network Graphics) as the graphic type.

KVGFX_CGM Specifies CGM (Computer Graphics Metafile) as the graphic type.

KVGFX_WMF Specifies WMF (Windows Metafile) as the graphic type.

KVGFX_JAVA Deprecated.

Also see “Display Vector Graphics on UNIX and Linux”.

KVHeadingCreateOptions

This enumerated type defines whether Export generates blocks and block chunks (see “Definition of Terms”) based only on the heading styles defined in a source document (if they are available), or based on both the source document’s heading styles and headings that are created automatically by Export.

Headings that are created automatically by Export are based on the text attributes of the source document as defined by KVXMLHeadingInfo (see “KVXMLHeadingInfo”). This enumerated type is defined in kvxml.h.

Definition

typedef enum tag_KVHeadingCreateOptions
{
    KVHC_DocHeadingsOnly,
    KVHC_CreateHeadingsAlways
}
KVHeadingCreateOptions;


Enumerators

KVHC_DocHeadingsOnly This instructs Export to rely exclusively on heading styles defined in the source document.

However, if the source document does not contain heading styles, Export generates blocks on its own using the criteria defined by the structure KVXMLHeadingInfo.

KVHC_CreateHeadingsAlways This instructs Export to use the heading styles in the source document when available, and to also automatically create table of contents entries based on the criteria defined by the structure KVXMLHeadingInfo.

KVXMLEmptyParaType

This enumerated type defines the options for paragraphs that do not contain content. It is defined in kvxml.h.

Definition

typedef enum tag_KVXMLEmptyParaType
{
    KVEPT_SUPPRESS, /* No markup generated */
    KVEPT_EMPTY,    /* Use <p/> */
    KVEPT_VERBOSE   /* Use <p id="...">&nbsp;</p> */
}
KVXMLEmptyParaType;


Enumerators

KVEPT_SUPPRESS Paragraphs without content are ignored. They do not contribute white space and do not affect the ID number of subsequent paragraphs. This is the default value.

KVEPT_EMPTY Paragraphs without content are represented by an “empty” paragraph element <p/>. These contribute minimal white space, but do not affect the ID number of subsequent paragraphs.

KVEPT_VERBOSE Paragraphs without content contain a fully-defined start tag <p id="..."> with all non-default attributes, a &nbsp; character entity, and end tag </p>. These contribute additional white space and affect the ID number of subsequent paragraphs.

KVXMLHardPageBreakType

This enumerated type defines the options for hard page breaks. It is defined in kvxml.h.

Definition

typedef enum tag_KVXMLHardPageBreakType
{
    KVHPBT_SUPPRESS, /* No markup generated */
    KVHPBT_EMPTY,    /* Use <Page/> */
    KVHPBT_EMPTYID,  /* Use <Page id="n"/> */
    KVHPBT_ID        /* Use <Page id="n"> ... </Page> */
}
KVXMLHardPageBreakType;

Enumerators

KVHPBT_SUPPRESS No markup is generated for hard page breaks. This is the default value.

KVHPBT_EMPTY An empty page element, <Page/>, without ID attributes is generated for hard page breaks.

KVHPBT_EMPTYID An empty page element, <Page id="n"/>, with ID attributes is generated for hard page breaks. The ID is incremented for each subsequent hard page break.

KVHPBT_ID A “non-empty” “Page” element is generated for hard page breaks. The page tags enclose the contents immediately after the <WP> tag. When subsequent hard page breaks are encountered, the previous “Page” element is closed with a </Page> tag, and a <Page id="..."> opening tag is added. The final “Page” element is closed immediately before the closing </WP> tag.

KVMetadataType

This enumerated type defines the data type of metadata that can be extracted from a sub file in a mail message or mail store. If a metadata field has a corresponding KeyView type in KVMetadataType, the metadata is converted to the KVMetadataElem structure, and the structure member isDataValid is 1. See “KVMetadataElem”. It is defined in kvtypes.h.

Definition

typedef enum
{
    KVMetadata_Unknown  = 0,
    KVMetadata_Bool     = 1,
    KVMetadata_Binary   = 2,
    KVMetadata_Int4     = 3,
    KVMetadata_UInt4    = 4,
    KVMetadata_Int8     = 5,
    KVMetadata_UInt8    = 6,
    KVMetadata_String   = 7,
    KVMetadata_Unicode  = 8,
    KVMetadata_DateTime = 9,
    KVMetadata_Float    = 10,
    KVMetadata_Double   = 11,
    KVMetadata_Last
}
KVMetadataType;


Enumerators

KVMetadata_Unknown The value in the property is of an unknown type.

KVMetadata_Bool The value in the property is a boolean. The corresponding MAPI type is PT_BOOLEAN.

KVMetadata_Binary The value in the property is a byte array. The corresponding MAPI type is PT_BINARY.

KVMetadata_Int4 The value in the property is a signed 4-byte integer. The corresponding MAPI types are PT_I2, PT_SHORT, PT_I4, and PT_LONG.

KVMetadata_UInt4 The value in the property is an unsigned 4-byte integer. This type is not currently supported.

KVMetadata_Int8 The value in the property is a signed 8-byte integer. This type is not currently supported.

KVMetadata_UInt8 The value in the property is an unsigned 8-byte integer. This type is not currently supported.

KVMetadata_String The value in the property is a string. The corresponding MAPI type is PT_STRING8.

KVMetadata_Unicode The value in the property is a Unicode string. The corresponding MAPI type is PT_UNICODE.

KVMetadata_DateTime The value in the property is a date and time. The corresponding MAPI type is PT_SYSTIME.

KVMetadata_Float The value in the property is a 4-byte float. The corresponding MAPI type is PT_FLOAT.

KVMetadata_Double The value in the property is an 8-byte double. The corresponding MAPI type is PT_DOUBLE.

Discussion

New types may be added to this enumerated type. When using this type, your code should ensure binary compatibility with future releases. See “Programming Guidelines”.


KVMetaNameType

This enumerated type defines the type of metadata fields extracted from a sub file in a mail message or mail store. See “KVMetaName”. It is defined in kvxtract.h.

Definition

typedef enum
{
    KVMetaNameType_Integer = 0,
    KVMetaNameType_String
}
KVMetaNameType;

Enumerators

KVMetaNameType_Integer The metadata field is an integer.

KVMetaNameType_String The metadata field is a string.

KVSumInfoType

This enumerated type defines the data type of the metadata field extracted from a document. See “Extract Metadata”. It is defined in kvtypes.h.

Definition

typedef enum tag_KVSumInfoType
{
    KV_String    = 0x1,
    KV_Int4      = 0x2,
    KV_DateTime  = 0x3,
    KV_ClipBoard = 0x4,
    KV_Bool      = 0x5,
    KV_Unicode   = 0x6,
    KV_IEEE8     = 0x7,
    KV_Other     = 0x8
}
KVSumInfoType;


Enumerators

KV_String The value in the metadata field is a string.

KV_Int4 The value in the metadata field is an integer.

KV_DateTime The value in the metadata field is a date and time. This type is a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (Windows FILETIME EPOCH). You may need to convert this value into another format.

KV_ClipBoard Currently not supported.

KV_Bool The value in the metadata field is a boolean.

KV_Unicode The value in the metadata field is a Unicode string.

KV_IEEE8 The value in the metadata field is an IEEE 8-byte integer.

KV_Other The value in the metadata field is user-defined.
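As the description of KV_DateTime notes, the 64-bit value may need converting into another format. The following sketch converts a count of 100-nanosecond intervals since January 1, 1601 into whole seconds since January 1, 1970 (Unix time). The function and macro names are hypothetical and are not part of the KeyView API; how the 64-bit value is read from the extracted metadata is not shown, and the sketch assumes it is already available as a uint64_t.

```c
#include <assert.h>
#include <stdint.h>

/* 100-nanosecond intervals per second. */
#define DEMO_TICKS_PER_SECOND     UINT64_C(10000000)
/* Seconds from 1601-01-01 00:00:00 to 1970-01-01 00:00:00. */
#define DEMO_EPOCH_DIFF_SECONDS   INT64_C(11644473600)

/* Hypothetical helper: converts a FILETIME-style tick count to Unix
 * seconds. Fractions of a second are discarded. Dates before 1970
 * produce a negative result. */
static int64_t demo_filetime_to_unix_seconds(uint64_t ticks)
{
    return (int64_t)(ticks / DEMO_TICKS_PER_SECOND) - DEMO_EPOCH_DIFF_SECONDS;
}

/* Usage check: the tick count for the start of the Unix epoch maps to 0. */
static void demo_filetime_self_check(void)
{
    assert(demo_filetime_to_unix_seconds(UINT64_C(116444736000000000)) == 0);
}
```

The resulting value can be passed to standard C library routines such as gmtime() after conversion to time_t, provided time_t on the target platform can hold it.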


KVSumType

This enumerated type defines the metadata fields that can be extracted from a document.

• Types 0 to 34 and type 42 are office summary fields.

• Types 35 to 40 are computer-aided design (CAD) metadata fields.

• Type 41, KV_OrigAppVersion, is shared by office software and CAD.

• Types 43 or greater are reserved for any non-standard metadata field defined in a document.

See “Extract Metadata”. It is defined in kvtypes.h.
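The ranges described above can be expressed as a small classification routine. The enumeration DemoSumClass and the function demo_classify_sum_type() are hypothetical names introduced for illustration; only the numeric ranges come from this section.

```c
#include <assert.h>

/* Hypothetical categories for a KVSumType value. */
typedef enum {
    DEMO_SUM_OFFICE,        /* types 0 to 34, and type 42 */
    DEMO_SUM_CAD,           /* types 35 to 40 */
    DEMO_SUM_SHARED,        /* type 41, KV_OrigAppVersion */
    DEMO_SUM_NONSTANDARD,   /* types 43 or greater */
    DEMO_SUM_INVALID        /* negative values */
} DemoSumClass;

/* Hypothetical helper: maps a numeric metadata type to its category. */
static DemoSumClass demo_classify_sum_type(int type)
{
    if (type < 0) {
        return DEMO_SUM_INVALID;
    }
    if (type <= 34 || type == 42) {
        return DEMO_SUM_OFFICE;
    }
    if (type <= 40) {
        return DEMO_SUM_CAD;
    }
    if (type == 41) {
        return DEMO_SUM_SHARED;
    }
    return DEMO_SUM_NONSTANDARD;    /* 43 and above */
}

/* Usage check covering the boundary of each range. */
static void demo_classify_self_check(void)
{
    assert(demo_classify_sum_type(34) == DEMO_SUM_OFFICE);
    assert(demo_classify_sum_type(35) == DEMO_SUM_CAD);
    assert(demo_classify_sum_type(41) == DEMO_SUM_SHARED);
    assert(demo_classify_sum_type(43) == DEMO_SUM_NONSTANDARD);
}
```

Testing type 42 together with the 0 to 34 range keeps the office summary fields in one branch, even though type 42 sits numerically after the CAD and shared types.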

Definition

typedef enum tag_KVSumType
{
    KV_CodePage = 0,
    KV_Title = 1,
    KV_Subject = 2,
    KV_Author = 3,
    KV_Keywords = 4,
    KV_Comments = 5,
    KV_Template = 6,
    KV_LastAuthor = 7,
    KV_RevNumber = 8,
    KV_EditTime = 9,
    KV_LastPrinted = 10,
    KV_Create_DTM = 11,
    KV_LastSave_DTM = 12,
    KV_PageCount = 13,
    KV_WordCount = 14,
    KV_CharCount = 15,
    KV_ThumbNail = 16,
    KV_AppName = 17,
    KV_Security = 18,
    KV_Category = 19,
    KV_PresentationTarget = 20,
    KV_Bytes = 21,
    KV_Lines = 22,
    KV_Paragraphs = 23,
    KV_Slides = 24,
    KV_Notes = 25,
    KV_HiddenSlides = 26,
    KV_MMClips = 27,
    KV_ScaleCrop = 28,
    KV_HeadingPairs = 29,
    KV_TitlesofParts = 30,
    KV_Manager = 31,
    KV_Company = 32,
    KV_LinksUpToDate = 33,
    KV_HyperlinkBase = 34,
    KV_Layouts = 35,
    KV_Objects = 36,
    KV_FileVersion = 37,
    KV_LastFileVersion = 38,
    KV_OrigFileVersion = 39,
    KV_OrigFileType = 40,
    KV_OrigAppVersion = 41,
    KV_ContentStatus = 42,
    KV_UserDefined = 43
}
KVSumType;

Enumerators

KV_CodePage Code page of the document.

KV_Title Contents of the “Title” property field taken from the source document.

KV_Subject Contents of the “Subject” property field taken from the source document.

KV_Author Contents of the “Author” property field taken from the source document.

KV_Keywords Contents of the “Keywords” property field taken from the source document.

KV_Comments Contents of the “Comments” property field taken from the source document.

KV_Template Contents of the “Template” property field taken from the source document.

KV_LastAuthor Contents of the “Last saved by” property field taken from the source document.

KV_RevNumber Contents of the “Revision number” property field taken from the source document.

KV_EditTime Contents of the “Total editing time” property field taken from the source document.

KV_LastPrinted Contents of the “Printed” property field taken from the source document.

KV_Create_DTM Contents of the “Created” property field taken from the source document.

KV_LastSave_DTM Contents of the “Modified” property field taken from the source document.

KV_PageCount Contents of the “Pages” property field taken from the source document. The field provides the number of pages in the document.

KV_WordCount Contents of the “Words” property field taken from the source document. The field provides the number of words in the document.

KV_CharCount Contents of the “Characters” property field taken from the source document. The field provides the number of characters in the document.

KV_ThumbNail Thumbnail image of a document.

KV_AppName Contents of the “Type” property field taken from the source document. This field identifies the application used to read the document.

KV_Security Contents of the “Attributes” property field taken from the source document.

KV_Category Contents of the “Category” property field taken from the source document.

KV_PresentationTarget Target format for presentations (35mm, printer, video, and so forth).

KV_Bytes Contents of the “Size” property field taken from the source document. The field provides the size of the file in bytes.

KV_Lines Contents of the “Lines” property field taken from the source document. The field provides the number of lines in the document.

KV_Paragraphs Contents of the “Paragraphs” property field taken from the source document. The field provides the number of paragraphs in the document.

KV_Slides Contents of the “Slides” property field taken from a presentation document. The field provides the number of slides in the document.

KV_Notes Contents of the “Notes” property field taken from a presentation document. The field provides the number of notes in the document.

KV_HiddenSlides Contents of the “Hidden slides” property field taken from a presentation document. The field provides the number of hidden slides in the document.

KV_MMClips Contents of the “Multimedia clips” property field taken from a presentation document. The field provides the number of multimedia clips in the document.

KV_ScaleCrop Boolean that specifies whether thumbnails are cropped or scaled.

KV_HeadingPairs Internally used property indicating the grouping of different document parts and the number of items in each group.

KV_TitlesofParts Contents of the “Document Contents” property field taken from the source document. The field contains a list of the parts of the file, such as the names of macro sheets in Microsoft Excel or the headings in Word.

KV_Manager Contents of the “Manager” property field taken from the source document.

KV_Company Contents of the “Company” property field taken from the source document.

KV_LinksUpToDate Boolean that specifies whether links in the document are resolved and current.

KV_HyperlinkBase The base address used for all relative links in the file.

KV_Layouts The number of layouts in the AutoCAD drawing.

KV_Objects The approximate number of objects in the AutoCAD drawing.

KV_FileVersion The AutoCAD version (for example, R13, R14) of the drawing.

KV_LastFileVersion The AutoCAD version (for example, R13, R14) that the AutoCAD drawing was last saved as.

KV_OrigFileVersion The AutoCAD version (for example, R13, R14) of the original source file.

KV_OrigFileType The AutoCAD file type (for example, DWG, DXF or DWB) of the original source file.

KV_OrigAppVersion The AutoCAD version (for example, R13, R14) of the application that created the original source file.

KV_ContentStatus The status of the content, for example Draft, Reviewed, or Final.

KV_UserDefined Contents of the first entry in the array of non-standard metadata. This could be user-defined metadata, or metadata unique to a file type.


LPDF_DIRECTION

This enumerated type defines the paragraph direction of extracted paragraphs from a PDF file when logical order is enabled. See “Convert PDF Files to a Logical Reading Order”. It is defined in kvtypes.h.

Definition

typedef enum
{
    LPDF_RAW = 0,
    LPDF_LTR,
    LPDF_RTL,
    LPDF_AUTO
}
LPDF_DIRECTION;

Enumerators

LPDF_RAW     Unstructured paragraph flow. This is the default behavior.

LPDF_LTR     Logical reading order and left-to-right paragraph direction.

LPDF_RTL     Logical reading order and right-to-left paragraph direction.

LPDF_AUTO    Logical reading order. The PDF reader determines the paragraph direction for each PDF page, and then sets the direction accordingly. This is the default when logical order is enabled.
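The following is a minimal sketch of how application code might branch on an LPDF_DIRECTION value, for example when logging the selected mode. The enumeration is redeclared here only so that the fragment compiles on its own; the helper names lpdf_direction_label and lpdf_is_logical are hypothetical and are not part of the SDK.

```c
/* Mirrors the definition shown in this section. In a real build,
 * include kvtypes.h instead of redeclaring the type. */
typedef enum {
    LPDF_RAW = 0,
    LPDF_LTR,
    LPDF_RTL,
    LPDF_AUTO
} LPDF_DIRECTION;

/* Hypothetical helper (not part of the SDK): short label for logging. */
static const char *lpdf_direction_label(LPDF_DIRECTION dir)
{
    switch (dir) {
    case LPDF_RAW:  return "raw";
    case LPDF_LTR:  return "logical, left-to-right";
    case LPDF_RTL:  return "logical, right-to-left";
    case LPDF_AUTO: return "logical, direction detected per page";
    }
    return "unknown";
}

/* Hypothetical helper: nonzero when the value requests logical reading order. */
static int lpdf_is_logical(LPDF_DIRECTION dir)
{
    return dir == LPDF_LTR || dir == LPDF_RTL || dir == LPDF_AUTO;
}
```

An application could call lpdf_is_logical() before deciding whether to expect paragraph-ordered output, and lpdf_direction_label() when writing diagnostics.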


Appendixes

This section lists supported formats, supported character sets and redistributed files, and provides information on format detection. It contains the following appendixes:

Supported Formats

Files Required for Redistribution

Export Tokens

Character Sets

File Format Detection

File Formats and Extensions

Extract and Format Lotus Notes Sub Files

Password Protected Files


APPENDIX A

Supported Formats

This section lists information about the file formats that can be detected and processed (either filtered, converted, or displayed) by the KeyView suite of products. The KeyView suite includes KeyView Filter SDK, KeyView Export SDK, and KeyView Viewing SDK.

Supported Formats

Archive Formats

Binary Format

Computer-Aided Design Formats

Database Formats

Desktop Publishing

Display Formats

Graphic Formats

Mail Formats

Multimedia Formats

Presentation Formats

Spreadsheet Formats

Text and Markup Formats

Word Processing Formats

Supported Formats (Detected)


Supported Formats

The tables in this section provide the following information:

• The file formats supported by the Filter API, Export API, Viewing API, and File Extraction API. The supported versions and each format’s extension are also listed.

  The formats listed in this section can also be detected by the KeyView format detection module (kwad). The section “Supported Formats (Detected)” lists formats that can be detected, but cannot be filtered, converted, or displayed.

• The file formats for which KeyView can detect and extract the character set and metadata information (properties such as title, author, and subject).

  Even though a file format may be able to provide character set information, some documents may not contain it. In that case, the document reader cannot determine the character set of the document, and either the operating system code page or the character set specified in the API is used.

• The document reader used to filter each format.

Symbol   Description

Y        Format is supported.
         Metadata can be extracted for this format.
         Character set can be determined for this format.

N        Format is not supported.
         Metadata cannot be extracted for this format.
         Character set cannot be determined for this format.

P        Partial metadata is extracted from this format. Some non-standard fields are not extracted.

T        Only text is extracted from this format. Formatting information is not extracted.

M        Only metadata (title, subject, author, and so on) is extracted from this format. Text and formatting information are not extracted.
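If you transcribe these tables into your own application (for example, to decide up front whether a file is worth submitting), the symbols can be modeled as a small enumeration. The sketch below is an assumption about how such a lookup might be written; the type kv_support and both functions are hypothetical names and are not provided by the SDK.

```c
/* Hypothetical representation of the table symbols; not an SDK type. */
typedef enum {
    KV_SUPPORT_NO,             /* N */
    KV_SUPPORT_YES,            /* Y */
    KV_SUPPORT_PARTIAL_META,   /* P: some non-standard fields are not extracted */
    KV_SUPPORT_TEXT_ONLY,      /* T: no formatting information */
    KV_SUPPORT_METADATA_ONLY,  /* M: no text or formatting information */
    KV_SUPPORT_UNKNOWN         /* any other character, such as "n/a" cells */
} kv_support;

/* Maps a one-character table symbol to the enumeration above. */
static kv_support kv_support_from_symbol(char symbol)
{
    switch (symbol) {
    case 'Y': return KV_SUPPORT_YES;
    case 'N': return KV_SUPPORT_NO;
    case 'P': return KV_SUPPORT_PARTIAL_META;
    case 'T': return KV_SUPPORT_TEXT_ONLY;
    case 'M': return KV_SUPPORT_METADATA_ONLY;
    default:  return KV_SUPPORT_UNKNOWN;
    }
}

/* Nonzero when the symbol indicates at least some output is produced. */
static int kv_support_is_any(kv_support level)
{
    return level != KV_SUPPORT_NO && level != KV_SUPPORT_UNKNOWN;
}
```

Keeping an explicit "unknown" value avoids silently treating cells such as n/a as either supported or unsupported.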


Archive Formats

Format

7-Zip

AD1

BinHex

Bzip2

Expert Witness

Compression

Format (EnCase)

GZIP n/a

6

7

Version

4.57

n/a n/a

2

ISO

Java Archive

Legato

EMailXtender

Archive

MacBinary

Mac Disk Copy Disk

Image

Microsoft Backup

File

Microsoft Cabinet format

Microsoft Compiled

HTML Help n/a n/a n/a n/a n/a n/a

1.3

3

Reader z7zsr ad1sr kvhqxsr

Extension Filter Export View Extract Metadata Charset

7Z N N Y Y N n/a

AD1

HQX

N

N

N

N

Y

Y

Y

Y

N

N n/a n/a bzip2sr encasesr

BZ2 encase2sr Lx01

N

E01, L01 N

N

N

N

N

Y

Y

Y

Y

Y

Y

N

N

N n/a n/a n/a kvgzsr kvgz isosr unzip emxsr

GZ

GZ

ISO

JAR

EMX

N

N

N

N

N

N

N

N

N

N

Y

Y

N

Y

Y

Y

Y

Y

N

Y

N

N

N

N

N n/a n/a n/a n/a n/a

N

N

N

N

N

N

N

N

N

Header

/Footer

N

N macbinsr dmgsr bkfsr cabsr chmsr

BIN

DMG

BKF

CAB

CHM

N

N

N

N

N

N

N

N

N

N

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

N

N

N

N

N n/a n/a n/a n/a n/a

N

N

N

N

N

Format

Microsoft

Compressed Folder

Version n/a

Reader lzhsr

PKZIP

RAR archive

Tape Archive

UNIX Compress through 9.0

unzip

2.0 through

3.5

rarsr n/a n/a tarsr kvzeesr

UUEncoding all versions

Windows Scrap File n/a kvzee uudsr olesr

WinZip through 10 unzip

Binary Format

TAR

Z

Z

UUE

SHS

ZIP

Extension Filter Export View Extract Metadata Charset

LZH

LHA

N N N Y N n/a

ZIP

RAR

N

N

N

N

Y

N

Y

Y

N

N n/a n/a

N

N

Header

/Footer

N

N

N

N

N

N

N

N

N

N

N

N

N

Y

Y

Y

N

N

Y

N

Y

Y

Y

Y

Y

N

N

N

N

N

N n/a n/a n/a n/a n/a n/a

N

N

N

N

N

N

Format         Version   Reader   Extension   Filter   Export   View   Extract   Metadata   Charset   Header/Footer

Executable     n/a       exesr    EXE         N        N        Y      N         N          n/a       N

Link Library   n/a       exesr    DLL         N        N        Y      N         N          n/a       N

Computer-Aided Design Formats

Format

AutoCAD

Drawing

AutoCAD

Drawing

Exchange

CATIA formats

Version

R13, R14,

R15/2000,

2004, 2007,

2010, 2013

R13, R14,

R15/2000,

2004, 2007,

2010, 2013

5

Microsoft Visio 4, 5, 2000,

2002, 2003,

2007, 2010

5

2013

Reader kpODArdr kpDWGrdr

1 kpODArdr kpDXFrdr kpCATrdr vsdsr kpVSDrdr

ActiveX

1

components

Extension Filter Export View

DWG

DXF

Y

Y

CAT

4

VSD

VSD, VSS

VST

N

VSDM

VSSM

VSTM

VSDX

VSSX

VSTX

N

Y

Y

Y

2

Y

N

Y

Y

N

3

Y

Y

N

Y

Y

Y

2

1

7

Extract Metadata Charset

N

N

N

Y

N

N

6

Y

Y

Y

Y

Y

Y

Y

Y

N

Y

Y

N

Header/

Footer

N

N

N

N

N

N

1. On Windows platforms, kpODArdr is used for all versions up to 2007 and graphic rendering is supported; for later versions, only text extraction is supported through the kpDWGrdr or kpDXFrdr reader.

2. On non-Windows platforms, graphic rendering is supported through the kpDWGrdr reader for versions R13, R14, R15, and R18 (2004); for other versions, only text extraction is supported.

3. On non-Windows platforms, graphic rendering is supported through the kpDXFrdr reader for versions R13, R14, R15, and R18 (2004); for other versions, only text extraction is supported.

4. All CAT file extensions, for example CATDrawing, CATProduct, CATPart, and so on.

5. Viewing and Export use the graphic reader, kpVSDrdr, for Microsoft Visio 2003, 2007, and 2010, and vsdsr for all earlier versions; image fidelity in Viewing and Export is therefore only supported for versions 2003 and above. Filter uses vsdsr for all versions.

6. Extraction of embedded OLE objects is supported for Filter on Windows platforms only.

7. Visio 2013 is supported in Viewing only, with the support of ActiveX components from the Microsoft Visio 2013 Viewer. Image fidelity is supported but other features, such as highlighting, are not.
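Notes 5 and 7 describe which Visio reader applies to which operation and version. The sketch below only restates those two notes as a lookup function so the rule is easier to follow; KeyView selects readers itself, and the names kv_use and visio_reader_for are hypothetical, not SDK identifiers. Versions are passed as the numbers used in the table (4, 5, 2000, 2002, 2003, 2007, 2010, 2013).

```c
#include <stddef.h>

/* Hypothetical operation selector; not an SDK type. */
typedef enum { KV_USE_FILTER, KV_USE_EXPORT, KV_USE_VIEW } kv_use;

/* Returns the reader named in notes 5 and 7 for a Visio version,
 * or NULL when the notes say the combination is not supported. */
static const char *visio_reader_for(kv_use use, int version)
{
    if (version == 2013) {
        /* Note 7: Visio 2013 is supported in Viewing only. */
        return use == KV_USE_VIEW ? "ActiveX components" : NULL;
    }
    if (use == KV_USE_FILTER) {
        /* Note 5: Filter uses vsdsr for all versions. */
        return "vsdsr";
    }
    if (version == 2003 || version == 2007 || version == 2010) {
        /* Note 5: Viewing and Export use the graphic reader. */
        return "kpVSDrdr";
    }
    /* Note 5: earlier versions fall back to vsdsr. */
    return "vsdsr";
}
```

For example, under these notes an Export of a Visio 2007 file maps to kpVSDrdr, while Filter on the same file maps to vsdsr.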

Database Formats

Format              Version                                       Reader   Extension    Filter   Export   View   Extract   Metadata   Charset   Header/Footer

dBase Database      III+, IV                                      dbfsr    DBF          Y        Y        Y      N         N          N         N

Microsoft Access    95, 97, 2000, 2002, 2003, 2007, 2010, 2013    mdbsr    MDB, ACCDB   Y        T        T      N         N          Y (1)     N

Microsoft Project   2000, 2002, 2003, 2007, 2010, 2013            mppsr    MPP          Y        Y        Y      Y         Y          Y         N

1. Charset is not supported for Microsoft Access 95 or 97.

Desktop Publishing

Format                Version      Reader    Extension   Filter   Export   View   Extract   Metadata   Charset   Header/Footer

Microsoft Publisher   98 to 2013   mspubsr   PUB         Y        T        T      Y         Y          Y         N

Display Formats

Format      Version      Reader          Extension   Filter   Export   View   Extract   Metadata   Charset   Header/Footer

Adobe PDF   1.1 to 1.7   pdfsr           PDF         Y        Y        N      Y (1)     Y          Y         N

                         kppdfrdr        PDF         N        Y        Y      N         N          N         N

                         kppdf2rdr (2)   PDF         N        Y        Y      N         N          N         N

1. Includes support for extraction of subfiles from PDF Portfolio documents.

2. kppdf2rdr is an alternate graphic-based reader that produces high-fidelity output but does not support other features such as highlighting or text searching.

Graphic Formats

Format

Computer Graphics

Metafile

CorelDRAW

2

Version Reader n/a kpcgmrdr

1

Extension

CGM

Filter Export View Extract Metadata Charset

Y Y Y N N N

Header

/Footer

N through

9.0

10, 11,

12, X3 n/a n/a kpcdrrdr kpdcxrdr dcmsr

CDR

DCX

DCM

N

N

M

Y

Y

N

Y

Y

N

N

N

N

N

N

Y

N

N

N

N

N

N

DCX Fax System

Digital Imaging &

Communications in

Medicine (DICOM)

Encapsulated

PostScript (raster)

Enhanced Metafile

TIFF header n/a kpepsrdr kpemfrdr

EPS

EMF

N

Y

Y

Y

Y

Y

N

N

N

Y

N

N

N

N

Format

GIF

JBIG2

JPEG

JPEG 2000

Lotus AMIDraw

Graphics

Lotus Pic

Macintosh Raster

MacPaint

Microsoft Office

Drawing

Omni Graffle

PC PaintBrush

Portable Network

Graphics

SGI RGB Image

Sun Raster Image n/a

2 n/a n/a n/a

3 n/a

Version Reader

87, 89 kpgifrdr gifsr n/a n/a

Extension

GIF kpJBIG2rdr JBIG2 kpjpgrdr JPEG n/a n/a jpgsr kpjp2000rdr JP2, JPF,

J2K, jp2000sr

JPWL,

JPX, PGX kpsdwrdr SDW N

M

N

N

N

M

Filter Export View Extract Metadata Charset

N

M

Y

M

Y

N

N

N

N

Y

N

N

Y

Y

M

Y

M

Y

Y

N

Y

N

N

N

N

N

N

N

N

Y

N

Y

N

N

N

N

N

N

N

N

N

N

Header

/Footer

N

N

Y Y N N N N n/a n/a kppicrdr kppctrdr kpmacrdr kpmsordr kpGFLrdr kppcxrdr kppngrdr pngsr kpsgirdr kpsunrdr

PIC

PIC

PCT

PNTG

MSO

RGB

RS

Y

N

N

N

GRAFFLE Y

PCX N

PNG

PNG

N

M

N

N

Y

Y

Y

Y

N

Y

Y

M

Y

Y

Y

Y

Y

Y

N

Y

Y

N

Y

Y

N

N

N

N

N

N

N

N

N

N

N

N

N

N

Y

N

N

Y

N

N

N

N

N

N

Y

N

N

N

N

N

N

N

N

N

N

N

N

N

N

N

Format

Tagged Image File

Truevision Targa

Windows Animated

Cursor

Windows Bitmap

Version Reader through

6.0

3 tifsr kptifrdr

2 n/a kptrardr kpanirdr n/a

Windows Icon Cursor

Windows Metafile n/a

3

WordPerfect Graphics 1 1

WordPerfect Graphics 2 2, 7

1. Files with non-partitioned data are supported.

2. CDR/CDR with TIFF header.

kpbmprdr bmpsr kpicordr kpwmfrdr kpwpgrdr kpwg2rdr

Extension

TIFF

TIFF

TGA

ANI

BMP

BMP

ICO

WMF

WPG

WPG

Filter Export View Extract Metadata Charset

M

N

N

N

N

M

N

Y

N

N

M

Y

Y

Y

Y

M

Y

Y

Y

Y

N

Y

Y

Y

Y

N

Y

Y

Y

Y

N

N

N

N

N

N

N

N

N

N sional, CCITT Group 4 T6, LZW, JPEG (only Gray, RGB and CMYK color space are supported), and PackBits.

Y

N

N

N

N

Y

N

N

N

N

N

N

N

N

N

N

N

N

N

N

N

N

Header

/Footer

N

N

N

N

N

N

N

N

Mail Formats

Format

Documentum

EMCMF

Domino XML

Language

1

Version n/a n/a

GroupWise FileSurf n/a

Legato Extender n/a

Lotus Notes database

Mailbox

2

4, 5, 6.0, 6.5,

7.0, 8.0

Thunderbird

1.0, Eudora 6.2

2004 Microsoft

Entourage

Database

Microsoft Outlook 97, 2000, 2002,

2003, 2007,

2010, 2013

5.0, 6.0

Microsoft Outlook

DBX

Microsoft Outlook

Express

Windows 6

MacIntosh 5

1.0, 2.0

Reader msgsr dxlsr gwfssr onmsr nsfsr mbxsr

3 entsr msgsr

3

dbxsr

emlsr

3

mbxsr

3

icssr Microsoft Outlook iCalendar

Microsoft Outlook for Macintosh

2011 olmsr

Extension Filter Export View Extract Metadata Charset

EMCMF N N Y Y Y Y

DXL

GWFS

ONM

NSF

MBX various

N

N

N

N

N

N

MSG,

OFT

DBX

Y

N

EML

EML

ICS, VCS N

Y

N

OLM N

N

N

N

N

N

N

T

N

T

N

N

N

Y

Y

Y

Y

T

Y

T

Y

T

T

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

N

N

N

N

N

Y

Y

Y

Y

Y

Y

Y

Y

4

N

N

Header

/Footer

N

N

N

N

N

N

N

N

N

N

N

Format

Microsoft Outlook

Offline Storage File

Microsoft Outlook

Personal Folder

Version

97, 2000, 2002,

2003, 2007,

2010, 2013

97, 2000, 2002,

2003, 2007,

2010, 2013

97, 2000, 2002,

2003, 2007,

2010, 2013

2.1, 3.0, 4.0

Reader pffsr

pstsr

3 ,5

pstnsr

Extension Filter Export View Extract Metadata Charset

OST

PST

PST

N

N

N

N

N

N

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

N

Y

Header

/Footer

N

N

N

Microsoft Outlook vCard Contact

Text Mail (MIME) vcfsr VCF Y Y T N Y N N n/a

emlsr

3

mbxsr

3

tnefsr various various various

Y

Y

N

T

T

N

T

T

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

N

N

N Transport Neutral

Encapsulation

Format n/a

1. Only supports non-encrypted embedded files.

2. KeyView supports MBX files created by Eudora Email and Mozilla Thunderbird. MBX files created by other common mail applications are typically filtered, converted, and displayed.

3. This reader supports both clear signed and encrypted S/MIME. KeyView supports S/MIME for PST, EML, MBX, and MSG files.

4. Returns “Unicode” character set for version 2003 and up, and “Unknown” character set for previous versions.

5. Uses Microsoft Messaging Application Programming Interface (MAPI).

Multimedia Formats

Viewing SDK plays some multimedia files using the Windows Media Control Interface (MCI). MCI is a set of Windows APIs that communicate with multimedia devices.

Format

Advanced Systems

Format

Audio Interchange

File Format

Microsoft Wave

Sound

MIDI

MPEG-1 Audio layer 3

MPEG-1 Video

MPEG-2 Audio

MPEG-4 Audio

NeXT/Sun Audio

QuickTime Movie

Windows Video

Version

1.2

n/a n/a n/a n/a

2, 3, 4

Reader asfsr

MCI aiffsr

MCI riffsr n/a MCI

ID3 v1 and v2 MCI

2, 3 n/a mp3sr

MCI

Extension Filter Export View Extract Metadata Charset

ASF

WMA

WMV

N N N N Y N

AIFF

AIFF

WAV

WAV

MID

MP3

MP3

MPG

MCI mpeg4sr MP4

3GP

MCI

MCI

MPEGA

AU

QT

MOV

N

M

N

M

N

N

M

N

N

M

N

N

N

N

N

N

N

N

M

N

N

N

N

N

Y

N

Y

N

Y

Y

Y

Y

Y

N

Y

Y

N

N

N

N

N

N

N

N

N

N

N

N

N

Y

N

Y

N

N

Y

N

N

Y

N

N

N

N

N

N

N

N

N

N

N

N

N

N

2.1

MCI AVI N N Y N N N

N

N

Header/

Footer

N

N

N

N

N

N

N

N

N

N

N

N

Presentation Formats

NOTE Depending on the default multimedia player installed on your computer, the View API may not be able to play some supported multimedia formats. To play multimedia files, the View API uses the Windows Media Control Interface (MCI) to communicate with the multimedia player installed on your computer. If the player does not play a multimedia file that is supported by the Viewing SDK, the View API will not be able to play the file.

If you cannot play a supported multimedia file using the View API, install a different multimedia player or compressor/decompressor (codec) component.

Format

Applix Presents

Version

Apple iWork Keynote 2, 3, ‘08,

‘09

4.0, 4.2,

4.3, 4.4

Corel Presentations 6, 7, 8, 9,

10, 11, 12,

X3 n/a Extensible Forms

Description

Language

Lotus Freelance

Graphics

Lotus Freelance

Graphics 2

Macromedia Flash

Microsoft OneNote

96, 97, 98,

R9, 9.8

2 through 8.0

2007,

2010, 2013

Reader Extension kpIWPGrdr GZ kpagrdr kpshwrdr kpXFDLrdr kpprzrdr kpprerdr swfsr kpONErdr

AG

SHW

XFD

XFDL

PRZ

PRE

SWF

ONE

ONETOC2

Filter Export View Extract Metadata Charset

Y Y Y N Y Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

N

N

N

N

N

N

Y

N

N

Y

N

N

N

N

N

N

Y

N

N

Y

Y

1

N

N

N

N

N

N

N

Header

/Footer

N

Format

Microsoft

PowerPoint

Macintosh

Version

98

2001, v.X,

2004

Reader kpp40rdr kpp97rdr

Extension

PPT

PPT

PPS

POT

PPT Microsoft

PowerPoint PC

Microsoft

PowerPoint

Windows

Microsoft

PowerPoint

Windows

Microsoft

PowerPoint

Windows XML

OASIS Open

Document Format

OpenOffice Impress

StarOffice Impress

4

95

97, 2000,

2002, 2003

2007,

2010, 2013

1, 2

3

1, 1.1

6, 7 kpp40rdr kpp95rdr kpp97rdr kpppxrdr kpodfrdr sosr sosr

PPT

SXD

SXI

ODG

ODP

SXI

SXP

ODP

SXI

SXP

ODP

PPT

PPS

POT

PPTX

PPTM

POTX

POTM

PPSX

PPSM

PPAM

1. The character set cannot be determined for versions 5.x and lower.

Y

Y

Y

Y

Y

Y

Y

Filter Export View Extract Metadata Charset

Y

Y

Y

Y

Y

Y

N

N

N

P

N

Y

Header

/Footer

N

N

Y

Y

Y

Y

Y

T

T

Y

Y

Y

Y

Y

T

T

N

N

Y

Y

Y

N

N

4

P

P

P

Y

Y

Y

Y

N

Y

Y

Y

Y

Y

Y

N

N

Y

Y

N

N

N

2

2. Slide footers are supported for Microsoft PowerPoint 97 and 2003.

3. Generated by OpenOffice Impress 2.0, StarOffice 8 Impress, and IBM Lotus Symphony Presentation 3.0.

4. Supported using the embedded objects reader olesr.

Spreadsheet Formats

Format

Apple iWork Numbers

Applix Spreadsheets

Comma Separated

Values

Corel Quattro Pro

Version

‘08, ‘09 n/a

Reader iwsssr

4.2, 4.3, 4.4

assr csvsr

Extension Filter Export View Extract Metadata Charset

GZ

AS

CSV

5, 6, 7, 8

Data Interchange

Format

Lotus 1-2-3

X4 n/a

Lotus 1-2-3

Lotus 1-2-3 Charts

96, 97, R9,

9.8

2, 3, 4, 5

2, 3, 4, 5

Microsoft Excel Charts 2, 3, 4, 5, 6,

7

Microsoft Excel

Macintosh

98, 2001, v.X, 2004 qpssr qpwsr difsr l123sr xlssr

WB2

WB3

QPW

DIF

123 wkssr WK4 kpchtrdr 123 kpchtrdr XLS

XLS

Y

Y

Y

Y

Y

Y

Y

Y

N

N

Y

Y

Y

Y

Y

N

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

N

N

N

N

N

N

N

N

N

N

Y

1

Y

N

N

P

P

N

P

N

N

N

Y

Y

Y

N

Y

Y

N

Y

Y

N

N

Y

N

N

N

Header/

Footer

N

N

N

N

N

N

N

N

Format

Microsoft Excel

Windows

Microsoft Excel

Windows XML

Version

2.2 through

2003

2007, 2010,

2013

Reader xlssr xlsxsr

Extension Filter Export View Extract Metadata Charset

XLS

XLW

XLT

XLA

Y Y Y Y

2

Y Y

Y Y Y Y Y Y XLSX

XLTX

XLSM

XLTM

XLAM

XLSB Y Y Y N N N Microsoft Excel Binary

Format

Microsoft Works

Spreadsheet

OASIS Open Document

Format

OpenOffice Calc

StarOffice Calc

2007, 2010,

2013

2, 3, 4

1, 2

3

1, 1.1

6, 7 xlsbsr mwssr odfsssr sosr sosr

S30

S40

ODS

SXC

STC

SXC

ODS

OTS

SXC

ODS

Y

Y

Y

Y

Y

Y

T

T

Y

Y

T

T

1. Supported using the embedded objects reader olesr.

2. Supported for versions 97 and higher using the embedded objects reader olesr.

3. Generated by OpenOffice Calc 2.0, StarOffice 8 Calc, and IBM Lotus Symphony Spreadsheet 3.0.

N

Y

N

N

1

N

Y

Y

Y

Y

Y

Y

Y

Header/

Footer

Y

Y

N

N

N

N

N

Text and Markup Formats

Format

ANSI

ASCII

HTML

Microsoft Excel Windows

XML

Microsoft Word Windows

XML

Microsoft Visio XML

Version n/a n/a

3, 4

2003

2003

2003

MIME HTML

Rich Text Format

Unicode HTML

Unicode Text

XHTML

XML (generic)

Reader afsr afsr htmsr xmlsr xmlsr XML n/a

1 through

1.7

n/a

3, 4

1.0

1.0

xmlsr mhtsr rtfsr

VDX

VTX

MHT

RTF unihtmsr HTM unisr TXT htmsr xmlsr

HTM

XML

Extension Filter Export View Extract Metadata Charset

TXT Y Y Y N N N

TXT

HTM

XML

Y

Y

Y

Y

Y

T

Y

Y

T

N

N

N

N

P

Y

N

Y

Y

N

N

Header/

Footer

N

N

Y

Y

Y

Y

Y

Y

Y

Y

T

T

Y

Y

Y

Y

Y

T

T

T

Y

Y

Y

Y

Y

T

N

N

N

N

N

N

N

N

Y

Y

Y

P

Y

N

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

N

N

N

Y

N

N

N

N

Word Processing Formats

Format

Adobe FrameMaker

Interchange Format

Apple iChat Log

Apple iWork Pages

Applix Words

Corel WordPerfect

Linux

Corel WordPerfect

Macintosh

Corel WordPerfect

Windows

Corel WordPerfect

Windows

DisplayWrite

Folio Flat File

Founder Chinese

E-paper Basic

Fujitsu Oasys

Haansoft Hangul

Health level7

Version

5, 5.5, 6, 7

Reader mifsr

Extension

MIF

Filter Export View Extract Metadata Charset

Y Y Y N N Y

Header/

Footer

N

1, AV 2

AV 2.1, AV 3

‘08, ‘09

3.11, 4, 4.1,

4.2, 4.3, 4.4

6.0, 8.1

ichatsr iwwpsr awsr wp6sr

ICHAT

GZ

AW

WPS

1.02, 2, 2.1,

2.2, 3, 3.1

5, 5.1

wpmsr wosr

WPM

WO

6, 7, 8, 9, 10,

11, 12, X3

4

3.1

3.2.1

wp6sr dw4sr foliosr cebsr

1

7

97

2002, 2005,

2007, 2010

2.0

oa2sr hwpsr

OA2

HWP hwposr HWP hl7sr HL7

WPD

IP

FFF

CEB

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

N

Y

N

T

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

N

Y

N

T

Y

N

N

N

N

N

N

N

N

N

N

N

N

Y

N

N

Y

N

P

P

P

N

N

Y

N

P

N

Y

Y

N

Y

Y

Y

Y

Y

Y

Y

Y

N

N

Y

Y

Y

N

N

Y

Y

Y

N

N

N

N

N

N

Y

N

N

Format

IBM DCA/RFT

(Revisable Form Text)

JustSystems Ichitaro

Lotus AMI Pro

Lotus AMI Professional

Write Plus

Lotus Word Pro

Lotus SmartMaster

Microsoft Word

Macintosh

Version

SC23-0758-

1

8 through

2013

2, 3

2.1

Reader dcasr jtdsr lasr lasr

96, 97, R9

96, 97

4, 5, 6, 98

2001, v.X,

2004 lwpsr lwpsr mbsr mw8sr

4, 5, 5.5, 6 mwsr

1.0 and 2.0

misr

Microsoft Word PC

Microsoft Word

Windows

Microsoft Word

Windows

Microsoft Word

Windows

Microsoft Word

Windows XML

6, 7, 8, 95

97, 2000,

2002, 2003

2007, 2010,

2013 mw6sr

Microsoft Works

Microsoft Works

1, 2, 3, 4

6, 2000

Extension

DC

JTD

SAM

AMI

LWP

MWP

DOC

DOC

DOT

DOC

DOC

DOC mw8sr mwxsr

DOC

DOT

DOCM

DOCX

DOTX

DOTM mswsr WPS msw6sr WPS

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Filter Export View Extract Metadata Charset

Y Y Y N N Y

Header/

Footer

N

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

N

N

N

N

N

N

Y

N

N

N

Y

Y

N

N

2

2

P

P

N

P

N

Y

Y

N

N

Y

Y

Y

N

N

N

Y

N

N

N

N

Y

N

N

Y

Y

Y

N

N

Y

Y

Y

Y

N

Y

N

Y

Y

Y

Y

Y

Y

Y

Format

Microsoft Windows

Write

OASIS Open

Document Format

Omni Outliner

OpenOffice Writer

Version

1, 2, 3

1, 2

3 v3, OPML,

OOutline

1, 1.1

Reader mwsr

Extension

WRI odfwpsr

ODT

SXW

STW oo3sr OO3

OPML

OOUTLINE sosr epubsr

SXW

ODT

EPUB

Filter Export View Extract Metadata Charset

Y

Y

Y

Y

Y

Y

Y

T

Y

Y

Y

T

N

Y

N

N

2

N

Y

N

Y

Y

Y

Y

Y

Header/

Footer

N

Y

N

N

Open Publication

Structure eBook

StarOffice Writer

2.0, 3.0

Y Y Y N Y Y N

Skype Log

WordPad

6, 7

3 through

2003 n/a sosr skypesr rtfsr

SXW

ODT

DBB

RTF

Y

Y

Y

T

Y

Y

T

Y

Y

N

N

N

Y

N

P

Y

N

Y

N

N

N

XML Paper

Specification

XyWrite

Yahoo! Instant

Messenger

4.12

n/a xpssr xywsr yimsr

4

XPS

XY4

DAT

Y

Y

Y

T

Y

Y

T

Y

Y

N

N

N

N

N

N

N

N

N

N

N

N

1. This reader is only supported on Windows 32-bit platforms.

2. Supported using the embedded objects reader olesr.

3. Generated by OpenOffice Writer 2.0, StarOffice 8 Writer, and IBM Lotus Symphony Documents 3.0.

4. To successfully use this reader, you must set the KV_YAHOO_ID environment variable to the Yahoo user ID. You can optionally set the KV_OTHER_YAHOO_ID environment variable to the other Yahoo user ID. If you do not set it, “Other” is used by default. If you enter incorrect values for the environment variables, erroneous data is generated.
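The environment variables described in note 4 can be set in the shell before the application starts, or from the application itself before the reader is invoked. The following is a minimal sketch of the second approach on a POSIX system, using the standard setenv function; on Windows an equivalent such as _putenv_s would be needed. The helper name set_yahoo_reader_env is hypothetical and is not part of the SDK.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper (not part of the SDK).
 * Sets KV_YAHOO_ID, and KV_OTHER_YAHOO_ID when other_id is given.
 * Returns 0 on success and -1 on failure or when user_id is empty. */
static int set_yahoo_reader_env(const char *user_id, const char *other_id)
{
    if (user_id == NULL || user_id[0] == '\0') {
        return -1;  /* the required variable must have a value */
    }
    if (setenv("KV_YAHOO_ID", user_id, 1) != 0) {
        return -1;
    }
    if (other_id != NULL && other_id[0] != '\0') {
        /* Optional: when unset, the note says "Other" is used. */
        if (setenv("KV_OTHER_YAHOO_ID", other_id, 1) != 0) {
            return -1;
        }
    }
    return 0;
}
```

Because incorrect values produce erroneous data rather than an error, it is worth validating the IDs in the calling code before they are set.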


Supported Formats (Detected)

The file formats listed in this section can be detected by the KeyView format detection module ( kwad ), but cannot be filtered, converted, or displayed. The detection module determines a file’s format and reports the information to the developer’s application.

The formats listed in “Supported Formats” can be detected as well as filtered, exported, and viewed.

Ability Office (SS, DB, GR, WP, COM)

ACT

Adobe FrameMaker Markup Language

Aldus Freehand (Macintosh)

Aldus PageMaker (Macintosh)

Amiga MOD sound

Apple Double

Apple Single

Appleworks

Applix Asterix

ARC/PAK Archive

ASCII-armored PGP encoded

ASCII-armored PGP signed

AutoDesk Animator Pro FLIC Animation

AutoShade Rendering

CADAM Drawing

CCITT Group 3 1-Dimensional (G31D)

Compactor/Compact Pro Archive

Corel Draw CMX

CPT Communication

Curses Screen Image (UNIX/VAX/SUN)

DCX Fax

AC3 audio

Adobe FrameMaker

AES Multiplus Comm

Aldus PageMaker (DOS)

Amiga IFF-8SVX sound

Apple Binary Property List

Apple Photoshop Document

Apple XML Property List

Applix Alis

Applix Graphics

ARJ Archive

ASCII-armored PGP Public Keyring

AutoDesk Animator FLIC Animation

AutoDesk WHIP

BlackBerry Activation File

CADAM Drawing Overlay

COMET TOP Word

Convergent Tech DEF Comm.

cpio Archive (UNIX/VAX/SUN)

Creative Voice (VOC) sound

Data Point VISTAWORD

DEC WPS PLUS


DECdx

Device Independent file (DVI)

Desktop Color Separation (DCS)

Digital Imaging and Communications in Medicine (DICOM)

DG CEOwrite

DIF Spreadsheet

Disk Doubler Compression

ENABLE

eFax

Executable UNIX/VAX/SUN

Framework

Freehand 11

GEM Bit Image

Google SketchUp

DG Common Data Stream (CDS)

Digital Document Interchange Format (DDIF)

EBCDIC Text

ENABLE Spreadsheet (SSF)

Envoy (EVY)

FileMaker (Macintosh)

Framework II

FTP Session Data

Ghost Disk Image

Graphics Environment Manager (GEM VDI)

Harvard Graphics

Honey Bull DSA101

HP Graphics Language (Plotter)

IBM 1403 Line Printer

Hewlett-Packard

HP Graphics Language (HP-GL)

HP PCL and PJL Languages

IBM DCA-FFT

IBM DCF Script

Informix SmartWare II

Informix SmartWare II Communication File

Informix SmartWare II Database

Informix SmartWare Spreadsheet

Java Class file

Interleaf

JPEG File Interchange Format (JFIF)

KW ODA G31D (G31)

KW ODA Internal G32D (G32)

Lasergraphics Language

Lotus Notes Bitmap

Lotus Screen Cam

Macromedia Director

MacWrite II

KW ODA G4 (G4)

KW ODA Internal Raw Bitmap (RBM)

Link Library UNIX/VAX/SUN

Lotus Notes CDF

Lyrix

MacWrite

MASS-11


MATLAB MAT Format

Microsoft Access 2007

Micrografx Designer

Microsoft Access 2007 Template

Microsoft Compiled HTML Help

Microsoft Common Object File Format (COFF)

Microsoft Device Independent Bitmap

Microsoft Excel 2007 Macro-Enabled Spreadsheet Template

Microsoft Document Imaging (MDI)

Microsoft Excel 2007 Spreadsheet Template

Microsoft Exchange Server Database File

Microsoft Object File Library

Microsoft Office Drawing

Microsoft Office Groove

Microsoft Outlook Restricted Permission Message File

Microsoft Windows Cursor (CUR) Graphics

Microsoft Windows Group File

Microsoft Windows Icon (ICO)

Microsoft Windows Help File

Microsoft Windows NT Event Log

Microsoft Windows OLE 2 Encapsulation

Microsoft Windows Vista Event Log

Microsoft Word (UNIX)

Microsoft Works (Macintosh)

Microsoft Works Communication (Macintosh)

Microsoft Works Communication (Windows)

Microsoft Works Database (Macintosh)

Microsoft Works Database (Windows)

Microstation

MORE Database Outliner (Macintosh)

MS DOS Batch File format

MultiMate 4.0

Navy DIF

NBI Net Archive Format

Microsoft Works Database (PC)

Microsoft Works Spreadsheet (Macintosh)

Milestone Document

MPEG-PS container with CDXA stream

MS DOS Device Driver

Multiplan Spreadsheet

NBI Async Archive Format

Netscape Bookmark file

Nero Encrypted File

NIOS TOP

NURSTOR Drawing

ODA/ODIF

Office Writer

OLIDIF

NeWS font file (SUN)

Nota Bene

Object Module UNIX/VAX/SUN

ODA/ODIF (FOD 26)

OLE DIB object

Open PGP (new format packets)


OS/2 PM Metafile Graphics

Paradox (PC) Database

PC Library Module

PC True Type Font

PeachCalc Spreadsheet

PEX Binary Archive (SUN)

PGP Encrypted Data

PGP Secret Keyring

PGP Signed and Encrypted Data

Philips Script

Portable Bitmap Utilities (PBM)

Portable Pixmap Utilities (PPM)

PostScript Type 1 Font File

Program Information File

Q & A for Windows

Quadratron Q-One (V2.0)

QuickDraw 3D Metafile (3DMF)

RealLegal E-Transcript

Reflex Database

RIFF MIDI

SAMNA Word IV

SEG-Y Seismic Data format

SGML

SMTP document

Stuff It Archive (Macintosh)

Supercalc Spreadsheet

Symphony Spreadsheet

PaperPort image file

PC COM executable (detected in file mode only)

PC Object Module

PCD Image

Persuasion Presentation

PGP Compressed Data

PGP Public Keyring

PGP Signature Certificate

PGP Signed Data

Plan Perfect

Portable Greymap Utilities (PGM)

PostScript File

PRIMEWORD

Q & A for DOS

Quadratron Q-One (V1.93J)

Quark Express (Macintosh)

Real Audio

RealMedia Streaming Media

RIFF Device Independent Bitmap

RIFF Multimedia Movie

Samsung Electronics JungUm Global format

Serialized Object Format (SOF) Encapsulation

Simple Vector Format (SVF)

SolidWorks

SUN vfont definition

SYLK Spreadsheet

Targon Word (V 2.0)


Ultracalc Spreadsheet

Uniplex Ucalc Spreadsheet

Usenet format

VRML 2.0

Wang Office GDL Header Encapsulation

Wang WITA

Web ARChive (WARC)

Windows Journal

Windows Palette

Word Connection

WordMARC word processor

Uniplex (V6.01)

UNIX SHAR Encapsulation

VRML

Volkswriter

WANG PC

WANG WPS Comm.

Windows C++ Object Storage

Windows Micrografx Draw (DRW)

Windows scrap file (SHS)

WordERA (V 1.0)

WordPerfect General File

WordStar 6.0

Writing Assistant word processor

X Image

Xerox 860 Comm.

Xerox Writer word processor

WriteNow

X Bitmap (XBM)

X Pixmap (XPM)

Xerox DocuWorks

Yahoo! Messenger chat log


APPENDIX B

Files Required for Redistribution

This section lists the Export files that may be redistributed in your applications under the licensing agreement. These files are in the directory install\OS\bin, where install is the path of the Export installation directory and OS is the name of the operating system. This section contains the following topics:

Core Files

Support Files

Document Readers and Writers

Document Type Definition Files

NOTE On Windows systems, the libraries are .dll files. On UNIX systems, the libraries are .so, .a, or .sl files.
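When packaging an application, the location of each redistributable file can be derived from the install\OS\bin layout described above. The sketch below builds such a path with snprintf; the function name kv_bin_path, the use_backslash flag, and the example values are hypothetical, and the actual installation path and operating system directory name depend on your installation.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of the SDK).
 * Writes "<install><sep><os_name><sep>bin<sep><file>" into out.
 * use_backslash selects '\\' (Windows style) or '/' (UNIX style).
 * Returns 0 on success, -1 if the result does not fit in out. */
static int kv_bin_path(char *out, size_t out_size,
                       const char *install, const char *os_name,
                       const char *file, int use_backslash)
{
    char sep = use_backslash ? '\\' : '/';
    int n = snprintf(out, out_size, "%s%c%s%cbin%c%s",
                     install, sep, os_name, sep, sep, file);
    if (n < 0 || (size_t)n >= out_size) {
        return -1;  /* encoding error or truncated output */
    }
    return 0;
}
```

For example, calling kv_bin_path with an installation directory of /opt/export, an operating system directory of OS, and the file kwad.so yields /opt/export/OS/bin/kwad.so.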


Core Files

The following core files may be redistributed with your application:

File             Description

formats_e.ini    Initialization file. For more information on this file, see “Determine Format Support”.

htmlexport.*     Required by the Java API.

xmlcnv.*         XML converter for the document token stream.

kpifcnvt.*       Graphic conversion routines.

kpifutil.*       Graphic utility routines.

kvxtract.*       File Extraction interface.

kvxml.*          XML Export C API.

kvexport.*       Export C API. Interface to the HTML and XML Export C APIs.

kvolefio.*       Embedded OLE object writer.

kvutil.*         Internal KeyView utility functions.

kvxpgsa.*        Interface between presentations or graphic readers and the Export API.

kvxsssa.*        Interface between spreadsheet readers and the Export API.

kvxwpsa.*        Interface between word processing readers and the Export API.

kwad.*           File auto-recognition module.

regsvr32.exe     A Microsoft Windows program used to register in-process COM objects.

txtcnv.*         Converter for document token stream.

xmlexport.*      Required by the Java API.


Support Files

The following support files may be redistributed with your application:

File             Description

bentofio.*       Required by l123sr.* and kpprzrdr.*.

cbmap.map        Character mappings for Adobe Portable Document Format (PDF).

chartbls.ux      Character mapping tables.

chmdll.*         Required by chmsr.

kp3dwrld.*       Required for 3D charts.

kpchtrdr.*       Required for all spreadsheets (chart support).

kpjavwrt.*       Java utility routines.

kpjpeg.*         JPEG file interchange format shared routines.

kppng.*          Portable Network Graphics (PNG) utilities.

kvxconfig.ini    Contains element extraction settings for source XML files.

kvgraph.*        Required for all spreadsheets (chart support).

kvpie.*          Required for all spreadsheets (chart support).

kvradar.*        Required for all spreadsheets (chart support).

kv.lic           Contains license information for KeyView products. This file is opened and validated when a KeyView API is used.

kvraster.class   Java program used to convert vector graphics on UNIX and Linux.

kvVector.class   Java applet used to convert vector graphics on UNIX and Linux.

kvvector.jar     Java applet used to convert vector graphics on UNIX and Linux. This must reside in the output directory.

mscomctl.ocx     Microsoft Common Control (for example, labels, dialog boxes). Required for Visual Basic programs and COM objects.

msvbvm60.*       Microsoft Visual Basic Runtime library V6.0.

MSVCP60.*        Microsoft Visual C++ Runtime Library V6.0.

msvcrt.*         Microsoft Visual C Runtime library.

oleaut32.*       Microsoft OLE Automation Controls.


File             Description

olepro32.*       Microsoft OLE property support library.

servant.exe      Executable required for out-of-process conversions.

wpmap.*          Extended character mapping for WordPerfect and Corel Presentation.

xmlsh.*          Contains a library of content handlers for each XML file type. Required by the Expat XML parser.

322

Document Readers and Writers

The following readers and writers may be redistributed with your application:

File             Description
ad1sr.*          AD1 Evidence file reader
afsr.*           ASCII reader
assr.*           Applix spreadsheet reader
awsr.*           Applix Words reader
bkfsr.*          Microsoft Backup File reader
bzip2sr.*        Bzip2 reader
cabsr.*          Microsoft Cabinet format reader
cebsr.*          Founder Chinese E-paper Basic reader
chmsr.*          Microsoft Compiled HTML Help reader
csvsr.*          Comma Separated Values reader
dbfsr.*          dBase Database reader
dbxsr.*          Microsoft Outlook Express DBX reader
dcasr.*          Document Content Architecture/Revisable Form Text (DCA/RFT) reader
difsr.*          Data Interchange Format reader
dmgsr.*          Mac Disk Copy Disk Image File reader
dw4sr.*          DisplayWrite 4 reader
dxlsr.*          Domino XML Language reader

emlsr.*          Microsoft Outlook Express (EML) reader. This is used to convert EML files when the MBX reader is not licensed.
emxsr.*          Legato EMailXtender archive (EMX) reader
encasesr.*       Expert Witness Compression Format (EnCase) v6 reader
encase2sr.*      Expert Witness Compression Format (EnCase) v7 reader
entsr.*          Microsoft Entourage Database Format reader
epubsr.*         Open Publication Structure eBook reader
foliosr.*        Folio Flat File reader
gwfssr.*         GroupWise FileSurf reader
hl7sr.*          Health level7 reader (metadata only)
htmsr.*          HTML and XHTML reader
hwposr.*         Hangul 2002, 2005, 2007 reader
ichatsr.*        Apple iChat Log reader
icssr.*          Microsoft Outlook iCalendar reader
isosr.*          ISO-9660 CD Disc Image Format reader
iwsssr.*         Apple iWork Numbers reader
iwwpsr.*         Apple iWork Pages reader
jp2000sr.*       JPEG 2000 metadata reader
jtdsr.*          JustSystems Ichitaro reader
kpagrdr.*        Applix Presents reader
kpanirdr.*       Animated cursor reader
kpbmprdr.*       Windows Bitmap reader
kpbmpwrt.*       Windows Bitmap writer
kpcdrrdr.*       Corel Draw
kpcgmrdr.*       Computer Graphics Metafile reader
kpcgmwrt.*       Computer Graphics Metafile writer
kpdcxrdr.*       DCX (fax) reader
kpDWGrdr.*       AutoCAD Drawing format reader
kpDXFrdr.*       AutoCAD Drawing Exchange format reader

kpemfrdr.*       Enhanced Metafile reader
kpepsrdr.*       Encapsulated PostScript (EPS) reader
kpgifrdr.*       Graphic Interchange Format (GIF) reader
kpicordr.*       Windows Icon reader
kpiwpgrdr.*      Apple iWork Keynote reader
kpjbig2rdr.*     JBIG2 reader
kpjp2000rdr.*    JPEG 2000 reader
kpjpgrdr.*       JPEG file interchange format reader
kpjpgwrt.*       JPEG file interchange format writer
kpnbmprdr.*      Lotus Notes Bitmap reader (for embedded images in DXL files)
kpmacrdr.*       MacPaint reader
kpmsordr.*       Microsoft Office Drawing Objects (Office 97, 2000, and XP) reader
kpodfrdr.*       Oasis Open Document Format presentation (ODP) reader
kpODArdr.*       AutoCAD reader (Windows only)
kpONErdr.*       Microsoft OneNote reader
kppdfrdr.*       Adobe Portable Document File (PDF) graphic-based reader
kppdf2rdr.*      High-fidelity Adobe Portable Document File (PDF) graphic-based reader
kpp40rdr.*       Microsoft PowerPoint PC 4.0 and PowerPoint Mac reader
kpp95rdr.*       Microsoft PowerPoint 95 reader
kpp97rdr.*       Microsoft PowerPoint 97 and higher reader
kppctrdr.*       Macintosh Quick Draw Picture (PICT) reader
kppcxrdr.*       PC Paintbrush (PCX) reader
kppicrdr.*       Pictor PC Paint format (PIC) reader
kppngrdr.*       Portable Network Graphics (PNG) reader
kppngwrt.*       Portable Network Graphics (PNG) writer
kpppxrdr.*       Microsoft PowerPoint XML reader 2007
kpprerdr.*       Lotus Freelance Graphics for Windows V2.0 reader
kpprzrdr.*       Lotus Freelance Graphics 96/97/98 reader

kpsdwrdr.*       Lotus Ami Pro Graphics reader
kpsgirdr.*       SGI RGB reader
kpshwrdr.*       Corel Presentations reader
kpsunrdr.*       Sun Raster reader
kptgardr.*       Truevision Targa reader
kptifrdr.*       Tagged Image File Format (TIFF) reader
kpvsdrdr.dll     Microsoft Visio reader
kpwg2rdr.*       WordPerfect Graphics 2 reader
kpwmfrdr.*       Windows Metafile reader
kpwmfwrt.*       Windows Metafile writer
kpwpgrdr.*       WordPerfect Graphics 1 reader
kpxfdlrdr.*      Extensible Forms Description Language reader
kvgzsr.*         GZIP reader
kvhqxsr.*        BinHex reader
kvzeesr.*        UNIX Compress reader
l123sr.*         Lotus 123 v96/97/98 reader
lasr.*           Lotus AMI Pro reader
ltbenn30.dll     Lotus Word Pro support (supported on Windows x86 platform only)
ltscsn10.dll     Lotus Word Pro support (supported on Windows x86 platform only)
lwpapin.dll      Lotus Word Pro support (supported on Windows x86 platform only)
lwppann.dll      Lotus Word Pro support (supported on Windows x86 platform only)
lwpsr.dll        Lotus Word Pro reader (supported on Windows x86 platform only)
macbinsr.*       MacBinary reader
mbsr.*           Microsoft Word Macintosh reader
mbxsr.*          Mailbox (MBX)¹ and Microsoft Outlook Express (EML) reader
mdbsr.*          Microsoft Access reader
mifsr.*          Adobe Maker Interchange Format reader
misr.*           Microsoft Word 2 reader

mp3sr.*          MP3 reader for metadata extraction
mppsr.*          Microsoft Project reader
msgsr.*          Microsoft Outlook (MSG) reader
mspubsr.*        Microsoft Publisher reader
msw6sr.*         Microsoft Works 6 and 2000 reader
mswsr.*          Microsoft Works V1 and 2 reader
mw6sr.*          Microsoft Word 95 reader
mw8sr.*          Microsoft Word 97, 2000, and XP reader
mwsr.*           Microsoft Word for DOS and Microsoft Write reader
mwssr.*          Microsoft Works Spreadsheet reader
mwxsr.*          Microsoft Word 2007 XML reader
nsfsr.*          Lotus Notes Database reader¹
oa2sr.*          Fujitsu Oasys reader
odfsssr.*        Oasis Open Document Format spreadsheets (ODS) reader
odfwpsr.*        Oasis Open Document Format word processing (ODT) reader
olesr.*          Embedded OLE object reader
olmsr.*          Microsoft Outlook for Macintosh reader
oo3sr.*          Omni Outliner reader
pdfsr.*          Adobe Portable Document File (PDF) reader
pffsr.*          Microsoft Outlook Offline Storage File reader
pstsr.dll        Microsoft Outlook Personal Folders file MAPI-based reader (supported on Windows platform only)¹
pstnsr.*         Microsoft Outlook Personal Folders file native reader¹
qpssr.*          Quattro Pro spreadsheet reader
rarsr.*          RAR Archive reader
rtfsr.*          Microsoft Rich Text Format reader
skypesr.*        Skype log file reader
sosr.*           StarOffice/OpenOffice reader
swfsr.*          Macromedia Flash reader

tarsr.*          Tape archive reader
tnefsr.*         Transfer Neutral Encapsulation Format reader
unihtmsr.*       Unicode HTML reader
unisr.*          Unicode reader
unzip.*          Zip file reader
uudsr.*          UUEncoding reader
vsdsr.*          Microsoft Visio reader
vcfsr.*          Microsoft Outlook vCard Contact reader
wkssr.*          Lotus 123 v2.0 through 5.0 reader
wosr.*           WordPerfect 5.x reader
wp6sr.*          WordPerfect 6.0 through 10.0 reader
wpmsr.*          WordPerfect for Macintosh reader
xlsbsr.*         Microsoft Office 2007 Excel Binary Format reader
xlssr.*          Microsoft Excel reader
xlsxsr.*         Microsoft Excel 2007 XML reader
xmlsr.*          Generic XML reader
xpssr.*          XML Paper Specification reader
xywsr.*          XYWrite reader
yimsr.*          Yahoo! Instant Messenger reader
z7zsr.*          7-Zip reader

1. This reader is an advanced feature and is sold and licensed separately from KeyView Export SDK.

Document Type Definition Files

The following files related to the Verity.dtd may be redistributed with your application:

File               Description
Verity.dtd         The document type definition file that defines the structure of an XML document. XML document validity is based on the Verity.dtd. The Verity.dtd is required and must be in the same directory as the output XML file.
HTMLlat1x.ent      The file defining Latin characters. This file is referenced in the Verity.dtd. This file is required and must be in the same directory as the Verity.dtd.
HTMLspecialx.ent   The file defining special characters. This file is referenced in the Verity.dtd. This file is required and must be in the same directory as the Verity.dtd.
HTMLsymbolx.ent    The file defining symbols. This file is referenced in the Verity.dtd. This file is required and must be in the same directory as the Verity.dtd.
wp.xsl             The default style sheet for word processing documents. This file is optional and must be in the same directory as the output XML file.
pg.xsl             The default style sheet for presentation graphics. This file is optional and must be in the same directory as the output XML file.
ss.xsl             The default style sheet for spreadsheets. This file is optional and must be in the same directory as the output XML file.

APPENDIX C

Export Tokens

This section contains an alphabetized list of the Export tokens.

Tokens are special strings inserted into the KVXMLTemplate structure, XmlTemplateInfo class, and template files. They are placeholders for markup that appears in the XML output. For example, the $CHARSET token marks the place in the XML output where the name of the source document’s character set is inserted. It would be used in the tag <charset=$CHARSET>.

See the template files for examples of how to use tokens.

Token               Description
$ANCHOR             Inserts an anchor for a heading level (h2-h6) for the current block.
$BASE               Inserts the base URL for the XML file. Use in the tag <base href=xx>.
$CHARSET            Inserts the character set of the source document, if that information is ascertainable. The section “Supported Formats” lists the file formats for which character set information can be determined.

Token               Description
$CONTENT            Inserts the content of the metadata field specified by the $NAME token. This token is used in conjunction with the $SUMMARY, $USERSUMMARY, and $NAME tokens to insert source document metadata into the XML output. An example of this token’s use is: pszUserSummary=<MetaData name="$NAME" content="$CONTENT"> The section “Supported Formats” lists file formats that support metadata.
$ENDNOTE            Inserts endnotes from the current page of the document at this point in the output stream. Currently only implemented for Microsoft Word documents.
$FOOTER             Inserts the footer from the current page of the document at this point in the output stream.
$FOOTNOTE           Inserts footnotes from the current page of the document at this point in the output stream. Currently only implemented for Microsoft Word documents.
$FOOTNOTEALL        Inserts all footnotes from the current document at this point in the output stream. Currently only implemented for Microsoft Word documents.
$HEADER             Inserts the header from the current page of the document at this point in the output stream.
$MAINURL            Inserts the URL to the file containing the start of the generated XML, that is, the main output stream.
$NAME               Inserts the name of a metadata field. This token is used in conjunction with the $SUMMARY, $USERSUMMARY, and $CONTENT tokens to insert source document metadata into the XML output. An example of this token’s use is: pszUserSummary=<MetaData name="$NAME" content="$CONTENT"> The section “Supported Formats” lists file formats that support metadata.
$NEXT               Inserts the anchor to the next block. If this is the last block, a link to the first block is inserted.
$PREV               Inserts the anchor to the previous block. If the current block is the first block, a link to the last block is inserted.
$STYLESHEET         Inserts the path to the style sheet.

Token               Description
$SUMMARY            Inserts the data from standard metadata fields using the markup provided in the pszUserSummary member of the structure KVXMLTemplate. Standard fields are enumerated from 0 to 33 in KVSumType in kvtypes.h. See the tokens $USERSUMMARY, $NAME, and $CONTENT. The section “Supported Formats” lists file formats that support metadata.
$SUMMARYNN          Inserts the data from a specified metadata field. NN is a number from 0 through 33 enumerated in the KVSumType structure in kvtypes.h. An example of this token’s use is: pszMainTop=$SUMMARY01 The section “Supported Formats” lists file formats that support metadata.
$SPLITBLOCKNUMBER   Inserts the page number for each block generated as a result of bHardPageMakesNewBlock or lcbBlockSize.
$TOC                Inserts the table of contents at this point in the current output stream. This token is typically embedded in pszMainTop.
$TOCB               Inserts the table of contents at this point for the current block.
$TOCBE              Inserts the beginning entry for the table of contents at this point in the current output stream.
$TOCE               Inserts a table of contents entry at this point in the current output stream.
$TOCTE              Inserts a text entry without XML markup at this point in the current output stream.
$TOCPE              Inserts a partial table of contents entry at this point in the current output stream. XML tags are removed; however, character entities are retained. This allows angle brackets to appear in the table of contents entries (for example, <text>). Without this token, <text> would be interpreted as a non-valid XML tag and would be ignored by the browser.
$TOPANCHOR          Inserts the anchor for the top heading level (h1) for the current block.

Token               Description
$USERCB             Triggers the callback function UserCB() and identifies the callback used in the function.
$USERSUMMARY        Inserts the data from every valid non-standard metadata field using the markup provided in the pszUserSummary member of the structure KVXMLTemplate. Non-standard metadata are any fields not listed from 0 to 33 in KVSumType, such as user-defined fields (for example, custom property fields in Word documents), or fields that are unique to a particular file type (for example, “Artist” or “Genre” fields in MP3 files). See the tokens $SUMMARY, $NAME, and $CONTENT. The section “Supported Formats” lists file formats that support metadata.
$XANCHOR            Inserts the anchor to an extra file into the XML output. The contents of the extra file are defined by pszXFile, and the block generated by this token is defined by pszXStartBlock and pszXEndBlock.
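
To make the placeholder idea concrete, the following C sketch expands tokens in a template string. It is an illustration only and is not KeyView code: the expand_tokens function and the token_value structure are hypothetical names introduced here, and the SDK performs the real substitution internally when it writes the XML output.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token/value pair; not part of the KeyView API. */
typedef struct {
    const char *token;   /* for example "$NAME" */
    const char *value;   /* text that replaces the token */
} token_value;

/*
 * Copies tmpl into out, replacing each listed token with its value.
 * At each '$' the longest matching token wins, so "$TOCBE" is not
 * mistaken for "$TOC". Unlisted tokens are copied unchanged.
 * Returns 0 on success, -1 if out is too small.
 */
int expand_tokens(const char *tmpl, const token_value *tv, size_t ntv,
                  char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;

    while (*tmpl != '\0') {
        const char *val = NULL;
        size_t best = 0;

        if (*tmpl == '$') {
            for (size_t i = 0; i < ntv; i++) {
                size_t len = strlen(tv[i].token);
                if (len > best && strncmp(tmpl, tv[i].token, len) == 0) {
                    best = len;
                    val = tv[i].value;
                }
            }
        }

        if (val != NULL) {
            size_t vlen = strlen(val);
            if (o + vlen >= outsz)      /* keep room for the terminator */
                return -1;
            memcpy(out + o, val, vlen);
            o += vlen;
            tmpl += best;
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *tmpl++;
        }
    }
    out[o] = '\0';
    return 0;
}
```

For example, calling expand_tokens with the template <MetaData name="$NAME" content="$CONTENT"> and the pairs $NAME=Author and $CONTENT=J. Smith (made-up sample values) produces <MetaData name="Author" content="J. Smith">.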

APPENDIX D

Character Sets

This section provides information on the handling of character sets in the KeyView suite of products, which includes KeyView Filter SDK, KeyView Export SDK, and KeyView Viewing SDK. It contains the following topics.

Multi-Byte and Bi-Directional Support

Coded Character Sets

Multi-Byte and Bi-Directional Support

The KeyView SDKs can process files containing multi-byte characters. A multi-byte character encoding represents a single character with consecutive bytes. KeyView can also process text from files that contain bi-directional text.

Bi-directional text contains both Latin-based text, which is read from left to right, and text that is read from right to left (such as Hebrew and Arabic).
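
As an illustration of why byte counts and character counts differ in a multi-byte encoding, the following C sketch counts the characters in a UTF-8 string. UTF-8 is used here only as a familiar example of a multi-byte encoding; utf8_char_count is a hypothetical helper and not part of the KeyView API.

```c
#include <stddef.h>

/*
 * Counts characters (code points) in a UTF-8 string by skipping
 * continuation bytes, which always have the bit pattern 10xxxxxx.
 * Assumes the input is valid UTF-8; no validation is performed.
 */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

A string holding two Japanese characters encoded as six bytes has a strlen() of 6, while utf8_char_count() reports 2.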

The following table indicates which character encodings KeyView supports for each format.

Format

Archive

7-Zip (7Z)

AD1 Evidence file


Single-byte Multi-byte n/a n/a n/a n/a

Bi-directional n/a n/a


Format Single-byte

BinHex (HQX)

Bzip2 (BZ2)

EnCase – Expert Witness Compression

Format (E01)

GZIP (GZ)

ISO (ISO)

Java Archive (JAR)

Legato EMailXtender Archive (EMX)

MacBinary (BIN)

Mac Disk Copy Disk Image (DMG)

Microsoft Backup File (BKF)

Microsoft Cabinet format (CAB)

Microsoft Compiled HTML Help (CHM)

Microsoft Compressed Folder (LZH)

PKZip (ZIP)

Microsoft Outlook DBX (DBX)

Microsoft Outlook Offline Storage File (OST) Y

RAR Archive (RAR) n/a

Tape Archive (TAR)

UNIX Compress (Z) n/a n/a n/a n/a n/a

Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a

UUEncoding (UUE)

Windows Scrap File (SHS)

WinZip (ZIP)

Binary

Executable (EXE)

Link Library (DLL)

Computer-aided Design

AutoCAD Drawing (DWG) n/a n/a n/a n/a n/a

Y n/a n/a

Y

Y n/a n/a n/a n/a n/a n/a

Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a

Multi-byte n/a n/a n/a

n/a n/a

Y

Y n/a n/a n/a n/a n/a n/a

Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a

Bi-directional n/a n/a n/a


Format

AutoCAD Drawing Exchange (DXF)

CATIA formats (CAT)

Microsoft Visio (VSD)

Database dBase Database

Microsoft Access (MDB)

Microsoft Project (MPP)

Desktop Publishing

Microsoft Publisher

Display

Adobe Portable Document Format (PDF)

Graphics

Computer Graphics Metafile (CGM)

Corel DRAW (CDR)

DCX Fax System (DCX)

DICOM – Digital Imaging and

Communications in Medicine (DCM)

Encapsulated PostScript (EPS)

Enhanced Metafile (EMF)

Graphic Interchange Format (GIF)

JBIG2

JPEG

JPEG 2000

Lotus AMIDraw Graphics (SDW)

Lotus Pic (PIC)

Macintosh Raster (PICT/PCT)

MacPaint (PNTG)

Microsoft Office Drawing (MSO)

Y

Y

Y

N

Y n/a n/a n/a n/a

Y

Y n/a n/a n/a n/a n/a

Y n/a

Y n/a

Single-byte

Y

Y

Y

Multi-byte

Y

N

Y

N

Y

Y

Y

Y

1 n/a n/a n/a n/a

N

Y n/a n/a n/a n/a n/a

N n/a

N n/a


N

Y

N

N

N n/a n/a n/a n/a

N

N n/a n/a n/a n/a n/a

N n/a

N n/a

Bi-directional

Y

N

Y


Format

Omni Graffle (GRAFFLE)

PC PaintBrush (PCX)

Portable Network Graphics (PNG)

SGI RGB Image (RGB)

Sun Raster Image (RS)

Tagged Image File (TIFF)

Truevision Targa (TGA)

Windows Animated Cursor (ANI)

Windows Bitmap (BMP)

Windows Icon Cursor (ICO)

Windows Metafile (WMF)

WordPerfect Graphics 1 (WPG)

WordPerfect Graphics 2 (WPG)

Mail

Documentum EMCMF Format

Domino XML Language (DXL)

GroupWise FileSurf

Legato Extender (ONM)

Lotus Notes database (NSF)

Mailbox (MBX)

Microsoft Entourage Database

Microsoft Outlook (MSG)

Microsoft Outlook Express (EML)

Microsoft Outlook iCalendar

Microsoft Outlook for Macintosh

Microsoft Outlook Offline Storage File

Microsoft Outlook Personal File Folders

(PST)

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Single-byte n/a

Y n/a n/a

Y n/a n/a n/a

Y

Y n/a n/a

Y

Multi-byte n/a

N n/a n/a

N n/a n/a n/a

Y

N n/a n/a

N

Y

Y

Y

Y

N

Y

Y

Y

Y

Y

Y

Y

Y


Bi-directional n/a

N n/a n/a

N n/a n/a n/a

N

N n/a n/a

N

Y

Y

Y

Y

N

N

Y

N

Y

Y

Y

Y

Y


Format

Microsoft Outlook vCard Contact

Text Mail (MIME)

Transport Neutral Encapsulation Format

Multimedia

Advanced Systems Format (ASF)

Audio Interchange File Format (AIFF)

Microsoft Wave Sound (WAV)

MIDI (MID)

MPEG 1 Audio Layer 3 (MP3)

MPEG 1 Video (MPG)

MPEG 2 Audio (MPEGA)

MPEG 4 Audio (MP4)

NeXT/Sun Audio (AU)

QuickTime Movie (QT/MOV)

Windows Video (AVI)

Presentations

Apple iWork Keynote (GZ)

Applix Presents (AG)

Corel Presentations (SHW)

Extensible Forms Description Language

(XFD)

Lotus Freelance Graphics 2 (PRE)

Lotus Freelance Graphics (PRZ)

Macromedia Flash (SWF)

Single-byte

Y

Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a

Multi-byte

Y

Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a

Bi-directional

Y character set

1252 only character set

1252 only

Y

Y

N

N

Y character set

850 only

Y

N

Y

Japanese, Simple

Chinese,

Traditional Chinese,

Thai only

Y N

N

N

N

N

N

N

Y

Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a


Format

Microsoft OneNote

Microsoft PowerPoint PC (PPT)

Microsoft PowerPoint Windows (PPT)

Single-byte

Y character set

1252 only

Y

Microsoft PowerPoint Macintosh (PPT)

Microsoft PowerPoint Windows XML 2007 and 2010 (PPTX)

OASIS Open Document (ODP)

OpenOffice Impress (ODP)

StarOffice Impress (ODP)

Spreadsheets

Apple iWork Numbers (GZ)

Applix Spreadsheets (AS)

Y

Y

Y

Y

Y

Y character set

1252 only

Comma Separated Values (CSV)

Corel Quattro Pro (QPW/WB3)

Data Interchange Format (DIF)

Lotus 1-2-3 (123)

Lotus 1-2-3 (WK4)

Lotus 123 Charts (123)

Microsoft Excel Charts (XLS)

Microsoft Excel Macintosh (XLS)

Microsoft Excel Windows (XLS)

Microsoft Excel Windows XML 2007 (XLSX) Y

Microsoft Office Excel Binary Format (XLSB) Y

Microsoft Works Spreadsheet (S30/S40)

Y

Y

Y

Y

Y

Y

Y

Y

Y character set

1252 only

Y

N

Y

Y

Y

N

N

Y

Y

Y

Y

Y

N

Y

Y

Y

N

Multi-byte

Y

Traditional Chinese only

N

Y

Japanese, Simple

Chinese,

Traditional Chinese,

Korean only

Bi-directional

N

N

Hebrew only

N

Y

N

N

N

Y

2

Y

N

N

Y

2

N

N

N

N

N

N

N

N

N


Format

OASIS Open Document (ODS)

OpenOffice Calc (ODS)

StarOffice Calc (ODS)

Text and Markup

ANSI (TXT)

ASCII (TXT)

HTML (HTM)

Microsoft Excel Windows XML 2003

Microsoft Word for Windows XML 2003

Microsoft Visio XML 2003

Rich Text Format (RTF)

Unicode HTML

Unicode Text (TXT)

XHTML

XML

Word Processing

Adobe Maker Interchange Format (MIF)

Apple iChat Log (ICHAT)

Apple iWork Pages (GZ)

Applix Words (AW)

DisplayWrite (IP)

Folio Flat File (FFF)

Founder Chinese E-paper Basic (CEB)

Fujitsu Oasys (OA2)

Hangul (HWP)

Single-byte

Y

Y

Y

Multi-byte

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y character set

1252 only character set

1252 only character set

500, 1026 only

Y

Y character set

1252 only

Y

N

Y

Y

N

N

N

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y

Y


N

N

N

N

N

N

N

N

N

Y

Y

Y

2

Y

2

Y

2 , 3

Y

Y

3

Y

2 , 3

Y

2

Y

3

Y

Bi-directional

N

N

N


Format

Health level7 (HL7)

IBM DCA/RTF (DC)

JustSystems Ichitaro (JTD)

Lotus AMI Pro (SAM)

Lotus AMI Professional Write Plus (AMI)

Lotus Word Pro (LWP)

Lotus SmartMaster (MWP)

Microsoft Word PC (DOC)

Microsoft Word Windows V1-2 (DOC) Y

Microsoft Word Windows V6, 7, 8, 95 (DOC) Y

Y Microsoft Word Windows V97 through 2003

(DOC)

Microsoft Word Windows XML 2007 and

2010 (DOCX)

Y

Y

Y character set

1252 only

Microsoft Word Macintosh (DOC)

Microsoft Works (WPS)

Microsoft Write (WRI)

OASIS Open Document (ODT)

Omni Outliner (OO3)

OpenOffice Writer (ODT)

Open Publication Structure eBook (EPUB) Y

StarOffice Writer (ODT) Y

Skype Log (DBB)

Y

Y

Y

Y

Y

Y

Y

WordPad (RTF)

Single-byte

Y

Y

Y character sets

500, 1026 only

Y

Multi-byte

Y

N

Y

Simple Chinese,

Traditional Chinese,

Japanese, Thai only

Y

Y

Simple Chinese,

Traditional Chinese,

Japanese, Thai only

N

Y

N

Y

Y

Y

Y

Y

Y

Y

N

Japanese only

Japanese only

Y

Y (null-terminated charsets)

Y

N

Y

Bi-directional

Y

N

N

Y

3

N

N

N

Y

N

N

N

N

Y

3

N

N

N

Hebrew only

3

Y

3

Y

3

Y


Format Single-byte Multi-byte Bi-directional

WordPerfect Linux (WPS)

WordPerfect Macintosh (WPS)

WordPerfect Windows (WO)

XML Paper Specification (XPS)

XYWrite Windows (XY4)

Y

Y

Y

Y character set

1252 only

Y

N

N

N

Y

N

N

N

N

N

N

Yahoo! Instant Messenger (DAT) Y (null-terminated charsets)

N

1. Multi-byte PDFs are supported, provided the PDF document is created using either Character ID-keyed (CID) fonts, predefined CJK CMap files, or ToUnicode font encodings, and does not contain embedded fonts. See the Adobe website and the Adobe Acrobat documentation for more information. Any multi-byte characters that are not supported are displayed using the replacement character. By default, the replacement character is a question mark (?).

To determine the type of font encodings that are used in a PDF, open the PDF in Adobe Acrobat, and select File | Document Info | Fonts. If the Encoding column lists Custom or Embedded encodings, you may encounter problems converting the PDF.

2. Text direction in the output file may not be correct.

3. In Export SDK, a bi-directional right-to-left (RTL) tag is extracted from this format and included in the direction element (<dir=RTL>) of the output.

Coded Character Sets

The following table lists the coded character sets and indicates which of them can be specified as the target character set.

The coded character sets are enumerated in kvtypes.h and defined in the Export class.

Coded Character Set   Description                                         Can be set as target charset?
KVCS_UNKNOWN          Unknown character set                               N
KVCS_SJIS             Japanese (uses multi-byte encoding), cp932          Y
KVCS_GB               Simplified Chinese (China, Singapore, Malaysia), cp936   Y
KVCS_BIG5             Traditional Chinese (Taiwan, Hong Kong, Macau), cp950    Y
KVCS_KSC              Korean, cp949                                       Y

Coded Character Set   Description                                          Can be set as target charset?
KVCS_1250             Windows Latin 2 (Central Europe)                     Y
KVCS_1251             Windows Cyrillic (Slavic)                            Y
KVCS_1252             Windows Latin 1 (ANSI)                               Y
KVCS_1253             Windows Greek                                        Y
KVCS_1254             Windows Latin 5 (Turkish)                            Y
KVCS_1255             Windows Hebrew                                       Y
KVCS_1256             Windows Arabic                                       Y
KVCS_1257             Windows Baltic Rim                                   Y
KVCS_1258             Windows Vietnamese                                   Y
KVCS_8859_1           ISO 8859-1 Latin 1 (Western Europe, Latin America)   Y
KVCS_8859_2           ISO 8859-2 Latin 2 (Central Eastern Europe)          Y
KVCS_8859_3           ISO 8859-3 Latin 3 (S.E. Europe)                     Y
KVCS_8859_4           ISO 8859-4 Latin 4 (Scandinavia/Baltic)              Y
KVCS_8859_5           ISO 8859-5 Latin/Cyrillic                            Y
KVCS_8859_6           ISO 8859-6 Latin/Arabic                              Y
KVCS_8859_7           ISO 8859-7 Latin/Greek                               Y
KVCS_8859_8           ISO 8859-8 Latin/Hebrew                              Y
KVCS_8859_9           ISO 8859-9 Latin/Turkish                             Y
KVCS_8859_14          ISO 8859-14                                          Y
KVCS_8859_15          ISO 8859-15                                          Y
KVCS_437              DOS Latin US                                         Y
KVCS_737              DOS Greek                                            Y
KVCS_775              DOS Baltic Rim                                       Y
KVCS_850              DOS Latin 1                                          Y
KVCS_851              DOS Greek                                            Y
KVCS_852              DOS Latin 2                                          Y
KVCS_855              DOS Cyrillic                                         Y

Coded Character Set

KVCS_857

KVCS_860

KVCS_861

KVCS_862

KVCS_863

KVCS_864

KVCS_865

KVCS_866

KVCS_869

KVCS_874

KVCS_PDFMACDOC

KVCS_PDFWINDOC

KVCS_STDENC

KVCS_PDFDOC

KVCS_037

KVCS_1026

KVCS_500

KVCS_875

KVCS_LMBCS

KVCS_UNICODE

KVCS_UTF16

KVCS_UTF8

KVCS_UTF7

KVCS_2022_JP

KVCS_2022_CN

KVCS_2022_KR

KVCS_WP6X

Description

DOS Turkish

DOS Portuguese

DOS Icelandic

DOS Hebrew

DOS Canadian French

DOS Arabic

DOS Nordic

DOS Cyrillic Russian

DOS Greek 2

Thai

PDF MAC DOC

PDF WIN DOC

Adobe Standard Encoding

Adobe standard PDF character set

EBCDIC code page 037

EBCDIC code page 1026

EBCDIC code page 500

EBCDIC code page 875

Lotus multibyte character set Group 1 and Group 2

Unicode, UCS-2

16-bit Unicode transformation format

8-bit Unicode transformation format

7-bit Unicode transformation format Y

ISO 2022-JP, Japanese mail and news safe encoding (JIS-7) N

ISO 2022-CN, Chinese mail and news safe encoding

ISO 2022-KR, Korean mail and news safe encoding

Word Perfect 6.x and higher character mapping N

N

N

N

Y

N

N

Y

Y

Y

Y

N

N

N

N

Y

Y

Y

Y

Y

Y

Y

Y

Can be set as target charset?

Y

Y


Coded Character Set   Description
KVCS_10000            Western European (Macintosh)
KVCS_KSC5601          Unified Hangul
KVCS_GB2312           Simplified Chinese (China, Singapore, Hong Kong)
KVCS_GB12345          Traditional Chinese (China) - analogue of GB2312
KVCS_CNS11643         Traditional Chinese - Taiwan. Supplement to Big5
KVCS_JIS0201          Japanese - contains ASCII character set (JIS-Roman)
KVCS_JIS0212          Japanese. Supplement to JIS0208.
KVCS_EUC_JP           Japanese Extended UNIX Code
KVCS_EUC_GB           Simplified Chinese Extended UNIX Code
KVCS_EUC_BIG5         Traditional Chinese Extended UNIX Code
KVCS_EUC_KSC          Korean Extended UNIX Code
KVCS_424              EBCDIC Hebrew
KVCS_856              PC Hebrew (old)
KVCS_1006             IBM AIX Pakistan (Urdu)
KVCS_KOI8R            Cyrillic (Russian)
KVCS_PDF_JAPAN1       Adobe-Japan1-2 character collection
KVCS_PDF_KOREA1       Adobe-Korea1-0 character collection
KVCS_PDF_GB1          Adobe-GB1-3 character collection
KVCS_PDF_CNS1         Adobe-CNS1-2 character collection
KVCS_2022_JP_8        ISO 2022-JP, Japanese mail and news safe encoding (JIS8)
KVCS_720              Arabic DOS-720
KVCS_VISCII           Vietnamese VISCII
KVCS_8859_10          ISO 8859-10 (Latin 6 Nordic)
KVCS_8859_13          ISO 8859-13 (Latin 7 Baltic)
KVCS_57002            ISCII Devanagari (x-iscii-de)
KVCS_57003            ISCII Bengali (x-iscii-be)
KVCS_57004            ISCII Tamil (x-iscii-ta)

N

N

Y

N

N

N

Y

Y

Y

1

Y

1

Y

1

Y

1

Y

1

N

N

N

N

Y

N

Y

Y

Y

N

Y

Y

Can be set as target charset?

Y

Y


Coded Character Set   Description
KVCS_57005            ISCII Telugu (x-iscii-te)
KVCS_57006            ISCII Assamese (x-iscii-as)
KVCS_57007            ISCII Oriya (x-iscii-or)
KVCS_57008            ISCII Kannada (x-iscii-ka)
KVCS_57009            ISCII Malayalam (x-iscii-ma)
KVCS_57010            ISCII Gujarathi (x-iscii-gu)
KVCS_57011            ISCII Panjabi (x-iscii-pa)
KVCS_GB18030b2        Reserved for internal use
KVCS_GB18030          GB18030 (Chinese 4-byte character set)
KVCS_8859_11          ISO 8859-11 (Thai)
KVCS_8859_16          ISO 8859-16 (Latin-10 South-Eastern Europe)
KVCS_ARABICMAC        Arabic Mac (x-mac-arabic)
KVCS_KOI8U            Cyrillic (KOI8U Ukrainian)
KVCS_HZGB2312         The 7-bit representation of GB 2312 / RFC 1842

Y

Y

Y

Y n/a

Y

1

Y

1

Y

1

Y

1

Y

1

Can be set as target charset?

Y

1

Y

1

n/a

Y

1. Character set cannot be forced as output in Export SDK and Viewing SDK because the character set is not supported by the major browsers.

APPENDIX E

File Format Detection

This section describes how file formats are detected in the KeyView Export SDK.

It contains the following topics:

Introduction

Extract Format Information

Determine Format Support

Translate Format Information

Determine a Document Reader

Category Values in formats_e.ini

Introduction

The KeyView format detection module (kwad) detects a file’s format, and reports the information to the API, which in turn reports the information to the developer’s application. If the detected format is supported by the KeyView SDK, the detection module also loads the appropriate structured access layer and document reader for further processing.

For a list of supported formats, see “Supported Formats”.


Extract Format Information

You can extract format information from a document using the fpGetStreamInfo() function. If required, this format information can then be reported to the developer’s application. The fpGetStreamInfo() function extracts format information, such as file class, format and version, and populates the ADDOCINFO structure. This structure is defined in the header file adinfo.h.

For information on how to translate the extracted format information, see “Translate Format Information”.

Determine Format Support

Once the file format is extracted, the detection module uses the formats_e.ini file to determine whether the format is supported by KeyView, and which structured access layer and reader to load.

The formats_e.ini file is in the directory install\OS\bin, where install is the path name of the Export installation directory and OS is the name of the operating system. It contains the following information:

• Coded format information. To translate this information, see “Translate Format Information”.

• Reader associated with each format. See “Determine a Document Reader”.

• Configuration parameters for out-of-process conversions.

• Locale settings for internal use.

Below are some entries from the formats_e.ini file:

123=mw

152=xyw

178=wp6

189=mw6

2=af

200=pdf

205=mb

210=htm

251=htm

NOTE The formats_e.ini file applies to all formats except graphics. Detection of graphics formats is handled by an internal module named KeyView Picture Interchange Format (KPIF).
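
The sample entries shown above each pair a number with a short name. As a minimal sketch, assuming every such entry has the form number=name, the following C function splits one entry into its two parts. The function parse_format_entry is hypothetical and is not part of the KeyView API; the real file also contains other sections and settings that this sketch does not handle.

```c
#include <stdio.h>

/*
 * Parses one "code=reader" entry such as "200=pdf".
 * On success stores the numeric format code and the reader name
 * (at most 31 characters) and returns 1; otherwise returns 0.
 */
int parse_format_entry(const char *line, int *code, char reader[32])
{
    return sscanf(line, "%d=%31s", code, reader) == 2;
}
```

Lines that do not start with a number, such as a section header in square brackets, make the function return 0, so a caller can skip them.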


Refine Detection of Text Files

During text detection, KeyView analyzes the first 1 KB and last 1 KB of data in a document. If less than 10% of that data consists of non-ASCII characters, KeyView detects the document as a text file.

However, depending on the type of documents you are working with, the default settings may not provide the desired level of accuracy. Configuration flags allow you to change the amount of data to read at the end of a file, the percentage of non-ASCII characters permitted in a text file, and whether to use or ignore the file extension to determine the document format.

Change the Amount of File Data to Read

During file detection, KeyView reads characters from the beginning and end of a file—by default, it reads the first and last 1024 bytes of data. Large text files may contain many irrelevant characters at the end of a file, so KeyView may not accurately detect the file format. You can set a configuration flag to increase the amount of data to read from the end of a file during detection.

To change the amount of data to read during detection

• In the formats_e.ini file, set the following flag in the detection_flags section:

[detection_flags]

non_ascii_chars_end_block_size=kB

where kB is the number of kilobytes to read from the end of the file, from 0 to 10. The default value is 1.

NOTE The file size must be greater than the value specified in the flag. If the flag value is greater than the file size, KeyView does not use the flag.

Change the Percentage of Allowed Non-ASCII Characters

By default, if less than 10% of the analyzed data in a document consists of non-ASCII characters, it is detected as a text file. Depending on the type of files you are working with, changing the default percentage may increase detection accuracy.

To change the percentage of non-ASCII characters allowed in text files

• In the formats_e.ini file, set the following flag in the detection_flags section:

[detection_flags]

non_ascii_chars_in_text=N

where N is the percentage of non-ASCII characters to allow in text files. Files that contain a lower percentage of non-ASCII characters than N are detected as text files. The default value is 10.
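The text-detection rule and its two tunable values can be illustrated with a small sketch. This is not KeyView's implementation: the function names are hypothetical, "non-ASCII" is assumed to mean any byte above 0x7F, and the fallback to a 1 kB tail when the configured tail is not smaller than the file is an assumption based on the note that the flag is ignored in that case.

```c
#include <stddef.h>

/* Hypothetical helper (not a KeyView function): counts bytes above 0x7F. */
static size_t count_non_ascii(const unsigned char *p, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i] > 0x7F) {
            count++;
        }
    }
    return count;
}

/*
 * Returns 1 if the buffer looks like text under the assumed rule:
 * sample the first 1 kB and the last tail_kb kB, and require that the
 * share of non-ASCII bytes in the sample is below threshold_pct percent.
 * tail_kb models non_ascii_chars_end_block_size; threshold_pct models
 * non_ascii_chars_in_text.
 */
static int looks_like_text(const unsigned char *data, size_t len,
                           size_t tail_kb, unsigned threshold_pct)
{
    const size_t head = 1024;
    size_t tail = tail_kb * 1024;
    size_t sampled;
    size_t non_ascii;

    if (tail >= len) {
        tail = 1024;                 /* assumption: fall back to the default */
    }
    if (head + tail >= len) {
        /* Small file: the two blocks would overlap, so scan it once. */
        sampled = len;
        non_ascii = count_non_ascii(data, len);
    } else {
        sampled = head + tail;
        non_ascii = count_non_ascii(data, head)
                  + count_non_ascii(data + len - tail, tail);
    }
    if (sampled == 0) {
        return 0;                    /* empty input: nothing to classify */
    }
    /* non_ascii / sampled < threshold_pct / 100, without floating point */
    return non_ascii * 100 < (size_t)threshold_pct * sampled;
}
```

Calling looks_like_text(buffer, length, 1, 10) corresponds to the documented defaults. Raising tail_kb lets binary data near the end of a large file count against the text classification.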

Use the File Extension for Detection

Sometimes KeyView detects certain file formats, such as CSV, as ASCII because of the content of the documents. In such cases, you can configure KeyView to use the file extension to determine the document format. Using the file extension can improve detection of formats such as CSV, but might not detect text files successfully if they have incorrect file extensions.

To use the file extension for ASCII files during detection

• In the formats_e.ini file, set the following flag in the detection_flags section:

[detection_flags]

use_extension_for_ascii=1

The default is 0 (do not use the file extension).

Translate Format Information

Format information can include file attributes in the following categories:

Major Format

File Class

Minor Format

Major Version

Minor Version

Not all categories are required. Many formats only include major format and file class, or major format only.

The format information has the following structure:

MajorFormat.FileClass.MinorFormat.MajorVersion.MinorVersion

For example:

81.2.0.9.0

Each number in the format information represents a file attribute. The entry 81.2.0.9.0 represents a Lotus 1-2-3 Spreadsheet file version 9.0, where

81 = Lotus 1-2-3 Spreadsheet (major format)

2 = Spreadsheet (file class)

0 = not defined (minor format)

9 = 9 (major version)

0 = 0 (minor version)

The example above applies to the formats_e.ini file. When extracting format information using the fpGetStreamInfo() function, the same format information is represented as 294.2.0.9.

NOTE The format values returned by fpGetStreamInfo() differ from those in formats_e.ini because the former defines a unique ID for each major format, while the latter uses a major version, minor version and minor format to distinguish between formats.
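The dotted notation described in this section can be split into its categories with standard C. The sketch below is illustrative only: FormatCode and parse_format_code are hypothetical names, not part of the KeyView API, and setting omitted trailing categories to 0 is an assumption.

```c
#include <stdio.h>

/* Hypothetical container for the five categories; not a KeyView type. */
typedef struct {
    int major_format;
    int file_class;
    int minor_format;
    int major_version;
    int minor_version;
} FormatCode;

/*
 * Parses "MajorFormat.FileClass.MinorFormat.MajorVersion.MinorVersion".
 * Trailing categories may be omitted; omitted ones are left at 0.
 * Returns the number of categories read (1 to 5), or 0 on failure.
 */
static int parse_format_code(const char *text, FormatCode *out)
{
    int n;

    out->major_format = 0;
    out->file_class = 0;
    out->minor_format = 0;
    out->major_version = 0;
    out->minor_version = 0;

    n = sscanf(text, "%d.%d.%d.%d.%d",
               &out->major_format, &out->file_class, &out->minor_format,
               &out->major_version, &out->minor_version);

    /* sscanf returns EOF (negative) on empty input; report that as 0. */
    return n > 0 ? n : 0;
}
```

For the entry 81.2.0.9.0, the function fills in major format 81, file class 2, minor format 0, major version 9, and minor version 0. A shorter value such as 351.1.0 yields three categories, with both version fields left at 0.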

Distinguish Between Formats

The ADDOCINFO structure provides a unique ID for each major format. For example, a call to fpGetStreamInfo() would return 351.1.0 for a Microsoft Word 2003 XML format. The major format 351 is unique to this format.

Unlike ADDOCINFO, the formats_e.ini file distinguishes between formats using the major version number. For example, in formats_e.ini, a Microsoft Word 2003 XML format is defined as 285.1.0.100.0. The major format 285 and file class 1 are the same values used for generic XML. The major version 100 distinguishes the format as Microsoft Word 2003 XML.

The major version is used in formats_e.ini to specify the following formats:

• The Microsoft Office 2003 XML format has the same major format and file class as generic XML (285.1). It is distinguished from generic XML using the following major versions:

Word: 100

Excel: 101

Visio: 110

• The XHTML format has the same major format and file class as HTML (210.1). It is distinguished from HTML using the major version 100.
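The major-version rule for formats_e.ini values can be expressed as a small lookup. This is a sketch, not KeyView code: refine_by_major_version is a hypothetical helper, and the returned labels are descriptive strings chosen for the example.

```c
#include <stddef.h>

/*
 * Given formats_e.ini category values, returns a label for the formats
 * that share a major format and file class and are told apart only by
 * the major version. Returns NULL for any other major format/file class.
 */
static const char *refine_by_major_version(int major_format, int file_class,
                                           int major_version)
{
    if (major_format == 285 && file_class == 1) {
        switch (major_version) {
        case 100: return "Microsoft Office 2003 XML (Word)";
        case 101: return "Microsoft Office 2003 XML (Excel)";
        case 110: return "Microsoft Office 2003 XML (Visio)";
        default:  return "Generic XML";
        }
    }
    if (major_format == 210 && file_class == 1) {
        return major_version == 100 ? "XHTML" : "HTML";
    }
    return NULL;
}
```

For example, the formats_e.ini value 285.1.0.100.0 maps to the Word label, while 285.1 with any other major version is treated as generic XML.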


Determine a Document Reader

The format detection module uses the formats_e.ini file to determine whether a format is supported and which reader should be used to parse a format. The entries in the formats_e.ini file list each format’s coded value, and an abbreviation for the format’s reader. For example:

81.2.0.9.0=l123

The reader abbreviation is a truncated version of the reader’s library name. Adding “sr” to the end of an abbreviation creates the name of the reader. The example entry above specifies that a Lotus 1-2-3 Spreadsheet file version 9.0 is parsed by the Lotus 1-2-3 reader, l123sr.

“Files Required for Redistribution” lists the document readers provided with KeyView.
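The reader-naming rule can be sketched in C as follows. The helper reader_name_from_entry is hypothetical and is not part of the KeyView API. It assumes an entry of the form value=abbreviation with no surrounding whitespace or trailing newline.

```c
#include <stdio.h>
#include <string.h>

/*
 * Splits an entry such as "81.2.0.9.0=l123" at '=' and appends "sr" to
 * the abbreviation. Writes the reader name to out.
 * Returns 0 on success, or -1 if the entry has no '=', has an empty
 * abbreviation, or the result does not fit in out.
 */
static int reader_name_from_entry(const char *entry, char *out,
                                  size_t out_size)
{
    const char *eq = strchr(entry, '=');
    int written;

    if (eq == NULL || eq[1] == '\0') {
        return -1;
    }
    written = snprintf(out, out_size, "%ssr", eq + 1);
    if (written < 0 || (size_t)written >= out_size) {
        return -1;                   /* encoding error or truncated output */
    }
    return 0;
}
```

Given the entry 81.2.0.9.0=l123, the function writes l123sr to the output buffer. Lines read from a real file would need their line endings trimmed first.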

Category Values in formats_e.ini

This section lists the possible category values for format information in the formats_e.ini file. The corresponding values for the format information extracted from a call to fpGetStreamInfo() are listed in the header file adinfo.h.

Major Formats

File Classes

Minor Formats

5

6

3

4

7

Number

1

2

Format

AES Multiplus Comm Format

ASCII File word processor/MS DOS Batch File format

Applix Asterix

Microsoft Windows Bitmap image (BMP)

Convergent Tech DEF Comm. format

Corel Draw (CDR)

Keyword COM.FILE (KSIF)

File Class

Word processor

Word processor

Word processor

Raster image

Word processor

Vector graphic


Number

28

29

30

31

24

25

26

27

32

33

34

20

21

22

23

16

17

18

19

12

13

14

15

8

9

10

11

Format

Computer Graphics Metafile (CGM)

Word Connection

COMET TOP Word

DG CEOwrite

Honeywell Bull DSA101

IBM DCA-RFT

Dummy File (Internal)

DG Common Data Stream (CDS)

Dummy Print File (Internal)

Windows Micrografx Draw (DRW)

Data Point VISTAWORD

Encapsulated PostScript (EPS)

DOS/Windows Executable (EXE, DLL)

CCITT Group 3 1-Dimensional (G31D)

Graphics Interchange format (GIF)

IBM 1403 Line Printer

IBM DCF Script

IBM DCA-FFT

GEM Bit Image

IBM Display Write 4

Raster Graphics

Keywords PICL


File Class

Vector graphic

Word processor

Word processor

Word processor

Word processor

Word processor

Word processor

Vector graphic

Word processor

Raster image

Executable

Raster image

Raster image

Word processor

Word processor

Word processor

Raster image

Word processor

Raster image


Number

55

56

57

58

51

52

53

54

59

60

61

47

48

49

50

43

44

45

46

39

40

41

42

35

36

37

38

Format

Lotus AMI Pro

MORE Database Outliner (Mac)

MacPaint

Microsoft Word Mac

Informix SmartWare II Communication File

Microsoft Word for Windows

MultiMate 4.0

Multiplan Spreadsheet

Microsoft Rich Text Format (RTF)

Microsoft Word 5.0 (PC)

NBI Async Archive Format

Navy DIF

NBI Net Archive Format

NIOS TOP

FileMaker (Mac)

ODA/ODIF

Keyword OSM

Office Writer

PC Paint Brush Graphics (PCX)

CPT Communication Format

Lotus PIC

Macintosh Quick Draw Picture Format (PICT)

Philips Script

PostScript File

File Class

Word processor

Outline/planning

Raster image

Word processor

Communications

Word processor

Word processor

Spreadsheet

Word processor

Word processor

Word processor

Word processor

Word processor

Word processor

Database

Word processor

Word processor

Raster image

Word processor

Vector graphic

Raster image

Word processor

Vector graphic


Number

82

83

84

85

78

79

80

81

86

87

88

74

75

76

77

70

71

72

73

66

67

68

69

62

63

64

65

Format

Quadratron Q-One (V1.93J)

Quadratron Q-One (V2.0)

SAMNA Word IV

Lotus AMI Pro Draw (SDW)

SYLK Spreadsheet

Informix SmartWare II

Symphony Spreadsheet

Truevision Targa

Tagged Image File (TIFF)

Targon Word (V 2.0)

Uniplex Ucalc Spreadsheet

Uniplex (V6.01)

Microsoft Word (UNIX)

WANG PC

WordERA (V 1.0)

WANG WPS Comm. format

WordPerfect Mac

WordPerfect 5.2

Lotus 1-2-3 Spreadsheet

WordMARC word processor

Microsoft Windows Metafile (WMF) Graphics

Informix SmartWare II Database

WordPerfect Graphics V1.0 (WPG)

Wang WITA Word processor

File Class

Word processor

Word processor

Word processor

Raster image

Spreadsheet

Word processor

Spreadsheet

Raster image

Raster image

Word processor

Spreadsheet

Word processor

Word processor

Word processor

Word processor

Word processor

Word processor

Word processor

Spreadsheet

Word processor

Raster image

Database

Raster image


98

99

100

101

107

108

109

111

103

104

105

106

112

113

114

115

116

Number

93

94

95

96

97

89

90

91

92

Format

Xerox 860 Comm. format

Microsoft Excel Spreadsheet

Xerox Writer word processor

DIF Spreadsheet

ENABLE Spreadsheet

Supercalc Spreadsheet

Ultracalc Spreadsheet

Informix SmartWare Spreadsheet

Serialized Object Format (SOF) Encapsulation format

Microsoft PowerPoint (PC)

Microsoft PowerPoint (Mac)

Aldus PageMaker (Mac)

Aldus PageMaker (DOS)

Microsoft Works (Mac)

Microsoft Works Database (Mac)

Microsoft Works Spreadsheet (Mac)

Microsoft Works Communication (Mac)

Microsoft Works (PC)

Microsoft Works Database (PC)

Microsoft Works Spreadsheet (PC)

PC Library Module

MacWrite II

Aldus Freehand Mac

Disk Doubler Compression format

HP Graphics Language (HP-GL)

File Class

Word processor

Spreadsheet

Word processor

Spreadsheet

Spreadsheet

Spreadsheet

Spreadsheet

Spreadsheet

Encapsulation

Presentation

Presentation

Desktop

Publishing

Desktop

Publishing

Word processor

Database

Spreadsheet

Communication

Word processor

Database

Spreadsheet

Library module

Word processor

Vector graphic

Encapsulation

Vector graphic


Number

117

135

136

137

138

139

140

141

142

143

123

124

126

127

118

119

120

121

128

129

131

132

133

134

Format

Adobe Maker Interchange Format (MIF)

JPEG File Interchange Format (JFIF)

Reflex Database

Framework II

Paradox (PC) Database

Microsoft Windows Write

Quattro Pro Spreadsheet (DOS)

Persuasion Presentation

Corel Presentation

Microsoft Windows Icon Format (ICO) Graphics

Microsoft Project

Harvard Graphics

Zip Archive Format

Microsoft Windows Cursor (CUR) Graphics

Quark Express (Mac)

ARC/PAK Archive format

Adobe FrameMaker

Microsoft Publisher

Plan Perfect

WordPerfect General File Format

Lotus Freelance

Microsoft Wave Sound File

MIDI Sound File

AutoCAD DXF Graphics

File Class

Desktop

Publishing

Raster image

Database

Mixed format

Database

Word processor

Spreadsheet

Presentation

Presentation

Raster image

Time scheduling

Desktop publishing

Encapsulation

Raster image

Desktop publishing

Encapsulation

Desktop publishing

Desktop publishing

Time scheduling

Miscellaneous

Presentation

Sound

Sound

Vector graphic


Number

164

165

166

167

160

161

162

163

168

169

170

156

157

158

159

152

153

154

155

148

149

150

151

144

145

146

147

Format

dBase Database

OS/2 PM Metafile Graphics

Lasergraphics Language

AutoShade Rendering File Format

Graphics Environment Manager (GEM VDI)

Microsoft Windows Help File

Ability Office (SS, DB, GR, WP, COM)

XyWrite/Nota Bene

Comma Separated Values (CSV)

Writing Assistant word processor

WordStar 2000

WordStar 6.0

HP Printer Control Language (PCL)

(UNIX/VAX/SUN) Executable

(UNIX/VAX/SUN) Object Module

(UNIX/VAX/SUN) Link Library

NeXT SUN Audio Data

NeWS font file (SUN)

cpio Archive Format (UNIX/VAX/SUN)

PEX Binary Archive (SUN)

SUN vfont definition

Curses Screen Image (UNIX/VAX/SUN)

UU Encoded Encryption File

PC Object Module

Microsoft Windows Group File

File Class

Database

Vector graphic

Vector graphic

Vector graphic

Vector graphic

Miscellaneous

Word processor

Spreadsheet

Word processor

Word processor

Word processor

Vector graphic

Executable

Object module

Library module

Sound

Font

Encapsulation

Encapsulation

Font

Raster image

Encapsulation

Object module

Miscellaneous


187

188

189

190

183

184

185

186

191

192

193

194

195

179

180

181

182

175

176

177

178

Number

171

172

173

174

Format

PC True Type Font

Program Information File

PC COM executable file

Adobe FrameMaker Markup Language

Stuff It Archive (Mac)

PeachCalc Spreadsheet

Wang Office GDL Header Encapsulation

WordPerfect 6.0

Q & A for DOS

Q & A for Windows

DEC WPS PLUS

DCX Fax format

Microsoft Windows OLE 2 Encapsulation

Quattro Pro for Windows

Keyword Viewer Markup Format

EBCDIC Text

DCS

Microsoft Excel Spreadsheet 95, 2000

Microsoft Word for Windows 95

UNIX SHAR Encapsulation

Lotus Notes Bitmap

UNIX Compress Encapsulation

Lotus Notes CDF

UNIX TAR Encapsulation

WordPerfect Graphics V2.0 (WPG2)

196 ODA/ODIF (FOD 26)


File Class

Font

Miscellaneous

Executable

Desktop publishing

Encapsulation

Spreadsheet

Encapsulation

Word processor

Word processor

Word processor

Word processor

Fax

Encapsulation

Spreadsheet

Word processor

Word processor

Spreadsheet

Word processor

Encapsulation

Raster image

Encapsulation

Word processor

Encapsulation

Raster image

Vector graphic

Word processor


215

216

217

218

211

212

213

214

219

220

221

222

Number

201

202

203

204

197

198

199

200

205

206

207

208

209

210

Format File Class

GZ Compress Encapsulation

Envoy (EVY)

Adobe Portable Document Format (PDF)

KW ODA Internal Raw Bitmap (RBM)

KW ODA G4 (G4)

KW ODA G31D (G31)

KW ODA Internal G32D (G32)

Microsoft Word for Mac V 4.x/5.x

BinHex 4.0 encoded file

SMTP document

MIME format - Microsoft Outlook Express (EML)/

Mailbox (MBX)

SGML document

HTML document

XHTML

1

ACT Format

Microsoft PowerPoint 95

Portable Network Graphics (PNG)

Video for Windows

Windows Animated Cursor

Windows C++ Object Storage

Windows Palette

RIFF Device Independent Bitmap

RIFF MIDI

RIFF Multimedia Movie

MPEG Movie

QuickTime Movie

Encapsulation

Word processor

Word processor

Raster image

Raster image

Raster image

Raster image

Word processor

Encapsulation

Encapsulation

Encapsulation

Word processor

Word processor

Word processor

Presentation

Raster image

Movie

Raster image

Mixed format

Raster image

Raster image

Sound

Movie

Movie

Movie


Number

243

244

245

246

239

240

241

242

247

248

249

235

236

237

238

231

232

233

234

227

228

229

230

223

224

225

226

Format

Audio Interchange File Format (AIFF) Sound

Amiga MOD Sound

Amiga IFF (8SVX) Sound

Creative Voice (VOC) Sound

Microsoft Works (Windows)

Microsoft Works Spreadsheet (Windows)

AutoDesk Animator FLIC Animation

AutoDesk Animator Pro FLIC Animation

Microsoft Works Database (Windows)

Microsoft Works Communication (Windows)

Compactor / Compact Pro Archive

VRML

QuickDraw 3D Metafile (3DMF)

PGP Secret Keyring

PGP Public Keyring

PGP Encrypted Data

PGP Signed Data

PGP Signed and Encrypted Data

PGP Signature Certificate

ASCII-armored PGP Public Keyring

ASCII-armored PGP encoded

ASCII-armored PGP signed

OLE DIB object

PGP Compressed Data

SGI Image

Lotus Screen Cam

MPEG Audio

File Class

Sound

Sound

Sound

Sound

Word processor

Spreadsheet

Animation

Animation

Database

Communications

Encapsulation

Vector graphic

Vector graphic

Encapsulation

Encapsulation

Encapsulation

Encapsulation

Encapsulation

Encapsulation

Encapsulation

Encapsulation

Encapsulation

Raster image

Encapsulation

Raster image

Animation

Sound


Number

270

271

272

273

266

267

268

269

274

275

276

262

263

264

265

258

259

260

261

254

255

256

257

250

251

252

253

Format

FTP Session Data

Netscape Bookmark file

Corel Draw CMX

AutoCAD Drawing (DWG)

AutoDesk WHIP

Macromedia Director

Real Audio

MS DOS Device Driver

Micrografx Designer

Simple Vector format (SVF)

WordPerfect Office document (WPD)

Applix Words

Applix Graphics

Microsoft Access

Usenet format

MacBinary

Apple Single

Apple Double

Lotus Word Pro

Microsoft Word 97, 2000

Enhanced Window Metafile

Microsoft Office Drawing

Microsoft PowerPoint 97, 2000

Extended or Custom XML

Device Independent file (DVI)

Unicode

Framework

File Class

Communications

Word processor

Vector image

Vector graphic

Vector graphic

Animation

Sound

Executable

Vector graphic

Vector graphic

Word processor

Presentation

Database

Word processor

Encapsulation

Encapsulation

Encapsulation

Word processor

Word processor

Vector graphic

Vector graphic

Presentation

Word processor

Vector graphic

Word processor

Mixed


294

295

296

297

298

299

290

291

292

293

286

287

288

289

300

301

302

Number

281

282

283

284

285

277

278

279

280

Format

KPIF Chart Stream

Applix Spreadsheet

Microsoft Device Independent Bitmap

KeyView GPF Filter

Microsoft Project 98, 2000, 2002

Folio Flat file

HWP (Arae-Ah Hangul)

JustSystems Ichitaro

Generic XML format

Microsoft Office 2003 XML format

2

Fujitsu Oasys

Portable Bitmap Utilities (PBM)

Portable Greymap Utilities (PGM)

Portable Pixmap Utilities (PPM)

X Bitmap (XBM)

X Pixmap (XPM)

X Image

PCD Image

Microsoft Visio

Microsoft Outlook (MSG)

XHTML document

Microsoft Outlook Personal Folders file (PST)

WinRAR Compressed Archive format (RAR)

Lotus Notes Database (NSF)

Legato Extender ONM

Macromedia Flash

Microsoft Word 2007 (XML format)

Microsoft Excel 2007 (XML format)

File Class

Spreadsheet

Raster image

Time scheduling

Word processor

Word processor

Word processor

Word processor

Word processor

Raster image

Raster image

Raster image

Raster image

Raster image

Raster image

Raster image

Presentation

Encapsulation

Word processor

Encapsulation

Encapsulation

Encapsulation

Word processor

Word processor

Spreadsheet


Number

324

325

326

327

320

321

322

323

328

329

330

315

316

317

319

311

312

313

314

307

308

309

310

303

304

305

306

Format

Microsoft PowerPoint 2007 (XML format)

Open PGP (new format packets only)

Intergraph version 7 DGN

Microstation version 8 DGN

Microsoft Word 2007 Macro

Microsoft Excel 2007 Macro

Microsoft PowerPoint Macro

Microsoft Compression folder (LZH)

Office 2007 Document

XML Paper Specification

Lotus Domino Extensible Language

OASIS Open Document (ODT)

OASIS Open Document (ODS)

OASIS Open Document (ODP)

Legato EMailXtender Native Message

Transfer Neutral Encapsulation Format (TNEF)

CADAM Drawing

CADAM Drawing Overlay

NURSTOR Drawing

HP Graphics Language (Plotter)

Advanced Systems Format

Windows Media Audio Format

Windows Media Video Format

Legato EMailXtender Archive

7-Zip

Microsoft Office 2007 Excel Binary Format

Microsoft Cabinet File

File Class

Presentation

Encapsulation

Vector graphic

Vector graphic

Word processor

Spreadsheet

Presentation

Encapsulation

Miscellaneous

Word processor

Encapsulation

Word processor

Spreadsheet

Presentation

Word Processor

Encapsulation

Vector graphic

Vector graphic

Vector graphic

Vector graphic

Miscellaneous

Sound

Movie

Encapsulation

Encapsulation

Spreadsheet

Encapsulation


Number

351

352

353

354

347

348

349

350

355

356

357

343

344

345

346

339

340

341

342

335

336

337

338

331

332

333

334

Format

CATIA formats

Yahoo! Instant Messenger

Founder Chinese E-paper Basic

Corel Quattro Pro X4

MIME HTML

Microsoft Document Imaging Format

Microsoft Office Groove File Format

Apple iWorks Pages

Apple iWorks Numbers

Apple iWorks Keynote

Microsoft Backup File

Microsoft Access 2007

Microsoft Entourage Database

Mac Disk Copy Disk Image File

Appleworks File

Omni Outliner (OO3) File

Omni Outliner (OPML) File

Omni Graffle XML File

Apple Photoshop Document

Apple Binary Property List

Apple iChat Format

Omni Outliner (OOUTLINE) File

Bzip 2 Compressed File

ISO-9660 CD Disc Image Format

Xerox DocuWorks

RealMedia Streaming Media

AC3 Audio File Format


File Class

Vector graphic

Word processor

Word processor

Spreadsheet

Word processor

Raster image

Word processor

Word processor

Spreadsheet

Presentation

Encapsulation

Database

Encapsulation

Encapsulation

Word processor

Word processor

Word processor

Vector graphic

Raster image

Miscellaneous

Word processor

Word processor

Encapsulation

Encapsulation

Word processor

Movie

Sound


381

382

383

384

377

378

379

380

385

386

387

388

389

371

372

373

374

375

376

Number

358

359

366

367

368

370

Format

Nero Encrypted File

SolidWorks

Extensible Forms Description Language

Apple XML Property List

OneNote Note Format

Digital Imaging and Communications in Medicine

(DICOM)

Expert Witness Compression Format

Shell Scrap Object File

Microsoft Project 2007

Microsoft Publisher 98–

Skype Log File

Lotus Notes Bitmap Format (DXL embedded images)

Health level7 message

Microsoft Outlook Offline Storage File

Open Publication Structure eBook

Microsoft Outlook Express DBX

BlackBerry Activation File

Disk Image

Milestone

RealLegal E-Transcript File

PostScript Type 1 Font

Ghost Disk Image File

JPEG-2000 JP2 File Format Syntax (ISO/IEC

15444-1)

Unicode HTML

Microsoft Compiled HTML Help

File Class

Encapsulation

Vector graphic

Presentation

Miscellaneous

Presentation

Raster image

Encapsulation

Encapsulation

Time scheduling

Desktop publishing

Word processor

Raster image

Word processor

Encapsulation

Word processor

Encapsulation

Word processor

Encapsulation

Raster Image

Word processor

Font

Encapsulation

Raster Image

Word processor

Encapsulation


Number Format File Class

390

393

395

397

409

412

414

Documentum EMCMF

JBIG2 File

AD1 Evidence file

Group Wise File Surf email

Microsoft Outlook for Macintosh

Microsoft Outlook vCard Contact

Microsoft Outlook iCalendar

Encapsulation

Raster image

Encapsulation

Encapsulation

Encapsulation

Word processor

Encapsulation

1. If the major version is 100, the file format is XHTML.

2. The major version determines whether the Microsoft Office XML file is a Word, Excel or Visio document. The major version for each format is as follows:

Word: 100

Excel: 101

Visio: 110


Attribute Number    File Class
0                   No file class
01                  Word processor
02                  Spreadsheet
03                  Database
04                  Raster image
05                  Vector graphic
06                  Presentation
07                  Executable
08                  Encapsulation
09                  Sound
10                  Desktop publishing
11                  Outline/planning
12                  Miscellaneous
13                  Mixed format
14                  Font
15                  Time scheduling
16                  Communications
17                  Object module
18                  Library module
19                  Fax
20                  Movie
21                  Animation
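The file class attribute numbers lend themselves to a simple lookup table. The sketch below assumes that the attribute numbers pair with the file class names in ascending order (0 for no file class through 21 for Animation), which agrees with the earlier example where file class 2 is Spreadsheet. FILE_CLASS_NAMES and file_class_name are hypothetical names for illustration, not KeyView identifiers.

```c
#include <stddef.h>

/* File class names indexed by formats_e.ini attribute number (0 to 21). */
static const char *const FILE_CLASS_NAMES[] = {
    "No file class",      /*  0 */
    "Word processor",     /*  1 */
    "Spreadsheet",        /*  2 */
    "Database",           /*  3 */
    "Raster image",       /*  4 */
    "Vector graphic",     /*  5 */
    "Presentation",       /*  6 */
    "Executable",         /*  7 */
    "Encapsulation",      /*  8 */
    "Sound",              /*  9 */
    "Desktop publishing", /* 10 */
    "Outline/planning",   /* 11 */
    "Miscellaneous",      /* 12 */
    "Mixed format",       /* 13 */
    "Font",               /* 14 */
    "Time scheduling",    /* 15 */
    "Communications",     /* 16 */
    "Object module",      /* 17 */
    "Library module",     /* 18 */
    "Fax",                /* 19 */
    "Movie",              /* 20 */
    "Animation"           /* 21 */
};

/* Returns the file class name, or NULL if the number is out of range. */
static const char *file_class_name(int attribute_number)
{
    size_t count = sizeof FILE_CLASS_NAMES / sizeof FILE_CLASS_NAMES[0];

    if (attribute_number < 0 || (size_t)attribute_number >= count) {
        return NULL;
    }
    return FILE_CLASS_NAMES[attribute_number];
}
```

An application that reports format information to users could combine this lookup with the parsed FileClass category of a formats_e.ini value.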

Attribute Number    Minor Format
00                  Minor format not defined
01                  Standard
02                  Book
03                  Chart
04                  Macro
05                  Text
06                  Binary
07                  PC
08                  Windows
09                  DOS
10                  Macintosh
11                  RGB
12                  TIFF
13                  IFF
14                  Experimental
15                  Format Information
16                  RLE
17                  Symbol
18                  Old
19                  Footnote
20                  Style
21                  Palette
22                  Configuration
23                  Activity
24                  Resource
25                  Calculation
26                  Glossary
27                  Spelling
28                  Thesaurus
29                  Hyphenation
30                  Miscellaneous
31                  UNIX
32                  VAX
33                  Driver
34                  Archive

APPENDIX F

File Formats and Extensions

This section lists the KeyView file format numbers and their associated file extensions. It contains the following topics:

File Format and Extension Table

File Format and Extension Table

The following table lists the KeyView file format codes and the file extensions they are most commonly associated with.

NOTE The table is not a complete list of file extensions. KeyView returns format codes based on file content, which cannot always be predicted from the file extension. Some file extensions may also be associated with multiple format numbers.


Format Name

AES_Multiplus_Comm_Fmt

ASCII_Text_Fmt

MSDOS_Batch_File_Fmt

Applix_Alis_Fmt

BMP_Fmt

CT_DEF_Fmt

Corel_Draw_Fmt

CGM_ClearText_Fmt

CGM_Binary_Fmt

CGM_Character_Fmt

Word_Connection_Fmt

COMET_TOP_Word_Fmt

CEOwrite_Fmt

DSA101_Fmt

DCA_RFT_Fmt

CDA_DDIF_Fmt

DG_CDS_Fmt

Micrografx_Draw_Fmt

Data_Point_VistaWord_Fmt

DECdx_Fmt

Enable_WP_Fmt

EPSF_Fmt

Preview_EPSF_Fmt

11

12

13

14

15

16

17

18

19

20

21

22

23

9

10

7

8

3

4

1

2

5

6

Format

Number Format Description

Multiplus (AES)

Text

MS-DOS Batch File

APPLIX ASTERIX

Windows Bitmap

Convergent Technologies DEF

Comm. Format

Corel Draw

Computer Graphics Metafile

(CGM)

Computer Graphics Metafile

(CGM)

Computer Graphics Metafile

(CGM)

Word Connection

COMET TOP

CEOwrite

DSA101 (Honeywell Bull)

DCA-RFT (IBM Revisable

Form)

CDA / DDIF

DG Common Data Stream

(CDS)

Windows Draw (Micrografx)

Vistaword

DECdx

Enable Word Processing

Encapsulated PostScript

Encapsulated PostScript

Associated File

Extension

PTF

BAT

AX

BMP

CDR

CGM

CGM

CGM

CN

CW

RFT

CDS

DRW

DX

WPF

EPS

1

EPS

1

1

1

1


Format Name

MS_Executable_Fmt

G31D_Fmt

GIF_87a_Fmt

GIF_89a_Fmt

HP_Word_PC_Fmt

IBM_1403_LinePrinter_Fmt

IBM_DCF_Script_Fmt

IBM_DCA_FFT_Fmt

Interleaf_Fmt

GEM_Image_Fmt

IBM_Display_Write_Fmt

Sun_Raster_Fmt

Ami_Pro_Fmt

Ami_Pro_StyleSheet_Fmt

MORE_Fmt

Lyrix_Fmt

MASS_11_Fmt

MacPaint_Fmt

MS_Word_Mac_Fmt

SmartWare_II_Comm_Fmt

MS_Word_Win_Fmt

Multimate_Fmt

Multimate_Fnote_Fmt

Multimate_Adv_Fmt

Multimate_Adv_Fnote_Fmt

40

41

42

43

36

37

38

39

44

45

46

47

48

32

33

34

35

28

29

30

31

Format

Number

24

25

26

27

Format Description

MSDOS/Windows Program

CCITT G3 1D

Graphics Interchange Format

(GIF87a)

Graphics Interchange Format

(GIF89a)

HP Word PC

IBM 1403 Line Printer

DCF Script

DCA-FFT (IBM Final Form)

Interleaf

GEM Bit Image

Display Write

Sun Raster

Lotus Ami Pro

Lotus Ami Pro Style Sheet

MORE Database MAC

Lyrix Word Processing

MASS-11

MacPaint

Microsoft Word for Macintosh

SmartWare II

Microsoft Word for Windows

MultiMate

MultiMate Footnote File

MultiMate Advantage

MultiMate Advantage Footnote

File

Associated File

Extension

EXE

GIF

GIF

HW

I4

IC

IF

IMG

IP

RAS

SAM

M1

1

1

PNTG

DOC

1

DOC

MM

FNX

1

1

1


Format Name

Multimate_Adv_II_Fmt

Multimate_Adv_II_Fnote_Fmt

Multiplan_PC_Fmt

Multiplan_Mac_Fmt

MS_RTF_Fmt

MS_Word_PC_Fmt

MS_Word_PC_StyleSheet_Fmt

MS_Word_PC_Glossary_Fmt

MS_Word_PC_Driver_Fmt

MS_Word_PC_Misc_Fmt

NBI_Async_Archive_Fmt

Navy_DIF_Fmt

NBI_Net_Archive_Fmt

NIOS_TOP_Fmt

FileMaker_Mac_Fmt

ODA_Q1_11_Fmt

ODA_Q1_12_Fmt

OLIDIF_Fmt

Office_Writer_Fmt

PC_Paintbrush_Fmt

CPT_Comm_Fmt

Lotus_PIC_Fmt

Mac_PICT_Fmt

Philips_Script_Word_Fmt

PostScript_Fmt

71

72

73

67

68

69

70

63

64

65

66

59

60

61

62

56

57

58

51

52

53

54

55

Format

Number

49

50

Format Description

MultiMate Advantage II

Associated File

Extension

MM

1

FNX

1

MultiMate Advantage II

Footnote File

Multiplan (PC)

Multiplan (Mac)

Rich Text Format (RTF)

Microsoft Word for PC

RTF

DOC

1

DOC

1

Microsoft Word for PC Style

Sheet

Microsoft Word for PC Glossary DOC

1

Microsoft Word for PC Driver DOC

1

Microsoft Word for PC

Miscellaneous File

DOC

1

NBI Async Archive Format

Navy DIF

NBI Net Archive Format

NIOS TOP

ND

NN

Filemaker MAC

ODA / ODIF

ODA / ODIF

OLIDIF (Olivetti)

FP5, FP7

OD

OD

1

1

Office Writer OW

PC Paintbrush Graphics (PCX) PCX

CPT

Lotus PIC

QuickDraw Picture

Philips Script

PIC

PCT

PostScript PS


Format Name

PRIMEWORD_Fmt

Quadratron_Q_One_v1_Fmt

Quadratron_Q_One_v2_Fmt

SAMNA_Word_IV_Fmt

Ami_Pro_Draw_Fmt

SYLK_Spreadsheet_Fmt

SmartWare_II_WP_Fmt

Symphony_Fmt

Targa_Fmt

TIFF_Fmt

Targon_Word_Fmt

Uniplex_Ucalc_Fmt

Uniplex_WP_Fmt

MS_Word_UNIX_Fmt

WANG_PC_Fmt

WordERA_Fmt

WANG_WPS_Comm_Fmt

WordPerfect_Mac_Fmt

WordPerfect_Fmt

WordPerfect_VAX_Fmt

WordPerfect_Macro_Fmt

WordPerfect_Dictionary_Fmt

WordPerfect_Thesaurus_Fmt

WordPerfect_Resource_Fmt

WordPerfect_Driver_Fmt

WordPerfect_Cfg_Fmt

96

97

98

99

84

85

86

87

80

81

82

83

76

77

78

79

Format

Number

74

75

92

93

94

95

88

89

90

91

Format Description

PRIMEWORD

Q-One V1.93J

Q-One V2.0

SAMNA Word

Lotus Ami Pro Draw

SYLK

SmartWare II

Symphony

Targa

TIFF

Targon Word

Uniplex Ucalc

Uniplex

Microsoft Word UNIX

WANG PC

WordERA

WANG WPS

WordPerfect MAC

WordPerfect

WordPerfect VAX

WordPerfect Macro

WordPerfect Spelling

Dictionary

WordPerfect Thesaurus

WordPerfect Resource File

WordPerfect Driver

WordPerfect Configuration File

Associated File

Extension

Q1

Q1

WF

1

1

, QX

, QX

SAM

SDW

WR1

TGA

TIF, TIFF

TW

SS

UP

DOC

1

1

1

WPM, WPD

WO, WPD

WPD

1

1

1


Format Name

WordPerfect_Hyphenation_Fmt

WordPerfect_Misc_Fmt

WordMARC_Fmt

Windows_Metafile_Fmt

Windows_Metafile_NoHdr_Fmt

SmartWare_II_DB_Fmt

WordPerfect_Graphics_Fmt

WordStar_Fmt

WANG_WITA_Fmt

Xerox_860_Comm_Fmt

Xerox_Writer_Fmt

DIF_SpreadSheet_Fmt

Enable_Spreadsheet_Fmt

SuperCalc_Fmt

UltraCalc_Fmt

SmartWare_II_SS_Fmt

SOF_Encapsulation_Fmt

PowerPoint_Win_Fmt

PowerPoint_Mac_Fmt

PowerPoint_95_Fmt

PowerPoint_97_Fmt

PageMaker_Mac_Fmt

PageMaker_Win_Fmt

MS_Works_Mac_WP_Fmt

MS_Works_Mac_DB_Fmt

110

111

112

113

114

115

116

106

107

108

109

102

103

104

105

121

122

123

124

117

118

119

120

Format

Number

100

101

Format Description

WordPerfect Hyphenation

Dictionary

Associated File

Extension

WordPerfect Miscellaneous

File

WPD

1

WordMARC

Windows Metafile

WM, PW

WMF

1

Windows Metafile (no header) WMF

1

SmartWare II

WordPerfect Graphics

WordStar

WANG WITA

WPG, QPG

WS

WT

Xerox 860

Xerox Writer

Data Interchange Format (DIF) DIF

Enable Spreadsheet SSF

Supercalc

UltraCalc

CAL

SmartWare II

Serialized Object Format

(SOF)

SOF

PowerPoint PC

PowerPoint MAC

PowerPoint 95

PowerPoint 97

PPT

1

PPT

1

PPT

1

PPT

1

PageMaker for Macintosh

PageMaker for Windows

Microsoft Works for MAC

Microsoft Works for MAC


File Format and Extension Table

Format Name

MS_Works_Mac_SS_Fmt

MS_Works_Mac_Comm_Fmt

MS_Works_DOS_WP_Fmt

MS_Works_DOS_DB_Fmt

MS_Works_DOS_SS_Fmt

MS_Works_Win_WP_Fmt

MS_Works_Win_DB_Fmt

MS_Works_Win_SS_Fmt

PC_Library_Fmt

MacWrite_Fmt

MacWrite_II_Fmt

Freehand_Fmt

Disk_Doubler_Fmt

HP_GL_Fmt

FrameMaker_Fmt

FrameMaker_Book_Fmt

Maker_Markup_Language_Fmt

Maker_Interchange_Fmt

JPEG_File_Interchange_Fmt

Reflex_Fmt

Framework_Fmt

Framework_II_Fmt

Paradox_Fmt

MS_Windows_Write_Fmt

Quattro_Pro_DOS_Fmt

Quattro_Pro_Win_Fmt

135

136

137

138

131

132

133

134

139

140

141

142

127

128

129

130

Format

Number

125

126

147

148

149

150

143

144

145

146

Format Description

Microsoft Works for MAC

Microsoft Works for MAC

Microsoft Works for DOS

Microsoft Works for DOS

Microsoft Works for DOS

Microsoft Works for Windows

Microsoft Works for Windows

Microsoft Works for Windows

DOS/Windows Object Library

MacWrite

MacWrite II

Freehand MAC

Disk Doubler

HP Graphics Language

FrameMaker

FrameMaker

Maker Markup Language

Maker Interchange Format

(MIF)

Interchange Format

Reflex

Framework

Framework II

Paradox

Windows Write

Quattro Pro for DOS

Quattro Pro for Windows

Associated File

Extension

WPS

WDB

WPS

WDB

FW3

DB

WRI

1

1

S30, S40

HPGL

FM, FRM

BOOK

MIF

1

1

JPG, JPEG

WB2, WB3


Format Name

Persuasion_Fmt

Windows_Icon_Fmt

Windows_Cursor_Fmt

MS_Project_Activity_Fmt

MS_Project_Resource_Fmt

MS_Project_Calc_Fmt

PKZIP_Fmt

Quark_Xpress_Fmt

ARC_PAK_Archive_Fmt

MS_Publisher_Fmt

PlanPerfect_Fmt

WordPerfect_Auxiliary_Fmt

MS_WAVE_Audio_Fmt

MIDI_Audio_Fmt

AutoCAD_DXF_Binary_Fmt

AutoCAD_DXF_Text_Fmt dBase_Fmt

OS_2_PM_Metafile_Fmt

Lasergraphics_Language_Fmt

AutoShade_Rendering_Fmt

GEM_VDI_Fmt

Windows_Help_Fmt

Volkswriter_Fmt

Ability_WP_Fmt

Ability_DB_Fmt

Ability_SS_Fmt

Ability_Comm_Fmt

169

170

171

172

165

166

167

168

173

174

175

176

177

161

162

163

164

157

158

159

160

153

154

155

156

Format

Number

151

152

Format Description

Persuasion

Windows Icon Format

Windows Cursor

Microsoft Project

Microsoft Project

Microsoft Project

ZIP Archive

Quark Xpress MAC

PAK/ARC Archive

Microsoft Publisher

PlanPerfect

WordPerfect auxiliary file

Microsoft Wave

MIDI

AutoCAD DXF

AutoCAD DXF dBase

OS/2 PM Metafile

Lasergraphics Language

AutoShade Rendering

GEM VDI

Windows Help File

Volkswriter

Ability

Ability

Ability

Ability

VDI

HLP

VW4


Associated File

Extension

ICO

CUR

MPP

1

MPP

1

MPP

1

ZIP

ARC, PAK

PUB

1

WPW

WAV

MID, MIDI

DXF

1

DXF

1

DBF

MET


Format Name

Ability_Image_Fmt

XyWrite_Fmt

CSV_Fmt

IBM_Writing_Assistant_Fmt

WordStar_2000_Fmt

HP_PCL_Fmt

UNIX_Exe_PreSysV_VAX_Fmt

UNIX_Exe_Basic_16_Fmt

UNIX_Exe_x86_Fmt

UNIX_Exe_iAPX_286_Fmt

UNIX_Exe_MC68k_Fmt

UNIX_Exe_3B20_Fmt

UNIX_Exe_WE32000_Fmt

UNIX_Exe_VAX_Fmt

UNIX_Exe_Bell_5_Fmt

UNIX_Obj_VAX_Demand_Fmt

UNIX_Obj_MS8086_Fmt

UNIX_Obj_Z8000_Fmt

AU_Audio_Fmt

NeWS_Font_Fmt cpio_Archive_CRChdr_Fmt cpio_Archive_CHRhdr_Fmt

PEX_Binary_Archive_Fmt

Sun_vfont_Fmt

Curses_Screen_Fmt

189

190

191

192

193

185

186

187

188

194

199

200

201

202

195

196

197

198

181

182

183

184

Format

Number

178

179

180

Format Description

Ability

XYWrite / Nota Bene

CSV (Comma Separated

Values)

IBM Writing Assistant

WordStar 2000

HP Printer Control Language

Unix Executable (PDP-11/ pre-System V VAX)

Unix Executable (Basic-16)

Unix Executable (x86)

Unix Executable (iAPX 286)

Unix Executable (MC680x0)

Unix Executable (3B20)

Unix Executable (WE32000)

Unix Executable (VAX)

Unix Executable (Bell 5.0)

Unix Object Module (VAX

Demand)

Unix Object Module (old MS

8086)

Unix Object Module (Z8000)

NeXT/Sun Audio Data

NeWS bitmap font cpio archive (CRC Header) cpio archive (CHR Header)

SUN PEX Binary Archive

SUN vfont Definition

Curses Screen Image

Associated File

Extension

XY4

CSV

IWA

WS2

PCL

AU


Format Name

UUEncoded_Fmt

WriteNow_Fmt

PC_Obj_Fmt

Windows_Group_Fmt

TrueType_Font_Fmt

Windows_PIF_Fmt

MS_COM_Executable_Fmt

StuffIt_Fmt

PeachCalc_Fmt

Wang_GDL_Fmt

Q_A_DOS_Fmt

Q_A_Win_Fmt

WPS_PLUS_Fmt

DCX_Fmt

OLE_Fmt

EBCDIC_Fmt

DCS_Fmt

UNIX_SHAR_Fmt

Lotus_Notes_BitMap_Fmt

Lotus_Notes_CDF_Fmt

Compress_Fmt

GZ_Compress_Fmt

TAR_Fmt

ODIF_FOD26_Fmt

ODIF_FOD36_Fmt

ALIS_Fmt

Envoy_Fmt

221

222

223

224

217

218

219

220

225

226

227

228

229

213

214

215

216

209

210

211

212

205

206

207

208

Format

Number

203

204

Format Description

UU encoded

WriteNow MAC

PC (.COM)

StuffIt (MAC)

PeachCalc

WANG Office GDL Header

Associated File

Extension

UUE

DOS/Windows Object Module

Windows Group

TrueType Font TTF

Program Information File (PIF) PIF

COM

HQX

Q & A for DOS

Q & A for Windows

WPS-PLUS

JW

WPL

DCX FAX Format(PCX images DCX

OLE Compound Document

EBCDIC Text

OLE

DCS

SHAR

Lotus Notes Bitmap

Lotus Notes CDF

SHAR

Unix Compress

GZ Compress

TAR

ODA / ODIF

CDF

Z

GZ

1

TAR

F26

F36 ODA / ODIF

ALIS

Envoy EVY


Format Name

PDF_Fmt

BinHex_Fmt

SMTP_Fmt

MIME_Fmt

USENET_Fmt

SGML_Fmt

HTML_Fmt

ACT_Fmt

PNG_Fmt

MS_Video_Fmt

Windows_Animated_Cursor_Fmt

239

240

Windows_CPP_Obj_Storage_Fmt 241

Windows_Palette_Fmt 242

RIFF_DIB_Fmt 243

RIFF_MIDI_Fmt

RIFF_Multimedia_Movie_Fmt

MPEG_Fmt

QuickTime_Fmt

AIFF_Fmt 248

244

245

246

247

Amiga_MOD_Fmt

Amiga_IFF_8SVX_Fmt

Creative_Voice_Audio_Fmt

AutoDesk_Animator_FLI_Fmt

AutoDesk_AnimatorPro_FLC_Fmt 253

Compactor_Archive_Fmt 254

249

250

251

252

232

233

234

235

Format

Number

230

231

236

237

238

Format Description

Portable Document Format

BinHex

SMTP

MIME

2

USENET

SGML

HTML

ACT

Portable Network Graphics

(PNG)

Video for Windows (AVI)

Windows Animated Cursor

Windows C++ Object Storage

Windows Palette

RIFF Device Independent

Bitmap

RIFF MIDI

RIFF Multimedia Movie

MPEG Movie

QuickTime Movie, MPEG-4

Audio

Audio Interchange File Format

(AIFF)

Amiga MOD

Amiga IFF (8SVX) Sound

Creative Voice (VOC)

AutoDesk Animator FLIC

AutoDesk Animator Pro FLIC

Compactor / Compact Pro

SGML

HTM

1

, HTML

1

ACT

PNG

AVI

ANI

PAL

RMI

MPG

1

, MPEG

MOV, QT, MP4

AIF, AIFF

MOD

IFF

VOC

FLI

FLC

Associated File

Extension

PDF

HQX

SMTP

EML, MBX


Format Name

VRML_Fmt

QuickDraw_3D_Metafile_Fmt

PGP_Secret_Keyring_Fmt

PGP_Public_Keyring_Fmt

PGP_Encrypted_Data_Fmt

PGP_Signed_Data_Fmt

257

258

259

260

PGP_SignedEncrypted_Data_Fmt 261

Format

Number

255

256

PGP_Sign_Certificate_Fmt

PGP_Compressed_Data_Fmt

PGP_ASCII_Public_Keyring_Fmt

262

263

264

PGP_ASCII_Encoded_Fmt

PGP_ASCII_Signed_Fmt

OLE_DIB_Fmt

SGI_Image_Fmt

Lotus_ScreenCam_Fmt

MPEG_Audio_Fmt

FTP_Software_Session_Fmt

Netscape_Bookmark_File_Fmt

Corel_Draw_CMX_Fmt

AutoDesk_DWG_Fmt

AutoDesk_WHIP_Fmt

Macromedia_Director_Fmt

Real_Audio_Fmt

MSDOS_Device_Driver_Fmt

Micrografx_Designer_Fmt

SVF_Fmt

277

278

279

280

273

274

275

276

269

270

271

272

265

266

267

268

Format Description

VRML

QuickDraw 3D Metafile

PGP Secret Keyring

PGP Public Keyring

PGP Encrypted Data

PGP Signed Data

PGP Signed and Encrypted

Data

Associated File

Extension

WRL

PGP Signature Certificate

PGP Compressed Data

ASCII-armored PGP Public

Keyring

ASCII-armored PGP encoded PGP

1

ASCII-armored PGP encoded PGP

1

OLE DIB object

SGI Image

Lotus ScreenCam

MPEG Audio

RGB

FTP Session Data

Netscape Bookmark File

Corel CMX

AutoDesk Drawing (DWG)

AutoDesk WHIP

Macromedia Director

Real Audio

MSDOS Device Driver

Micrografx Designer

Simple Vector Format (SVF)

MPEGA

STE

HTM

1

CMX

DWG

WHP

DCR

RM

SYS

DSF

SVF


Format Name

Applix_Words_Fmt

Applix_Graphics_Fmt

MS_Access_Fmt

MS_Access_95_Fmt

MS_Access_97_Fmt

MacBinary_Fmt

Apple_Single_Fmt

Apple_Double_Fmt

Enhanced_Metafile_Fmt

MS_Office_Drawing_Fmt

XML_Fmt

DeVice_Independent_Fmt

Unicode_Fmt

Lotus_123_Worksheet_Fmt

Lotus_123_Format_Fmt

Lotus_123_97_Fmt

Lotus_Word_Pro_96_Fmt

Lotus_Word_Pro_97_Fmt

Freelance_DOS_Fmt

Freelance_Win_Fmt

Freelance_OS2_Fmt

Freelance_96_Fmt

Freelance_97_Fmt

MS_Word_95_Fmt

MS_Word_97_Fmt

Excel_Fmt

Excel_Chart_Fmt

299

300

301

302

295

296

297

298

303

304

305

306

307

291

292

293

294

287

288

289

290

283

284

285

286

Format

Number

281

282

Format Description

Applix Words

Applix Graphics

Microsoft Access

Microsoft Access 95

Microsoft Access 97

MacBinary

Apple Single

Apple Double

Enhanced Metafile

Microsoft Office Drawing

EMF

XML

XML

DeVice Independent file (DVI) DVI

1

Unicode

Lotus 1-2-3

Lotus 1-2-3 Formatting

Lotus 1-2-3 97

Lotus Word Pro 96

Lotus Word Pro 97

Lotus Freelance for DOS

Lotus Freelance for Windows

UNI

WK1

1

FM3

WK1

1

LWP

1

LWP

1

Lotus Freelance for OS/2

Lotus Freelance 96

Lotus Freelance 97

Microsoft Word 95

Microsoft Word 97

Microsoft Excel

Microsoft Excel

PRE

PRS

PRZ

1

PRZ

1

DOC

1

DOC

1

XLS

1

XLS

1

Associated File

Extension

AW

AG

MDB

1

MDB

1

MDB

1

BIN


Format Name

Excel_Macro_Fmt

Excel_95_Fmt

Excel_97_Fmt

Corel_Presentations_Fmt

Harvard_Graphics_Fmt

Harvard_Graphics_Chart_Fmt

Harvard_Graphics_Symbol_Fmt

Harvard_Graphics_Cfg_Fmt

Harvard_Graphics_Palette_Fmt

Lotus_123_R9_Fmt

Applix_Spreadsheets_Fmt

MS_Pocket_Word_Fmt

MS_DIB_Fmt

MS_Word_2000_Fmt

Excel_2000_Fmt

PowerPoint_2000_Fmt

MS_Access_2000_Fmt

MS_Project_4_Fmt

MS_Project_41_Fmt

MS_Project_98_Fmt

Folio_Flat_Fmt

HWP_Fmt

ICHITARO_Fmt

IS_XML_Fmt

Oasys_Fmt

316

317

318

319

320

310

311

312

313

Format

Number

308

309

314

315

325

326

327

328

321

322

323

324

329

330

331

332

Format Description

Microsoft Excel

Microsoft Excel 95

Associated File

Extension

XLS

1

XLS

1

XLS

1

XFD, XFDL

Microsoft Excel 97

Corel Presentations

Harvard Graphics

Harvard Graphics Chart CH3, CHT

Harvard Graphics Symbol File SY3

Harvard Graphics

Configuration File

Harvard Graphics Palette

Lotus 1-2-3 Release 9

Applix Spreadsheets

Microsoft Pocket Word

MS Windows Device

Independent Bitmap

AS

PWD, DOC

1

Microsoft Word 2000

Microsoft Excel 2000

Microsoft PowerPoint 2000

Microsoft Access 2000

Microsoft Project 4

Microsoft Project 4.1

Microsoft Project 98

Folio Flat File

DOC

1

XLS

1

PPT

MDB

1

, MPP

1

MPP

1

MPP

1

MPP

1

FFF

HWP HWP(Arae-Ah Hangul)

ICHITARO V4-10

Extended or Custom XML

Oasys format

XML

1

OA2, OA3


Format Name

PBM_ASC_Fmt

PBM_BIN_Fmt

PGM_ASC_Fmt

PGM_BIN_Fmt

PPM_ASC_Fmt

PPM_BIN_Fmt

XBM_Fmt

XPM_Fmt

FPX_Fmt

PCD_Fmt

MS_Visio_Fmt

MS_Project_2000_Fmt

MS_Outlook_Fmt

ELF_Relocatable_Fmt

ELF_Executable_Fmt

ELF_Dynamic_Lib_Fmt

MS_Word_XML_Fmt

MS_Excel_XML_Fmt

MS_Visio_XML_Fmt

SO_Text_XML_Fmt

SO_Spreadsheet_XML_Fmt

SO_Presentation_XML_Fmt

XHTML_Fmt

336

337

338

351

352

353

354

355

347

348

349

350

343

344

345

346

339

340

341

342

Format

Number

333

334

335

Format Description

Portable Bitmap Utilities ASCII

Format

Portable Bitmap Utilities Binary

Format

Portable Greymap Utilities

ASCII Format

Portable Greymap Utilities

Binary Format

Portable Pixmap Utilities ASCII

Format

Portable Pixmap Utilities Binary

Format

X Bitmap Format

X Pixmap Format

FPX Format

PCD Format

Microsoft Visio

Microsoft Project 2000

Microsoft Outlook

ELF Relocatable

ELF Executable

ELF Dynamic Library

Microsoft Word 2003 XML

Microsoft Excel 2003 XML

Microsoft Visio 2003 XML

StarOffice Text XML

StarOffice Spreadsheet XML

StarOffice Presentation XML

XHTML

Associated File

Extension

PGM

XBM

XPM

FPX

PCD

VSD

MPP

1

MSG, OFT

O

SO

XML

1

XML

1

VDX

SXW

1

, ODT

1

SXC

1

, ODS

1

SXI

1

, SXP

1

, ODP

1

XML

1


Format Name

MS_OutlookPST_Fmt

RAR_Fmt

Lotus_Notes_NSF_Fmt

Macromedia_Flash_Fmt

MS_Word_2007_Fmt

MS_Excel_2007_Fmt

MS_PPT_2007_Fmt

OpenPGP_Fmt

Intergraph_V7_DGN_Fmt

MicroStation_V8_DGN_Fmt

MS_Word_Macro_2007_Fmt

MS_Excel_Macro_2007_Fmt

MS_PPT_Macro_2007_Fmt

LZH_Fmt

Office_2007_Fmt

MS_XPS_Fmt

Lotus_Domino_DXL_Fmt

ODF_Text_Fmt

ODF_Spreadsheet_Fmt

ODF_Presentation_Fmt

365

366

367

368

369

370

371

372

359

360

361

362

363

Format

Number

356

357

358

364

373

374

375

Format Description

Microsoft Outlook PST

RAR

IBM Lotus Notes Database

NSF/NTF

SWF

Microsoft Word 2007 XML

Microsoft Excel 2007 XML

Microsoft PPT 2007 XML

OpenPGP Message Format

(with new packet format)

Intergraph Standard File

Format (ISFF) V7 DGN

(non-OLE)

MicroStation V8 DGN (OLE)

Microsoft Word Macro 2007

XML

Microsoft Excel Macro 2007

XML

Microsoft PPT Macro 2007

XML

LHA Archive

Office 2007 document

Microsoft XML Paper

Specification (XPS)

IBM Lotus representation of

Domino design elements in

XML format

ODF Text

ODF Spreadsheet

ODF Presentation

Associated File

Extension

PST

RAR

NSF

SWF

DOCX, DOTX

XLSX, XLTX

PPTX, POTX, PPSX

PGP

DGN

1

DGN

PPTM, POTM,

PPSM, PPAM

LZH, LHA

XLSB

XPS

DXL

1

DOCM, DOTM

XLSM, XLTM, XLAM

ODT

ODS

1

SXD

1

1

ODP

1

, SXW

, SXC

, SXI

1

1

1

, STW

, STC

, ODG

1

,


Format Name

Legato_Extender_ONM_Fmt bin_Unknown_Fmt

TNEF_Fmt

CADAM_Drawing_Fmt

CADAM_Drawing_Overlay_Fmt

NURSTOR_Drawing_Fmt

HP_GLP_Fmt

ASF_Fmt

WMA_Fmt

WMV_Fmt

EMX_Fmt

Z7Z_Fmt

MS_Excel_Binary_2007_Fmt

CAB_Fmt

CATIA_Fmt

YIM_Fmt

ODF_Drawing_Fmt

Founder_CEB_Fmt

QPW_Fmt

MHT_Fmt

MDI_Fmt

379

380

381

382

383

384

Format

Number

376

377

378

385

386

387

388

389

390

391

392

393

394

395

396

Format Description

Legato Extender Native

Message ONM n/a

Transport Neutral

Encapsulation Format (TNEF)

CADAM Drawing

CADAM Drawing Overlay

NURSTOR Drawing

HP Graphics Language

(Plotter)

Advanced Systems Format

(ASF)

Window Media Audio Format

(WMA)

Window Media Video Format

(WMV)

Legato EMailXtender Archives

Format (EMX)

7 Zip Format(7z)

Microsoft Excel Binary 2007

Microsoft Cabinet File (CAB)

CATIA Formats (CAT*)

Yahoo Instant Messenger

History

ODF Drawing

Founder Chinese E-paper

Basic (ceb)

Quattro Pro 9+ for Windows

MHT format

2

Microsoft Document Imaging

Format

Associated File

Extension

ONM various

CDD

CDO

NUR

HPG

ASF

WMA

WMV

EMX

7Z

XLSB

CAB

CAT

3

DAT

1

SXD

1

CEB

QPW

MHT

MDI

, SXI

1

, ODG

1


Format Name

GRV_Fmt

IWWP_Fmt

IWSS_Fmt

IWPG_Fmt

BKF_Fmt

MS_Access_2007_Fmt

ENT_Fmt

DMG_Fmt

CWK_Fmt

OO3_Fmt

OPML_Fmt

Omni_Graffle_XML_File

PSD_Fmt

Apple_Binary_PList_Fmt

Apple_iChat_Fmt

OOUTLINE_Fmt

BZIP2_Fmt

ISO_Fmt

DocuWorks_Fmt

RealMedia_Fmt

AC3Audio_Fmt

NEF_Fmt

SolidWorks_Fmt

XFDL_Fmt

404

405

406

407

408

409

410

399

400

401

402

403

Format

Number

397

398

415

416

417

418

419

411

412

413

414

420

Format Description

Associated File

Extension

Microsoft Office Groove Format GRV

Apple iWork Pages format PAGES, GZ

1

Apple iWork Numbers format

Apple iWork Keynote format

NUMBERS, GZ

1

KEY, GZ

1

Windows Backup File

Microsoft Access 2007

Microsoft Entourage Database

Format

BKF

ACCDB

Mac Disk Copy Disk Image File

AppleWorks File

Omni Outliner File

Omni Outliner File

Omni Graffle XML File

Photoshop Document

Apple Binary Property List format

OO3

OPML

GRAFFLE

PSD

Apple iChat format

OOutliner File

Bzip 2 Compressed File

ISO-9660 CD Disc Image

Format

DocuWorks Format

RealMedia Streaming Media

AC3 Audio File Format

Nero Encrypted File

SolidWorks Format Files

Extensible Forms Description

Language

OOUTLINE

BZ2

ISO

XDW

RM, RA

AC3

NEF

SLDASM, SLDPRT,

SLDDRW

XFDL, XFD


Format Name

Apple_XML_PList_Fmt

OneNote_Fmt

Dicom_Fmt

EnCase_Fmt

Scrap_Fmt

MS_Project_2007_Fmt

MS_Publisher_98_Fmt

Skype_Fmt

Hl7_Fmt

MS_OutlookOST_Fmt

Epub_Fmt

MS_OEDBX_Fmt

BB_Activ_Fmt

DiskImage_Fmt

Milestone_Fmt

E_Transcript_Fmt

PostScript_Font_Fmt

Ghost_DiskImage_Fmt

JPEG_2000_JP2_File_Fmt

Unicode_HTML_Fmt

CHM_Fmt

EMCMF_Fmt

437

438

439

440

441

442

443

426

427

428

429

430

431

432

433

Format

Number

421

422

424

425

434

435

436

Format Description

Apple XML Property List format

OneNote Note Format

Digital Imaging and

Communications in Medicine

Expert Witness Compression

Format (EnCase)

Associated File

Extension

ONE

DCM

E01, L01, Lx01

Shell Scrap Object File

Microsoft Project 2007

Microsoft Publisher 98/2000/

2002/2003/2007/

Skype Log File

Health level7 message

Microsoft Outlook OST

Electronic Publication

Microsoft Outlook Express

DBX

SHS

MPP

PUB

DBB

HL7

OST

EPUB

DBX

1

1

BlackBerry Activation File

Disk Image

Milestone Document

DAT

1

RealLegal E-Transcript File

PostScript Type 1 Font

Ghost Disk Image File

MLS, ML3, ML4,

ML5, ML6, ML7,

ML8, ML9

PTX

PFB

GHO, GHS

JPEG-2000 JP2 File Format

Syntax (ISO/IEC 15444-1)

Unicode HTML

JP2, JPF, J2K,

JPWL, JPX, PGX

HTM

Microsoft Compiled HTML Help CHM

1

, HTML

1

Documentum EMCMF format EMCMF


Format Name

MS_Access_2007_Tmpl_Fmt

Jungum_Fmt

JBIG2_Fmt

EFax_Fmt

AD1_Fmt

SketchUp_Fmt

GWFS_Email_Fmt

JNT_Fmt

Yahoo_yChat_Fmt

PaperPort_MAX_File_Fmt

ARJ_Fmt

RPMSG_Fmt

MAT_Fmt

SGY_Fmt

CDXA_MPEG_PS_Fmt

EVT_Fmt

EVTX_Fmt

MS_OutlookOLM_Fmt

WARC_Fmt

JAVACLASS_Fmt

VCF_Fmt

455

456

457

458

459

460

461

462

463

464

450

451

452

453

454

446

447

448

449

Format

Number

444

445

Format Description

Microsoft Access 2007

Template

Samsung Electronics Jungum

Global document

JBIG2 File Format eFax file

AD1 Evidence file

Google SketchUp

Group Wise File Surf email

Windows Journal format

Yahoo! Messenger chat log

PaperPort image file

ARJ (Archive by Robert Jung) file format

Microsoft Outlook Restricted

Permission Message

MATLAB file format

SEG-Y Seismic Data format

MPEG-PS container with

CDXA stream

Microsoft Windows NT Event

Log

Microsoft Windows Vista Event

Log

Microsoft Outlook for

Macintosh format

Web ARChive

Java Class format

Microsoft Outlook vCard file format

Associated File

Extension

ACCDT

GUL

JB2, JBIG2

EFX

AD1

SKP

GWFS

JNT

YCHAT

MAX

ARJ

RPMSG

MAT, FIG

SGY, SEGY

MPG

1

EVT

EVTX

OLM

WARC

CLASS

VCF


| Format Name | Format Number | Format Description | Associated File Extension |
| --- | --- | --- | --- |
| EDB_Fmt | 465 | Microsoft Exchange Server Database file format | EDB |
| ICS_Fmt | 466 | Microsoft Outlook iCalendar file format | ICS, VCS |
| MS_Visio_2013_Fmt | 467 | Microsoft Visio 2013 | VSDX, VSTX, VSSX |
| MS_Visio_2013_Macro_Fmt | 468 | Microsoft Visio 2013 macro | VSDM, VSTM, VSSM |

1. This file extension can return more than one format number.

2. MHT, EML, and MBX files may return either format 2, 233 or 395, depending on the text contained in the file. In general, files that contain fields such as To, From, Date, or Subject are considered e-mail messages; files that contain fields such as content-type and mime-version are considered to be MHT files; and files that do not contain any of those fields are considered to be text files.

3. All CAT file extensions, for example CATDrawing, CATProduct, CATPart, and so on.
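The header-based rule in note 2 can be pictured with a short sketch. This is only an illustration of the rule as worded in the note, not KeyView's detection code: `guess_mail_format` and `has_field` are hypothetical helpers, the reading of format 2 as plain text is inferred from the note, and the order in which the two groups of fields are checked is an assumption.

```c
#include <ctype.h>
#include <string.h>

/* Format numbers named in note 2. Treating 2 as plain text is inferred
   from the note; it is not confirmed against the SDK headers. */
enum { GUESS_TEXT = 2, GUESS_EMAIL = 233, GUESS_MHT = 395 };

/* Return 1 if some line of text starts with "<field>:" (case-insensitive). */
static int has_field(const char *text, const char *field)
{
    const char *line = text;
    while (line != NULL && *line != '\0') {
        const char *p = line;
        const char *f = field;
        while (*f != '\0' &&
               tolower((unsigned char)*p) == tolower((unsigned char)*f)) {
            ++p;
            ++f;
        }
        if (*f == '\0' && *p == ':')
            return 1;
        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line != NULL)
            ++line;
    }
    return 0;
}

/* Hypothetical helper, NOT a KeyView API: apply the rule from note 2. */
int guess_mail_format(const char *text)
{
    static const char *const mail_fields[] = { "To", "From", "Date", "Subject" };
    static const char *const mht_fields[]  = { "Content-Type", "MIME-Version" };
    size_t i;

    /* ASSUMPTION: e-mail fields take precedence when both kinds appear. */
    for (i = 0; i < sizeof mail_fields / sizeof mail_fields[0]; ++i)
        if (has_field(text, mail_fields[i]))
            return GUESS_EMAIL;
    for (i = 0; i < sizeof mht_fields / sizeof mht_fields[0]; ++i)
        if (has_field(text, mht_fields[i]))
            return GUESS_MHT;
    return GUESS_TEXT;
}
```

The practical point is that an application should not rely on the extension alone for MHT, EML, and MBX files, because the reported format number depends on the file content.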


Appendix G Extract and Format Lotus Notes Sub Files

This section describes how to create XML templates to alter the appearance of extracted Lotus mail note sub-files so that they maintain the look and feel of the original notes.

Overview

Customize XML Templates

Template Elements and Attributes

Date and Time Formats

Overview

KeyView uses the NSF reader, nsfsr, to extract Lotus database files, and places Lotus mail notes in sub-files. The NSF reader uses a set of default XML templates to extract the notes and apply formatting, thereby approximating the look and feel of the original notes.

In some cases, you might need to customize the XML templates, for instance if your notes contain custom data. In such cases, you can modify the existing XML templates or create your own.


During extraction, the NSF reader loads all XML files in the NSFtemplates directory and its subdirectories (except for the NSFtemplates\images directory, which is reserved for images). During initialization, the KeyView XML parser verifies the XML templates. If the templates contain any invalid XML, elements, or attributes, initialization fails and errors are recorded in the nsfsr.log file.

Customize XML Templates

XML templates are enabled by default. In most cases, the default templates should be sufficient; however, you can customize them or create your own as required.

To customize XML templates for Lotus note extraction

1. Modify the template files in the following directory.

install\OS\bin\NSFtemplates

The main.xml file must exist in the NSFtemplates directory. It is the top-level template file that extracts all sub-files, usually by calling other templates.

2. Ensure that any modifications or additional XML files conform to the supported elements and attributes described in “Template Elements and Attributes”.

3. Extract the Lotus database file.

Use Demo Templates

For testing purposes, you can extract notes using a set of demo templates, which are provided to demonstrate the proper usage of all the XML elements and attributes, because the default templates do not use all the XML elements.

The demo templates are available at:

install\OS\bin\NSFtemplates\demo

To use the demo XML templates

1. In the formats.ini file, set the following parameter.

[nsfsr]

UseDemoTemplate=1

2. In the main.xml file, uncomment the following section.


<ifini name="UseDemoTemplate" text="1">

<call file="demo.xml"/>

<quit/>

</ifini>

Use Old Templates

For testing purposes, you can extract notes using legacy templates, which produce MHTML output. You can generate similar output by disabling the XML templates, but using the old templates allows you to see the XML code and compare it to the standard and demo templates.

To use the old XML templates

1. In the formats.ini file, set the following parameter.

[nsfsr]

UseOldTemplate=1

2. In the main.xml file, uncomment the following section.

<ifini name="UseOldTemplate" text="1">

<call file="default_old.xml"/>

<quit/>

</ifini>

Disable XML Templates

For testing purposes, you can disable XML templates; KeyView then extracts the notes in MHTML format. This lets you compare the MHTML that the NSF reader generates directly with the MHTML that it generates through the XML templates.

To disable XML templates

• In the formats.ini file, set the following parameter.

[nsfsr]

ExtractByTemplate=0

Template Elements and Attributes

This section lists the valid XML elements and attributes that you can use when creating or modifying templates. Refer to the demo templates for examples.


Conditional Elements

The following table lists the valid conditional elements.

Element: <keyview>
Description: KeyView XML template container (“root”) element.

Element: <if*>
Description: If the condition from the comparison is true, process XML. Conditions can be nested up to 25 levels deep.
Attributes:
• name. (Required) Name of main item to compare to item or text.
• item. (Required if no text) Name of item to compare to item specified by name.
• text. (Required if no item) Text to compare to item specified by name.

Element: <ifex>, <ifnx>
Description: Respectively, if the name item exists and has a text value, or does not. The Notes item might have a value that cannot be converted to text, such as an image.

Element: <ifeq>, <ifne>, <iflt>, <ifle>, <ifgt>, <ifge>
Description: Respectively, if text ==, !=, <, <=, >, >=. Text comparison uses a case-insensitive string compare.

Element: <iftdeq>, <iftdne>, <iftdlt>, <iftdle>, <iftdgt>, <iftdge>
Description: Respectively, if time/date ==, !=, <, <=, >, >=. Time/date comparison converts dates to text in local time using the Notes default, TZFMT_NEVER, because Notes also sometimes converts fields to text internally. For example: text="06/30/2005 02:52:04 PM"

Element: <iftzeq>, <iftzne>
Description: Respectively, if the time zone equals or does not equal the comparison text, for example CDT, EST, and so on.

Element: <ifini>
Description: If the value of the INI option specified in name equals the text value.

Element: <else>
Description: If the condition from the last <if> or <switch> was false, process XML.

Element: <switch>
Description: If name value exists, process XML.
Attributes:
• name. (Required) Name of main item to compare in <case> sub-elements.
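To make the text comparison elements concrete, the following sketch evaluates the six operators with a case-insensitive compare. It is a minimal illustration under assumptions: `compare_ci`, `cmp_op`, and `eval_condition` are hypothetical names, and the character-by-character ordering used here is a guess at what "case-insensitive string compare" means, not the NSF reader's implementation.

```c
#include <ctype.h>

/* Case-insensitive three-way compare: negative, zero, or positive,
   in the same spirit as strcmp. */
static int compare_ci(const char *a, const char *b)
{
    while (*a != '\0' &&
           tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        ++a;
        ++b;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

/* One value per comparison element: <ifeq>, <ifne>, <iflt>, <ifle>,
   <ifgt>, <ifge>. Hypothetical names for illustration only. */
typedef enum { OP_EQ, OP_NE, OP_LT, OP_LE, OP_GT, OP_GE } cmp_op;

/* Return 1 if "item_value <op> text" holds, ignoring case; otherwise 0. */
int eval_condition(cmp_op op, const char *item_value, const char *text)
{
    int c = compare_ci(item_value, text);
    switch (op) {
    case OP_EQ: return c == 0;
    case OP_NE: return c != 0;
    case OP_LT: return c < 0;
    case OP_LE: return c <= 0;
    case OP_GT: return c > 0;
    case OP_GE: return c >= 0;
    }
    return 0;
}
```

Under this reading, an item whose value is Memo satisfies <ifeq> against the text MEMO, because case is ignored.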


Control Elements

The following table lists the valid control elements.

Element: <call>
Description: Call another XML template. You can nest templates up to 10 levels deep.
Attributes:
• file. (Required) Template file name. Must be unique.

Element: <log>
Description: Log message to the NSF log file.
Attributes:
• text. (Required) Text to log.
• type. (Optional) Type of log message. The following values are valid: ERROR, WARN, INFO, DIAG (default), DEBUG, DUMP.

Element: <quit>
Description: Quit processing the template. Exits without error.
Attributes:
• text. (Optional) Text to log.
• type. (Optional) Type of log message. See <log>.

Element: <stop>
Description: Stop processing the template. Exits with an ERROR type of log message.
Attributes:
• text. (Required) Text to log.


Data Elements

The following table lists the valid data elements.

Element: <text>
Description: Output text.
Attributes:
• name. (Required if no parent) Name of the item to output.

Element: <rich>
Description: Output rich text (MHTML). Images are output in the next part or parts of the MHTML, after the first <HTML> part.
Attributes:
• name. (Required if no parent) Name of the item to output.

Element: <body>
Description: Output the message body in rich text (MHTML). As with <rich>, images are output in the next part or parts of the MHTML.

Element: <form>
Description: Output the message form (usually $Body field) in rich text (MHTML).
Attributes:
• name. (Required if no parent) Name of the item to output.

Element: <addr>
Description: Output an address.
Attributes:
• name. (Required if no parent) Name of the item to output.
• type. (Optional) Type of address to output. If you use this attribute, you must set it to CN (Common Name), which is the only supported type.

Element: <name>
Description: Output the name of the last name item, or in other words the current main item. The item must exist.


Element: <format>
Description: Set default format for <date> and <date_kv>. This element does not set the <text> format. See “Date and Time Formats” for a list of all Notes and KeyView date and time formats and integer values.
Attributes:
• format. (Optional. Omit to reset to defaults) Notes and KeyView date/time format. You can set the following formats:
TD=int. Time Date format (TDFMT_*)
TS=int. Time Show format (TSFMT_*)
TT=int. Time Time format (TTFMT_*)
TZ=int. Time Zone format (TZFMT_*)
KV=int. KeyView date and time format.
where int is an integer value that corresponds to the desired format.
Separate multiple formats with commas. For example: format="TD=0,TS=2,TT=1,TZ=1,KV=55"

Element: <date>
Description: Output a Notes date.
Attributes:
• name. (Required if no parent) Name of the item to output.
• format. (Optional) See <format>. You can set the following values: TD, TS, TT, TZ.

Element: <date_kv>
Description: Output a KeyView date.
Attributes:
• name. (Required if no parent) Name of the item to output.
• format. (Optional) See <format>. You can set the following values: TZ, KV.


Element: <time>
Description: Output a time range, for example 1 hour, 30 minutes.
Attributes:
• name. (Required if no parent) Item name of the start date/time.
• item. (Required) Item name of the end date/time.

Element: <zone>
Description: Output a Notes time zone mnemonic, for example MST.
Attributes:
• name. (Required if no parent) Name of date item to output.

Element: <zone_utc>
Description: Output a time zone as UTC, for example (UTC-06:00).

Element: <logo>
Description: Output the mail header logo. The image link is output; the actual image is output to a different part of the MHTML sub-file.

Element: <image>
Description: Output an image. The image link is output; the actual image is output to the MHTML next part, as with <rich> and <body>.

Element: <image_uri>
Description: Output an image URI, in quotes. The actual image is output to a different part of the MHTML sub-file.
Attributes:
• link. (Required if no file) The image link, such as a form or title name. For example: link="StdNotesLtr0"
• file. (Required if no link) Image file name. The file must exist in the ../../templates/images directory. For example: file="boxcheck.gif"
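The value of the format attribute described for <format>, <date>, and <date_kv> is a comma-separated list of KEY=int pairs. The sketch below shows one way such a string could be split into its parts. It is not the NSF reader's parser: `date_format` and `parse_format_attr` are hypothetical names, and rejecting unknown keys is an assumption about how strict a parser would be.

```c
#include <stdio.h>
#include <string.h>

/* Parsed pieces of a format attribute; -1 means "not specified". */
typedef struct {
    int td, ts, tt, tz, kv;
} date_format;

/* Hypothetical helper, NOT a KeyView API.
   Parses a string such as "TD=0,TS=2,TT=1,TZ=1,KV=55".
   Returns 0 on success, -1 on a malformed or unknown entry. */
int parse_format_attr(const char *attr, date_format *out)
{
    char buf[128];
    char *tok;

    out->td = out->ts = out->tt = out->tz = out->kv = -1;
    if (strlen(attr) >= sizeof buf)
        return -1;
    strcpy(buf, attr);               /* strtok modifies its input */

    for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        char key[3];
        int value;

        /* Expect exactly two letters, '=', then an integer. */
        if (sscanf(tok, " %2[A-Za-z]=%d", key, &value) != 2)
            return -1;
        if (strcmp(key, "TD") == 0)      out->td = value;
        else if (strcmp(key, "TS") == 0) out->ts = value;
        else if (strcmp(key, "TT") == 0) out->tt = value;
        else if (strcmp(key, "TZ") == 0) out->tz = value;
        else if (strcmp(key, "KV") == 0) out->kv = value;
        else
            return -1;               /* ASSUMPTION: unknown keys are errors */
    }
    return 0;
}
```

Parsing the example string from the <format> description yields td 0, ts 2, tt 1, tz 1, and kv 55; keys that are omitted stay at -1, which mirrors the idea that an omitted format falls back to the default.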

Date and Time Formats

This section lists the supported Notes and KeyView date/time formats for use with <format>, <date>, and <date_kv>.


Lotus Notes Date and Time Formats

The following table lists the supported Lotus Notes date and time formats, and the integer values that specify each one.

Format            Description
TDFMT_FULL        (Notes default) Year, month, and day.
TDFMT_CPARTIAL    Month and day, year if not this year.
TDFMT_PARTIAL     Month and day.
TDFMT_DPARTIAL    Year and month.
TDFMT_FULL4       Four-digit year, month, and day.
TDFMT_CPARTIAL4   Month and day, four-digit year if not this year.
TDFMT_DPARTIAL4   Four-digit year and month.
TTFMT_FULL        (Notes default) Hour, minute, and second.
TTFMT_PARTIAL     Hour and minute.
TTFMT_HOUR        Hour.
TZFMT_NEVER       (Notes default) All time zones are converted to current time zone.
TZFMT_SOMETIMES   Show only when outside the current time zone.
TZFMT_ALWAYS      Show for all time zones.
TSFMT_DATE        Date.
TSFMT_TIME        Time.
TSFMT_DATETIME    (Notes default) Date and time.
TSFMT_CDATETIME   Date and time, or time Today or time Yesterday.

Integer Value
1
2
6
0
0
1
2
2
0
4
2
3
0
1
4
5
1

KeyView Date and Time Formats

The following table lists KeyView date and time formats. The KeyView formats use the following syntax:

Month         Month = full month name
              Mon = abbreviated month name
              m = month (number)
              mm = two-digit month (leading 0)
Weekday       Weekday = full weekday name
              Wday = abbreviated weekday name
Year          yy = two-digit year
              yyyy = four-digit year
Day           d = day (number)
              dd = two-digit day (leading 0)
Time          h = 12-hour
              H = 24-hour
              m = minutes
              s = seconds
              P = AM/PM
              p = am/pm
Separators    _ = space
              c = comma
              s = slash
              a = dash
              o = dot

Format                        Output                  Integer Value

12-Hour and 24-Hour Time Formats
KVDTF_P                       P                       1
KVDTF_P_hmm                   P h:mm                  2
KVDTF_hmm_P                   h:mm P                  3
KVDTF_P_hhmm                  P hh:mm                 4
KVDTF_hhmm_P                  hh:mm P                 5
KVDTF_P_hhmmss                P hh:mm:ss              6
KVDTF_hhmmss_P                hh:mm:ss P              7
KVDTF_Hmm                     H:mm                    8
KVDTF_HHmm                    HH:mm                   9
KVDTF_mmss                    mm:ss                   10
KVDTF_Hmmss                   H:mm:ss                 11
KVDTF_HHmmss                  HH:mm:ss                12

Numerical Date Formats with Slashes
KVDTF_mmsdd                   mm/dd                   13
KVDTF_msdsyy                  m/d/yy                  14
KVDTF_mmsddsyy                mm/dd/yy                15
KVDTF_mmsddsyyyy              mm/dd/yyyy              16
KVDTF_ddsmm                   dd/mm                   17
KVDTF_ddsmmsyy                dd/mm/yy                18
KVDTF_ddsmmsyy_Hmm            dd/mm/yy H:mm           19
KVDTF_ddsmm_P_hmm             dd/mm P h:mm            20
KVDTF_ddsmm_hmm_P             dd/mm h:mm P            21
KVDTF_ddsmm_P_hhmm            dd/mm P hh:mm           22
KVDTF_ddsmm_hhmm_P            dd/mm hh:mm P           23
KVDTF_ddsmmsyy_P_hmm          dd/mm/yy P h:mm         24
KVDTF_ddsmmsyy_hmm_P          dd/mm/yy h:mm P         25
KVDTF_ddsmmsyy_P_hmmss        dd/mm/yy P h:mm:ss      26
KVDTF_ddsmmsyy_hmmss_P        dd/mm/yy h:mm:ss P      27
KVDTF_ddsmmsyy_P_hhmmss       dd/mm/yy P hh:mm:ss     28
KVDTF_ddsmmsyy_hhmmss_P       dd/mm/yy hh:mm:ss P     29
KVDTF_yysmmsdd_P_hhmmss       yy/mm/dd P hh:mm:ss     30
KVDTF_yysmmsdd_hhmmss_P       yy/mm/dd hh:mm:ss P     31
KVDTF_msdsyy_Hmm              m/d/yy H:mm             32
KVDTF_mmsddsyy_Hmm            mm/dd/yy H:mm           33
KVDTF_msdsyy_P_hmm            m/d/yy P h:mm           34
KVDTF_msdsyy_hmm_P            m/d/yy h:mm P           35
KVDTF_mmsddsyy_hmm_P          mm/dd/yy h:mm P         36
KVDTF_mmsdd_P_hhmm            mm/dd P hh:mm           37
KVDTF_mmsdd_hhmm_P            mm/dd hh:mm P           38
KVDTF_mmsddsyy_P_hhmmss       mm/dd/yy P hh:mm:ss     39
KVDTF_mmsddsyy_hhmmss_P       mm/dd/yy hh:mm:ss P     40
KVDTF_msd                     m/d                     41
KVDTF_yysm                    yy/m                    42
KVDTF_yysmm                   yy/mm                   43
KVDTF_yysmsd                  yy/m/d                  44
KVDTF_yysmmsdd                yy/mm/dd                45
KVDTF_yyyysmmsdd              yyyy/mm/dd              46

Numerical Date Formats with Dashes
KVDTF_ddammayy                dd-mm-yy                47
KVDTF_mmadd                   mm-dd                   48
KVDTF_mmayy                   mm-yy                   49
KVDTF_yyammadd                yy-mm-dd                50
KVDTF_yyyyammadd              yyyy-mm-dd              51
KVDTF_yyyyammaddaHHmmss       yyyy-mm-dd-HH:mm:ss     52

Numerical Date Formats with Dots
KVDTF_yyomod                  yy.m.d                  53
KVDTF_yyommodd                yy.mm.dd                54
KVDTF_mod                     m.d                     55
KVDTF_mmodd                   mm.dd                   56

Numerical/String Date Formats with Dashes, Commas, and Spaces
KVDTF_ddaMon                  dd-Mon                  57
KVDTF_daMonayy                d-Mon-yy                58
KVDTF_ddaMonayy               dd-Mon-yy               59
KVDTF_ddaMonayyyy             dd-Mon-yyyy             60
KVDTF_Mon                     Mon                     61
KVDTF_Monayy                  Mon-yy                  62
KVDTF_Monayyyy                Mon-yyyy                63
KVDTF_Monaddayy               Mon-dd-yy               64
KVDTF_yyammadd_P_hhmmss       yy-mm-dd P hh:mm:ss     65
KVDTF_mmadd_P_hhmm            mm-dd P hh:mm           66
KVDTF_Mon_yy                  Mon yy                  67
KVDTF_Monc_yy                 Mon, yy                 68
KVDTF_Month                   Month                   69
KVDTF_Monthayy                Month-yy                70
KVDTF_Month_yy                Month yy                71
KVDTF_Monthc_yy               Month, yy               72
KVDTF_Monthayyyy              Month-yyyy              73
KVDTF_Month_yyyy              Month yyyy              74
KVDTF_Monthc_yyyy             Month, yyyy             75
KVDTF_Mon_dc_yyyy             Mon d, yyyy             76
KVDTF_d_Monc_yyyy             d Mon, yyyy             77
KVDTF_yyyy_Mon_d              yyyy Mon d              78
KVDTF_Month_dc_yyyy           Month d, yyyy           79
KVDTF_d_Monthc_yyyy           d Month, yyyy           80
KVDTF_yyyy_Month_d            yyyy Month d            81

Weekday Date Formats
KVDTF_Wday                    Wday                    82
KVDTF_Weekday                 Weekday                 83
KVDTF_Wdayc_Mon_dc_yyyy       Wday, Mon d, yyyy       84
KVDTF_Weekdayc_Month_dc_yyyy  Weekday, Month d, yyyy  85
KVDTF_Weekdayc_d_Monthc_yyyy  Weekday, d Month, yyyy  86
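Several of the KeyView output patterns listed in this section have close counterparts in the standard C strftime() function, which can be handy for checking what a given pattern should look like. The sketch below pairs a few patterns with strftime() formats. The table kv_patterns and the function kv_render are hypothetical names, and the pairings are assumptions made for illustration (for example, that hh means a zero-padded 12-hour value); none of this is part of the KeyView API. Only zero-padded patterns are shown because strftime() has no portable conversion for an unpadded day or month.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical pairing of a KeyView output pattern with a standard C
 * strftime() format. These pairings are assumptions for illustration;
 * they are not taken from the SDK. */
struct kv_pattern_map {
    const char *kv_output;     /* pattern as written in the Output column */
    const char *strftime_fmt;  /* assumed strftime() equivalent */
};

static const struct kv_pattern_map kv_patterns[] = {
    { "yyyy-mm-dd", "%Y-%m-%d" },  /* four-digit year, dashes */
    { "HH:mm:ss",   "%H:%M:%S" },  /* 24-hour time */
    { "dd/mm/yy",   "%d/%m/%y" },  /* two-digit year, slashes */
    { "hh:mm P",    "%I:%M %p" },  /* 12-hour time with AM/PM */
    { "Mon-yyyy",   "%b-%Y"    },  /* abbreviated month name */
};

/* Formats *t according to the named pattern. Returns the number of
 * characters written, or 0 if the pattern is unknown or buf is too small. */
static size_t kv_render(const char *kv_output, const struct tm *t,
                        char *buf, size_t n)
{
    size_t i;
    for (i = 0; i < sizeof kv_patterns / sizeof kv_patterns[0]; ++i) {
        if (strcmp(kv_patterns[i].kv_output, kv_output) == 0)
            return strftime(buf, n, kv_patterns[i].strftime_fmt, t);
    }
    return 0;
}
```

The month name and AM/PM text come from the current C locale, so the results match the English forms in the table only in the default "C" locale or an English one.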

APPENDIX H

Password Protected Files

This section lists supported password-protected container and non-container files and describes how to open them.

Supported Password Protected File Types

Open Password Protected Container Files

Export Password Protected Files

Supported Password Protected File Types

The following table lists the password-protected file types that KeyView supports. The table uses the following symbols:

Symbol   Description
Y        Format is supported.
N        Format is not supported.
S        Support for viewing sub-files.
V        Support for viewing content.
P        Password required.
C        Password and certificate or User ID file required.

File Type                 Version              Filter  Export  Extract  View  Credentials
PST (Windows)             n/a                  N       N       Y        S     P
PST (non-Windows) (1)     n/a                  N       N       Y        S     P
ZIP                       n/a                  N       N       Y        S     P
7-Zip                     n/a                  N       N       Y        S     N
RAR                       n/a                  N       N       Y        S     P
SMIME in MSG, EML, MBX    n/a                  N       N       Y        N     C
Lotus Notes NSF           n/a                  N       N       Y        N     C
Adobe PDF                 n/a                  Y       Y       Y        V     P
Microsoft Office          97-2003, 2007, 2010  Y       Y       Y        V     P

1. The native PST reader, pstnsr, does not require credentials to open password-protected PST files that use Compressible Encryption.

Open Password Protected Container Files

This section describes how to extract password-protected container files using the C API. The following guidelines apply to specific file types.

• Lotus Notes NSF files. If you are running a Notes client with an active user connected to a Domino server, you must specify the user’s password as a credential regardless of whether the NSF files you are opening are protected. This allows KeyView to access the Notes client and the Lotus Notes API. If the Notes client is not running with an active user, KeyView does not require credentials to access the client.

• PST files. To open password-protected PST files that use High Encryption (Microsoft Outlook 2003 only), you must use the MAPI-based PST reader (pstsr). The native PST reader (pstnsr) returns the error message KVERR_PasswordProtected if a PST is encrypted with High Encryption.

To open container files

1. Define the credential information in the KVOpenFileArg data structure. See “KVOpenFileArg”.

2. Pass KVOpenFileArg to the fpOpenFile() function. See “fpOpenFile()”.

3. Call fpCloseFile(). See “fpCloseFile()”.
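The three steps above follow a common define, open, close pattern. The sketch below shows only that call order, using stand-in types. DemoOpenFileArg, DemoFile, DemoExtractInterface, the stub functions, and the field names are all hypothetical and were invented for this illustration; the real KVOpenFileArg structure and the fpOpenFile() and fpCloseFile() signatures are defined in the SDK headers and described in their own reference sections.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in types for illustration only. They do NOT mirror the SDK's
 * KVOpenFileArg, fpOpenFile() or fpCloseFile() declarations. */
typedef struct {
    const char *file_path;  /* hypothetical: container file to open */
    const char *password;   /* hypothetical: credential for the container */
} DemoOpenFileArg;

typedef struct { int is_open; } DemoFile;

typedef struct {
    int (*open_file)(const DemoOpenFileArg *arg, DemoFile *out);
    int (*close_file)(DemoFile *file);
} DemoExtractInterface;

/* Stub "reader" so the sketch can run: accepts only the password "secret". */
static int stub_open(const DemoOpenFileArg *arg, DemoFile *out)
{
    if (arg->password == NULL || strcmp(arg->password, "secret") != 0)
        return 1;               /* stands in for a password error */
    out->is_open = 1;
    return 0;
}

static int stub_close(DemoFile *file)
{
    file->is_open = 0;
    return 0;
}

/* Runs the three steps in order. Returns 0 on success, or the nonzero
 * result of the open step if the file could not be opened. */
static int demo_open_protected(const DemoExtractInterface *api,
                               const char *path, const char *password)
{
    DemoOpenFileArg arg;
    DemoFile file = { 0 };
    int rc;

    /* Step 1: define the credential information in the argument structure. */
    memset(&arg, 0, sizeof arg);
    arg.file_path = path;
    arg.password = password;

    /* Step 2: pass the structure to the open function. */
    rc = api->open_file(&arg, &file);
    if (rc != 0)
        return rc;              /* nothing was opened, so nothing to close */

    /* Sub-file extraction would happen here. */

    /* Step 3: close the file. */
    return api->close_file(&file);
}
```

The point of the design is that the close call is made only after a successful open, and that the credential travels inside the argument structure rather than as a separate call.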

Export Password Protected Files

This section describes how to export password-protected non-container files with the C API.

To export password-protected files

1. Call the fpInit() function. See “fpInit()”.

2. Call the KVXMLConfig() function with the following arguments (see “KVXMLConfig()”):

Argument   Parameter
nType      KVCFG_SETPASSWORD
nValue     TRUE
pData      The source file password. The password is a null-terminated string with a maximum length of 255 characters (the final byte is null).

For example:

(*fpXMLConfig)(pKVXML, KVCFG_SETPASSWORD, TRUE, password);

where password is a null-terminated string of 255 or fewer characters.

3. Call the fpConvertStream() or KVXMLConvertFile() function. See “fpConvertStream()” or “KVXMLConvertFile()” (page 214).
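Step 2 limits the password to a null-terminated string of 255 or fewer characters. A caller may want to check that limit before handing the buffer to the configuration call. The following sketch does so without reading past the first 256 bytes. The name password_fits and the macro KV_DEMO_MAX_PASSWORD_LEN are hypothetical and not part of the SDK, and the sketch assumes the 255-character limit excludes the terminating null.

```c
#include <stddef.h>

/* Assumed limit from step 2: at most 255 characters plus a terminating null. */
#define KV_DEMO_MAX_PASSWORD_LEN 255

/* Hypothetical helper, not part of the KeyView API.
 * Returns 1 if pw is non-NULL and its terminating null occurs within the
 * first 256 bytes (that is, the string has 255 or fewer characters).
 * Returns 0 otherwise. Never reads beyond index 255. */
static int password_fits(const char *pw)
{
    size_t i;

    if (pw == NULL)
        return 0;

    for (i = 0; i <= KV_DEMO_MAX_PASSWORD_LEN; ++i) {
        if (pw[i] == '\0')
            return 1;
    }
    return 0;
}
```

A caller would test password_fits(password) first and skip the KVCFG_SETPASSWORD call, or report an error, when it returns 0.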

Index

Symbols

$ANCHOR

329

$BASE

329

$CHARSET

329

$CONTENT

330

$ENDNOTE

330

$FOOTER

330

$FOOTNOTE

330

$FOOTNOTEALL

330

$HEADER

330

$MAINURL

330

$NAME

330

$NEXT

330

$PREV

330

$SPLITBLOCKNUMBER

331

$STYLESHEET

330

$SUMMARY

331

$SUMMARYNN

331

$TOC

331

$TOCB

331

$TOCBE

331

$TOCE

331

$TOCPE

331

$TOCTE

331

$TOPANCHOR

331

$USERCB

332

$USERSUMMARY

332

$XANCHOR

332

7-Zip

295

7-Zip reader

327

A

absolute text positioning

PDF

58 ,

205 –

212

Abstract Windowing Toolkit

109

access layer

41

AD1

295

AD1 Evidence file reader

322

ad1sr

322

ADDOCINFO

169 ,

177

,

234 ,

239

adInfo

239

adinfo.h

31

,

34 ,

233 ,

234

,

239

,

270

Adobe Maker Interchange Format (MIF)

310

reader

325

Adobe PDF

299

advanced document readers enabling in an existing installation

33

license information

32

Lotus Notes database (NSF)

86

Mailbox (MBX)

81

Microsoft Outlook Personal Folders

82

Advanced Systems Format (ASF)

304

afsr

322

allocating memory

42 ,

236

Ami Pro Graphics reader

325

anchor

36

token

329 ,

330

,

331

Animated cursor reader

323

ANSI (TXT)

309

Apple iChat Log

310

Apple iChat Log reader

323

Apple iWork

Keynote (GZ)

305

Numbers (GZ)

307

Pages (GZ)

310

Apple iWork Keynote reader

324

Apple iWork Numbers reader

323

Apple iWork Pages reader

323

Applix

Presents (AG)

305

Presents reader

323

Spreadsheets (AS)

307

Spreadsheets reader

322

Words (AW)

310

Words reader

322

architecture

40

archive formats

295

ASCII (TXT)

309

reader

322

assr

322

attachment external path

149 ,

154 ,

177 ,

179

Audio Interchange File Format (AIFF)

304

AutoCAD Drawing

Exchange Format (DXF)

297

format (DWG)

297

AutoCAD Drawing Exchange format reader

323

AutoCAD Drawing format reader

323

AutoCAD reader

324

automatic heading generation

248 ,

267 ,

280

awsr

322

AWT

See Abstract Windowing Toolkit

B

bAllowHeadingsInTables

268

base URL token

329

bEnableEmptyRows

260

bentofio

321

bForceOutputCharSet

100 ,

104 ,

256

bForceSrcCharSet

100 ,

256

bGenerateURLs

258

bHardPageMakesNewBlock

265

bi-directional text

333

in PDF file

112

right-to-left (RTL) tag

341

Big Endian

163 ,

167

,

169 ,

179

binary files supported

296

bIndexOnly

255

BinHex

295

reader

325

bKeepServantAlive

217

bkfsr

322

block chunks

36

blocks

36

bMustBeBold

249

bMustBeItalic

249

bMustBeUnderlined

249

bNbspEmptyCells

256

bNoMultiSpaces

250

bNonZeroIndent

249

bNoTabs

249

bookmarks

28 ,

205 ,

207

converting to XLinks

205 ,

207

bPutBlocksInSeparateFiles

265

bRasterizeFiles

258

bRemoveEmptyColumns

260

bRemoveEmptyRows

260

bSupportCellSpan

260

bSupportColumnHeadings

259

bSupportColumnWidth

260

bSupportRowHeadings

259

bSupportRowSpan

260

bUseDocumentColors

256

bUseDocumentFontInfo

256

bUseExistingStyleSheet

255

bUseVerityDTD

254

Bzip2

295

bzip2sr

322

C

C API configure XML element extraction

122

enable logical order for PDF files

114

enabling logical order for PDF files

208

extensible stylesheet language

108

extracting sub file metadata

73

, 75

extracting sub files

61 ,

70

map styles

105

opening a file

61

,

70

running in out-of-process mode

47

style sheets

108

cabsr

322

cache configuring

42

CAD. See Computer-aided design

callback functions

36 ,

225

Cascading Style Sheets

54

, 108 ,

140

CATIA

297

cbAnchorMax

229

cbHTML

229

cbmap

321

cbString

238

cebsr

322

character encoding supported

333

character entities

63

character set determining output

99

force output

256

force source

256

license information

32

mapping

99

setting during file extraction

104

setting source

99

, 103 ,

239

setting target

100

supported

341

token

329

character styles

104

charset

239

chartbls.ux

321

childArray

153 ,

180

chmdll

321

chmsr

322

chunks

36

CloseFile()

62

,

147

, 157

closing a file

62 ,

147 ,

157

cnv2xml sample program

35 ,

136

cnv2xmloop sample program

35 ,

137

Comma-Separated Values (CSV)

307

reader

322

compound documents

68

Computer Graphics Metafile (CGM)

299

writer

323

Computer Graphics Metafile reader

323

Computer-aided design

286 ,

297

configuration options setting

205

ConnectRetry

46

ConnectRetryInterval

46

container files

51 ,

68

archive

295

default filenames

91

determining number of sub file

151 ,

169

,

170

email

302

example tree structure

69

recreating file hierarchy

70 –

72 ,

153

,

174

sub file infoflag

177

supported

302

Continue()

227

conversion options

52 ,

253 ,

262 ,

267

setting using template files

53

setting using the API

53

converting spreadsheets

117 –

119

XML files

120

126

,

140

ConvertStream()

61

,

186

,

195

, 252

Corel

CorelDraw (CDR)

299

Draw reader

323

Presentations (SHW)

305

Presentations reader

325

Quattro Pro (QPW, WB3)

307

cRedact

259

credentials defining for protected files

160 ,

173

cReplaceChar

259

CSS template

54

csvsr

322

cxVectorToRasterXRes

258 ,

261

cyVectorToRasterYRes

258 ,

261

D

Data Interchange Format (DIF)

307

Data Interchange Format reader

322

dBase

298

dBase Database reader

322

dbfsr

322

dbxsr

322

DCA/RFT reader

322

dcasr

322

DCX (fax) reader

323

DCX Fax System

299

definition of terms

36

deleted text

110

detectPSTbyExtension

85

difsr

322

Digital Imaging and Communications in Medicine

299

directory structure

34

DiskCacheSize

42

DisplayWrite (IP)

310

reader

322

dmgsr

322

document readers

41

,

352

document type

99

,

239

Documentum EMCMF

302

Domino XML Language

302

Domino XML Language reader

322

DTD

63

character entities

63

modifying

64

root element

63

dw4sr

322

dwFlags

106

,

242

dxlsr

322

E

eClass

234

,

239

eEmptyParaType

259

eFormat

234

,

239

eHardPageBreakType

259

eKVFormat

123 ,

245

email files supported

302

embedded OLE objects

51

,

68

converting using Conversion APIs

90

converting using File Extraction APIs

90

linked

148 ,

153 ,

177

naming convention

92

reader

326

writer

320

emlsr

323

emxsr

323

Encapsulated PostScript (EPS)

299 ,

335

reader

324

encase2sr

323

encasesr

323

EndCharStyle

241

endnote token

330

ENdocAttributes

234

ENDocClass

234

ENdocFmt

234

,

239 ,

245

Enhanced Windows Metafile (EMF)

299

reader

324

ENSATableBorder

271

entsr

323

eOutputLanguageID

256

eOutputRasterGraphicType

258

eOutputVectorGraphicType

109

, 110 ,

258

epubsr

323

error codes

272

extended

245

,

274

Outlook PST

276

eSATableBorder

257

eSrcCharSet

103 ,

256

Executable (EXE)

296

Expat XML parser

322

Expert Witness Compression Format (Encase)

295

Expert Witness Compression Format (EnCase) v6 reader

323

Expert Witness Compression Format (EnCase) v7 reader

323

Export Demo sample program

35 ,

56 –

59 ,

142

extended error codes

245 ,

274

Extensible Forms Description Language

305

Extensible Forms Description Language reader

325

ExtractSubFile()

61 ,

70 ,

104 ,

148 ,

176

F

file cache

42

configuring

42

file extraction extract sub file

148

extraction flags

164

extraction path

165 ,

173 ,

176

get main file information

151 ,

169

get sub file information

153 ,

176 ,

178

get sub file metadata from mail formats

155

,

167 ,

181

input parameters

163

Lotus Domino XML (DXL)

85

Lotus Notes database (NSF)

86

Mailbox (MBX)

81

Microsoft Outlook

80

Microsoft Outlook Express (EML)

80

Microsoft Outlook Personal Folders

81

output to file

165

output to stream

165

PDF file

90

sub file properties

177

ZIP file

91

File Extraction interface

52

entry point

60 ,

69 ,

146

file hierarchy

70 –

72

childArray

153 ,

180

parentIndex

153 ,

180

file time

EPOCH

286

filenames default for sub files

91

FileToInputStreamCreate()

48 ,

60

,

61 ,

189 ,

190

FileToInputStreamFree()

189

, 190 ,

191 ,

192

FileToOutputStreamCreate()

60 ,

191 ,

252

FileToOutputStreamFree()

192

, 252

flags extraction flags

164

KVCFG_DELSOFTHYPHEN

116 ,

208

KVCFG_DISABLEZONE

207

KVCFG_ENABLEPOSITIONINFO

207

KVCFG_INCLREVISIONMARK

209

, 211

KVCFG_INCLTRACKCHANGES

111

KVCFG_LOGICALPDF

114 ,

208

KVCFG_PG_HIDECOMMENT

127 ,

129 ,

209

KVCFG_PG_HIDEHIDDENSLIDE

126

, 129 ,

209

KVCFG_PG_SHOWCOMMENTSSLIDE

127

,

129

,

210

KVCFG_PG_SHOWSLIDENOTES

127

, 129 ,

210

KVCFG_SETPASSWORD

210 ,

211 ,

411

KVCFG_SETTEMPDIRECTORY

208

KVCFG_SETXMLCONFIGINFO

122

, 208

KVCFG_SS_SHOWCOMMENTS

126

, 129 ,

209

KVCFG_SS_SHOWFORMULA

126 ,

129 ,

209

KVCFG_SS_SHOWHIDDENINFOR

126

, 129 ,

209

KVCFG_SUPPRESSIMAGES

255

KVCFG_SUPPRESSTOCPRINTIMAGE

207

KVCFG_WP_NOCOMMENTS

126 ,

129 ,

209

KVCFG_WP_SHOWDATEFIELDCODE

126

,

129

,

209

KVCFG_WP_SHOWFILENAMEFIELDCODE

126

,

129 ,

209

KVCFG_WP_SHOWHIDDENTEXT

126

, 129 ,

209

KVExtractionFlag_CreateDir

164

,

166

KVExtractionFlag_ExcludeMailHeader

80 ,

164

KVExtractionFlag_GetFormattedBody

164

KVExtractionFlag_Overwrite

164

,

166

KVExtractionFlag_SaveAsMSG

81 ,

164

KVMainFileInfoFlag_HasContent

169

KVOpenFileFlag_CreateRootNode

174

KVSubFileExtractInfoFlag_CharsetConverted

177

KVSubFileExtractInfoFlag_External

177

KVSubFileExtractInfoFlag_FileCreated

177

KVSubFileExtractInfoFlag_FolderCreated

177

KVSubFileExtractInfoFlag_NeedsExtraction

177

KVSubFileExtractInfoFlag_NonFormattedBodyExtracted

177

KVSubFileInfoFlag_External

179

KVSubFileInfoFlag_MailItem

179

KVSubFileInfoFlag_NeedsExtraction

179

KVSubFileInfoFlag_Secure

179

KVSubFileInfoFlag_SMIME

179

KVSubFileMetaInfoFlag_CharsetConverted

181

main file properties

169

metadata

181

open file

174

sub file properties

179

Flash reader

326

Folio Flat File (FFF)

310

reader

323

foliosr

323

fontSizeMax

249

fontSizeMin

249

footer token

330

footnote token

330

format detection

41 ,

99

, 347 –

352

ADDOCINFO

234

coding practice

234

,

239

determining format support

348

extracting format information

348

file class

350

KVStreamInfo

239

major format

350 ,

352

major version

350

minor format

350

minor version

350

module

28 ,

294

,

313

,

347

translating format information

350

formats

293

312

binary

296

container

302

container (email)

302

graphic

299

multimedia

304

presentations

305

word processing

310

formats_e

34

formats_e.ini

33

,

34 ,

199 ,

320 ,

349

configuring file cache

42

converting hidden text in spreadsheets

117

converting MSG files directly using the MSG reader

52

determining document reader

352

enable logical order for PDF files

114

out-of-process configuration

44

formats.ini

350

formulas extracting from Excel files

118

supported Excel formula functions

119

Founder Chinese E-paper Basic (CEB)

310

Founder Chinese E-paper Basic reader

322

fpCloseFile()

62

,

147

,

157

fpContinue()

273

fpConvertStream()

61 ,

186

,

195

,

252

callbacks

226

fpExtractSubFile()

61 ,

70 ,

104 ,

148 ,

176

fpFileToInputStreamCreate()

48 ,

60

,

61

,

189 ,

190 ,

252

fpFileToInputStreamFree()

189

, 190 ,

252

fpFileToOutputStreamCreate()

60 ,

191 ,

192

,

252

fpFileToOutputStreamFree()

191

,

192

, 252

fpFreeStruct()

148 ,

150 ,

151

,

153 ,

155

fpGetAnchor()

193 ,

252

fpGetAuxOutput()

247

fpGetConvertFileList()

195

, 252

fpGetMainFileInfo()

61 ,

70 ,

151 ,

169

fpGetStreamInfo()

99 ,

196

,

234

,

239

, 252

fpGetSubFileInfo()

61 ,

70 ,

153 ,

178

fpGetSubFileMetadata()

155

, 167

fpGetSummaryInfo()

96 ,

197

,

244

,

252

fpInit()

60

,

199

, 252

fpOpenFile()

61

,

157

,

173

fpSetStyleMapping()

105 ,

201 ,

241 ,

252

fpShutDown()

195 ,

202

,

203 ,

227 ,

252

fpValidateTemplate()

204

fpXMLConfig()

111 ,

114

free File Extraction structures

150

FreeStruct()

148 ,

150

,

151 ,

153 ,

155

Fujitsu Oasys (OA2)

310

reader

326

function suites

59

G

generating minimal attributes

54

generating output with minimal markup and without images

187

,

215

,

255

generating output with verbose markup and without images

57 ,

58 ,

187

,

205 –

207

,

255

GetAnchor()

193 ,

228

,

230 ,

252

GetAuxOutput()

230 ,

247

GetConvertFileList()

195 ,

252

GetMainFileInfo()

61 ,

70 ,

151 ,

169

GetStreamInfo()

99

,

196

,

234

, 239 ,

252

GetSubFileInfo()

61 ,

70 ,

153 ,

178

GetSubFileMetadata()

155 ,

167

GetSummaryInfo()

96 ,

197

,

244

, 252

glossary

36

Graphic Interchange Format (GIF)

300

,

335

reader

324

graphics displaying vector graphics on Windows

109

setting resolution

261

supported

299

suppressing

57 ,

58

,

205

, 213 ,

255

GroupWise FileSurf

302

GroupWise FileSurf reader

323

gwfssr

323

GZIP

295

reader

325

H

Hangul (HWP)

310

Hangul 2002, 2005, 2007 reader

323

header files

270

header token

330

heading generation

248 ,

267 ,

280

headingCreateType

268

Health level7

310

Health level7 reader

323

hidden data

126

,

129

Excel comments

126 ,

129 ,

209

formulas

126

,

129 ,

209

hidden information

126

,

129 ,

209

PowerPoint comments

127 ,

129 ,

209

comments slides

127 ,

129

,

210

hidden slides

126 ,

129

,

209

slide notes

127 ,

129 ,

210

toggle output

127 ,

130

Word comments

126 ,

129 ,

209

date field codes

126 ,

129 ,

209

file name field codes

126

,

129

, 209

hidden text

126

, 129 ,

209

hidden text converting in spreadsheets

117

hl7sr

323

HTML

309

reader

323

HTML (MIME)

309

htmlexport

320

htmsr

323

hwposr

323

hyphenation

115 ,

206

,

208

I

I/O model

60

IBM DCA/RFT (Revisable Form Text) (DC)

311

ichatsr

323

icssr

323

index mode

187 ,

208 ,

215

and hyphenation

115

index template

54

initialization function

60

,

199 ,

252

input streams

48 ,

61

creating

189

extracting metadata

197

freeing

49

,

62 ,

190

KVInputStream

235

installation directory structure

34

error messages

200

ISO

295

ISO-9660 CD Disc Image Format reader

323

isosr

323

iwsssr

323

iwwpsr

323

K

kp3dwrld

321

kpagrdr

323

kpanirdr

323

kpbmprdr

323

kpbmpwrt

323

kpcdrrdr

323

kpcgmrdr

323

kpcgmwrt

323

kpchtrdr

321

kpdcxrdr

323

kpDWGrdr

323

kpDXFrdr

323

kpemfrdr

324

kpepsrdr

324

kpgifrdr

324

kpicordr

324

kpifcnvt

320

J

Java API extensible stylesheet language

108

using style sheets

109

Java archive

295

javadoc

34

JBIG2

300 ,

335

JBIG2 reader

324

jp2000sr

323

JPEG

300 ,

335

reader

324

writer

324

JPEG 2000

300

,

335

JPEG 2000 metadata reader

323

JPEG 2000 reader

324

jtdsr

323

JustSystems Ichitaro (JTD)

311

reader

323

kpifutil

320

kpIWPGrdr

324

kpJAVwrt

321

kpjbig2rdr

324

kpjp2000rdr

324

kpjpeg

321

kpjpgrdr

324

kpjpgwrt

324

kpmacrdr

324

kpmsordr

324

kpnbmprdr

324

kpODArdr

324

kpodfrdr

324

kpONErdr

324

kpp40rdr

324

kpp95rdr

324

kpp97rdr

324

kppctrdr

324

kppcxrdr

324

kppdf2rdr

324

kppdfrdr

324

kppicrdr

324

kppng

321

kppngrdr

324

kppngwrt

324

kpppxrdr

324

kpprerdr

324

kpprzrdr

324

kpsdwrdr

325

kpsgirdr

325

kpSHWrdr

325

kpsunrdr

325

kptgardr

325

kptifrdr

325

kpvsdrdr

325

kpwg2rdr

325

kpwmfrdr

325

kpwmfwrt

325

kpwpgrdr

325

kpxfdlrdr

325

KV_Bool

286

KV_ClipBoard

286

KV_DateTime

243 ,

286

KV_IEEE8

243

,

286

KV_Int4

286

KV_Other

286

KV_String

243 ,

286

KV_Unicode

243 ,

286

kv.lic

32

, 34 ,

321

updating in existing installation

33

KVCFG_DELSOFTHYPHEN

116 ,

137 ,

208

KVCFG_DISABLEZONE

207

KVCFG_ENABLEPOSITIONINFO

137

, 138 ,

207

KVCFG_INCLREVISIONMARK flag

209 ,

211

KVCFG_INCLTRACKCHANGES flag

111

KVCFG_LOGICALPDF

208

KVCFG_LOGICALPDF flag

114

KVCFG_PG_HIDECOMMENT flag

127 ,

129 ,

209

KVCFG_PG_HIDEHIDDENSLIDE flag

126 ,

129 ,

209

KVCFG_PG_SHOWCOMMENTSSLIDE flag

127 ,

129 ,

210

KVCFG_PG_SHOWSLIDENOTES flag

127 ,

129 ,

210

KVCFG_SETPASSWORD flag

210 ,

211 ,

411

KVCFG_SETTEMPDIRECTORY

208

KVCFG_SETXMLCONFIGINFO

122

, 208

KVCFG_SS_SHOWCOMMENTS flag

126 ,

129 ,

209

KVCFG_SS_SHOWFORMULA flag

126 ,

129 ,

209

KVCFG_SS_SHOWHIDDENINFOR flag

126 ,

129 ,

209

KVCFG_SUPPRESSIMAGES

137 ,

138 ,

255

KVCFG_SUPPRESSTOCPRINTIMAGE

207

KVCFG_WP_NOCOMMENTS flag

126 ,

129 ,

209

KVCFG_WP_SHOWDATEFIELDCODE flag

126 ,

129 ,

209

KVCFG_WP_SHOWFILENAMEFIELDCODE flag

126 ,

129 ,

209

KVCFG_WP_SHOWHIDDENTEXT flag

126 ,

129 ,

209

KVCharSet

99

,

104

, 163 ,

167

KVCredential

160 ,

173

KVCredentialComponent

161

KVEPT_EMPTY

282

KVEPT_SUPPRESS

282

KVEPT_VERBOSE

282

KVERR_ADSNotFound

273

KVERR_ArchiveFatalError

274

KVERR_ArchiveFileNotFound

274

KVERR_AutoDetFail

273

KVERR_AutoDetNoFormat

273

KVERR_badInputStream

273

KVERR_badOutputType

273

KVERR_ChildTimeOut

274

KVERR_CreateOutputFileFailed

273

KVERR_CreateProcessFailed

273

KVERR_CreateTempFileFailed

273

KVERR_DLLNotFound

273

KVERR_ErrorWritingToOutputFile

273

KVERR_FormatNotSupported

273

KVERR_General

273

KVERR_NoReader

273

KVERR_OutOfCore

273

KVERR_PasswordProtected

273

, 410

KVERR_processCancelled

273

KVERR_ReaderInitError

273

KVERR_SUCCESS

273

KVERR_WaitForChildFailed

273

KVError_CompressionNotSupported

277

KVError_GPF

275

KVError_InputFileNotFound

275

KVError_InterfaceFunctionNotFound

275

KVError_InvalidArgs

276

KVError_InvalidOopDriverSignature

277

KVError_InvalidOopServiceSignature

277

KVError_IPCTimeOut

276

KVError_KVoopLogFailed

275

KVError_MemoryLeak

275

KVError_MemoryOverwrite

275

KVError_OopBadConfig

276

KVError_OopBrokenPipe

276

KVError_OopCore

275

KVError_OopPipeOEF

276

KVError_OpenOutputFileFailed

275

KVError_OpenStreamFailure

275

KVError_OutputFileExists

164

KVError_OverNestedFileLimit

275

KVError_PasswordRequired

276

KVError_PSTAccessFailed

276

KVError_ReaderUsageDenied

276

KVError_ZeroFile

277

KVErrorCode

272

KVErrorCodeEx

274

KVExtractInterface

146 ,

162

KVExtractionFlag_CreateDir

164

,

166

KVExtractionFlag_ExcludeMailHeader

80 ,

164

KVExtractionFlag_GetFormattedBody

164

KVExtractionFlag_Overwrite

164

,

166

KVExtractionFlag_SaveAsMSG

81 ,

164

KVExtractSubFileArg

148 ,

163

KVFileType_Main

163

KVGetExtractInterface()

60 ,

69

,

146

KVGetSubFileMetaArg

167

KVGFX_CGM

280

KVGFX_GIF

280

KVGFX_JAVA

280

KVGFX_JPEG

280

KVGFX_PNG

280

KVGFX_WMF

280

kvgraph

321

kvgzsr

325

KVHC_CreateHeadingsAlways

281

KVHC_DocHeadingsOnly

281

KVHeadingCreateOptions

280

KVHPBT_EMPTY

283

KVHPBT_EMPTYID

283

KVHPBT_ID

283

KVHPBT_SUPPRESS

282

kvhqxsr

325

KVInputStream

60

,

173

,

235

KVMainFileInfo

151 ,

169

KVMainFileInfoFlag_HasContent

151

,

169

KVMemoryStream

236

KVMetadata_Binary

171 ,

284

KVMetadata_Bool

171 ,

284

KVMetadata_DateTime

171 ,

284

KVMetadata_Double

284

KVMetadata_Float

284

KVMetadata_Int4

171 ,

284

KVMetadata_Int8

284

KVMetadata_String

171 ,

284

KVMetadata_UInt4

284

KVMetadata_UInt8

284

KVMetadata_Unicode

171 ,

284

KVMetadata_Unknown

284

KVMetadataElem

171

KVMetadataType

171 ,

283

KVMetaName

172

kvolefio

320

KVOpenFileArg

61

,

69 ,

157 ,

173

KVOpenFileFlag_CreateRootNode

170

,

174

KVOutputStream

165 ,

175 ,

237

kvpie

321

kvradar

321

kvraster.class

34

,

321

KVSTR

238

KVStreamInfo

239

KVStructHead

240

KVStructInit

240

KVStyle

238

,

241

KVSTYLE_DELETECONTENT

107

KVSTYLE_HEADING[1-6]

107

KVSTYLE_ONCONSECUTIVEPARAGRAPHS

107

KVSTYLE_ORDERLIST

107

KVSTYLE_PRE

107

KVSTYLE_REDACT

107

KVSTYLE_UNORDEREDLIST

107

KVSubFileExtractInfo

148 ,

176

KVSubFileExtractInfoFlag_CharsetConverted

177

KVSubFileExtractInfoFlag_External

148

,

177

KVSubFileExtractInfoFlag_FileCreated

177

KVSubFileExtractInfoFlag_FolderCreated

177

KVSubFileExtractInfoFlag_NeedsExtraction

177

KVSubFileExtractInfoFlag_NonFormattedBodyExtracted

177

KVSubFileInfo

153 ,

178

KVSubFileInfoFlag_External

153

,

179

embedded objects in PowerPoint

154

KVSubFileInfoFlag_MailItem

179

KVSubFileInfoFlag_NeedsExtraction

179

,

180

KVSubFileInfoFlag_Secure

179

KVSubFileInfoFlag_SMIME

179

KVSubFileMetaData

181

KVSubFileMetaInfoFlag_CharsetConverted

181

KVSubFileType_Attachment

178

, 179

KVSubFileType_Folder

178

KVSubFileType_Main

178 ,

180

KVSubFileType_OLE2

178

KVSumInfoElemEx

243

KVSumInfoType

243

KVSummaryInfoEx

197 ,

198 ,

244

KVSumType

96

,

97 ,

331

KVT_ZONE token

207

kvtypes.h

34

,

233

, 270

kvutil

320

kvVector.class

321

kvvector.jar

34

,

321

kvxconfig.ini

122 ,

123 –

126 ,

208

,

276

,

321

and xmlini sample program

140

KVXConfigInfo

245

kvxml

320

KVXML library

48

, 60

kvxml.h

31

, 34 ,

233 ,

270

KVXMLAnchorType

278

KVXMLCallbacks

226 ,

247

KVXMLConfig

205

KVXMLConfig()

116 ,

122

export password-protected files

411

KVXMLConvertFile()

61 ,

214

callbacks

226

KVXMLEmptyParaType

281

KVXMLEndOOPSession()

217

KVXMLGetInterface

185

KVXMLGetInterface()

48 ,

60

KVXMLGraphicType

109 ,

279

KVXMLHardPageBreakType

282

KVXMLHeadingInfo

248

KVXMLInit()

198

KVXMLInterface

48

,

60 ,

185 ,

251

KVXMLOptions

109 ,

230

,

247 ,

253

KVXMLSetStyleSheet

109 ,

219

KVXMLStartOOPSession()

221

KVXMLStyleSheetType

277

KVXMLTemplate

60

,

97 ,

262 ,

329

KVXMLTOCOptions

267

kvxpgsa

320

kvxsssa

320

kvxtract

320

kvxtract.h

31

,

161

, 270

kvxwpsa

320

kvzeesr

325

kwad

294

,

313

,

320

,

347

L

l123sr

325

language detection license information

32

lasr

325

lcbBlockSize

265

lcbFilesize

235

lcbMaxMemUsage

259

Legato EMailXtender Archive

295

Legato EMailXtender archive (EMX) reader

323

Legato Extender

302

Libraries

319

license information enabling a full version

32

kv.lic

32

, 321

Link Library (DLL)

296

ListenerPortList

45

ListenerTimeout

45

logical reading order direction flags

206

PDF file

112

Lotus

1-2-3

(123)

307

(WK4)

307

Charts (123)

307

V2 to 5 reader

327

V96/97/98 reader

325

AMI Draw Graphics (SDW)

300 ,

335

AMI Pro (SAM)

311

reader

325

AMI Professional Write Plus

311

Domino XML (DXL) file extraction

85

Freelance Graphics (PRE)

96/97/98 reader

324

reader

324

Freelance Graphics (SDW)

305

Notes embedded image reader

324

Notes database license information

32

Notes database (NSF)

68

,

302

file extraction

86

installation and configuration

87

licensing

86

reader

326

system requirements

87

Pic (PIC)

300 ,

335

SmartMaster (MWP)

311

Word Pro

311

Word Pro (LWP)

311

reader

325

LPDF_AUTO

114 ,

115

,

290

LPDF_DIRECTION

290

LPDF_LTR

114

,

115 ,

290

LPDF_RAW

114

,

115 ,

290

LPDF_RTL

114

,

115 ,

290

lVersion

234

,

239

lwpsr

325

M

Mac Disk Copy Disk Image

295

Mac Disk Copy Disk Image File reader

322

MacBinary

295

MacBinary reader

325

macbinsr

325

Macintosh Picture (PICT) reader

324

Macintosh Raster (PICT/PCT)

300 ,

335

MacPaint (PNTG)

300

, 335

reader

324

Macromedia Flash (SWF)

305

reader

326

mail default list of metadata

73

extracting metadata

72 –

80 ,

96

, 155

metadata

168

Mailbox license information

32

Mailbox (MBX)

68

,

302

file extraction

81

licensing

81

reader

325

main file get information

61

, 70 ,

151

main URL token

330

MAPI

78

,

82 ,

83

ATTACH_BY_REF_ONLY

84

ATTACH_BY_REF_RESOLVE

84

ATTACH_BY_REFERENCE

84

attachment methods

84

mapidefs.h

80

mapitags.h

80

PR_ATTACH_LONG_PATHNAME

84

PR_ATTACH_METHOD

84

PR_ATTACH_PATHNAME

84

property tag

78

supported property types

78

MAPI-based PST reader

82

mapping styles

104 ,

241

MarkUpEnd

242

MarkUpStart

242

maximum memory

259

maxParaLen

249

mbsr

325

mbxsr

325

mdbsr

325

memory allocation

42 ,

236

memory management

236

metadata

41 ,

55

, 197 ,

243 ,

244 ,

285

custom metadata in PDF

116

data types

243 ,

283 ,

284

extracting

96 –

98

extracting default mail metadata

73 ,

155

,

168

extracting default mail metadata set

72

extracting from mail formats

72 –

80 ,

96 ,

155

,

167

extracting from PST files

78

extracting mail metadata as text

80

field names

286

non-standard

96

sample program

35

, 138

standard

96

token

97

,

330

, 331 ,

332

metaNameArray

155 ,

167

metaNameCount

155 ,

167

Microsoft

Access

298

Access (MDB) reader

325

Drawing Objects reader

324

Excel

2007 XML reader

327

Binary Format

308

Charts (XLS)

307

Macintosh (XLS)

307

Windows (XLS)

308

Windows (XLSX)

308

Windows XML format (XLS)

309

Excel (XLS) converting formulas

118

reader

327

supported formula functions

119

OneNote

305

OneNote reader

324

Outlook

68

,

302

file extraction

80

metadata fields

74

Outlook (MSG) convert directly using the MSG reader

52

reader

326

Outlook Express

68 ,

302

file extraction

80

Outlook Express (EML)

51

reader

323

Outlook Personal Folders

68 ,

303

attachment methods

84

detect by extension

85

error codes

276

extracting metadata

78

file extraction

81

KVErrorPasswordRequired

276

license information

32

licensing

82

MAPI-based reader

83

native and MAPI-based reader

82

native reader

83

system requirements

83

Outlook Personal Folders (PST)

51

MAPI-based reader

326

native reader

326

pstnsr

326

pstsr.dll

326

PowerPoint

2007 XML reader

324

embedded objects

154

Macintosh (PPT)

306

PC (PPT)

306

Windows (PPT)

306

Windows (PPTX)

306

Project

298

Project (MPP) reader

326

Rich Text Format (RTF) reader

326

Visio

297

XML format (VDX)

309

Visio (VSD) reader

327

Wave Sound (WAV)

304

Windows Bitmap (BMP)

336

Windows Write (WRI)

312

Word

2007 XML reader

326

6/95 reader

326

97, 2000, XP reader

326

DOS reader

326

Mac reader

325

Macintosh (DOC)

311

PC (DOC)

311

V2 reader

325

Windows (DOC)

311

Windows (DOCX)

311

Windows XML format (DOC)

309

Works

(WPS)

311

6, 2000 reader

326

Spreadsheet (S30,S40)

308

Spreadsheet reader

326

V1 and 2 reader

326

Write reader

326

Microsoft Backup File

295

Microsoft Backup File reader

322

Microsoft Cabinet format

295

Microsoft Cabinet format reader

322

Microsoft Compiled HTML Help

295

Microsoft Compiled HTML Help reader

322

Microsoft Compressed Folder

296

Microsoft Entourage Database

302

Microsoft Entourage Database Format reader

323

Microsoft Office 2007 Excel Binary Format reader

327

Microsoft Office Drawing

300

Microsoft OneNote

305

reader

324

Microsoft Outlook DBX

302

Microsoft Outlook Express DBX reader

322

Microsoft Outlook for Macintosh

302

Microsoft Outlook for Macintosh reader

326

Microsoft Outlook iCalendar

302

Microsoft Outlook iCalendar reader

323

Microsoft Outlook Offline Storage File

303

Microsoft Outlook Offline Storage File reader

326

Microsoft Outlook vCard Contact

303

Microsoft Outlook vCard Contact reader

327

Microsoft Publisher

298

,

335

Microsoft Publisher reader

326

Microsoft Visio reader

325

MIDI (MID)

304

mifsr

325

MIME HTML

309

minParaLen

249

misr

325

MP3 files

96

reader

326

mp3sr

326

MPEG-1

Audio layer 3 (MP3)

304

Video (MPG)

304

MPEG-2 Audio (MPEGA)

304

MPEG-4 Audio

304

mppsr

326

MSBLSB byte order

163 ,

167

,

169 ,

179

mscomctl.ocx

321

msgsr

326

mspubsr

326

msvbvm60

321

MSVCP60.dll

321

msvcrt

321

msw6sr

326

mswsr

326

multi-byte support

333

multimedia files supported

304

mw6sr

326

mw8sr

326

mwsr

326

mwssr.dll

326

mwxsr

326

N

namespace

125

native PST reader

82

nCompressionQuality

258

nElem

244

NeXT/Sun Audio (AU)

304

non-standard metadata

96

nRowsBeforeSplit

260

nsfsr

326

nSpaceAfter

250

nSpaceBefore

250

nTableBorderWidth

257

numSubFiles

151 ,

170

O

oa2sr

326

OASIS

Open Document Format (ODP)

306

Open Document Format (ODS)

308

Open Document Format (ODT)

312

ODF presentation reader

324

ODF spreadsheets reader

326

ODF word processing reader

326

odfsssr

326

odfwpsr

326

oleaut32

321

olepro32

322

olesr

326

olmsr

326

Omni Graffle

300

Omni Outliner

312

Omni Outliner reader

326

oo3sr

326

Open Publication Structure eBook

312

Open Publication Structure eBook reader

323

OpenFile()

61

,

157

, 173

opening a file

61

,

157

OpenOffice

312

Calc

308

Impress

306

out of process configuration

44

conversions

43 ,

51

"keep servant active" option

217

sample program

137

temporary files

45

output stream

KVOutputStream

175

output streams

48

, 61 ,

330

auxiliary

230

creating

191

freeing

49

,

62 ,

192

KVOutputStream

237

P

page number token

331

paragraph styles

104

parentIndex

153 ,

180

password-protected files

409

export

411

extract

410

supported file types

409

PC Paintbrush (PCX)

300 ,

336

reader

324

pCallbacks

226

pCallingContext

226

pcHTML

229

pcString

238

PDF file absolute positioning of text

58

,

205

212

configuration options

112

– 117

converting bi-directional text

112

converting PDFs with images

116

direction flags

206

enable logical order for PDF files in C API

114

enable logical order for PDF files in formats_e.ini

114

enabling logical order in C API

208

extracting custom metadata

116

file extraction

90

generating XLinks

205 ,

207

graphic-based reader

324

high-fidelity graphic-based reader

324

logical reading order

112 ,

113 ,

114

pdfsr.ini

116

reader

326

specifying paragraph direction

112

specifying text flow in cnv2xml sample program

137

structured text stream

112

unstructured text stream

112 ,

114 ,

290

pdfsr

326

pdfsr.ini

116

pElem

244

pffsr

326

Pictor PC Paint format (PIC) reader

324

PKZIP (ZIP)

296

Portable Network Graphics (PNG)

300 ,

336

reader

324

writer

324

PowerPoint

95 reader

324

97 reader

324

reader

324

presentations setting resolution

261

supported

305

process_images_with_min_height

116

process_images_with_min_width

116

pstnsr

82

, 83 ,

326

pstsr.dll

83

,

326

pszBaseURL

257

pszChunkTemplate

265

pszDefaultOutputDirectory

230

, 247 ,

257

pszEndBlock

265

pszExContent

246

pszExMeta

246

pszFirstH1End

263

pszFirstH1Start

263

pszH[2..6]XML

264

pszInAttribute

246

pszInContent

246

pszInMeta

245

pszJavaURL

257

pszLastH1End

263

pszLastH1Start

263

pszMainBottom

263

pszMainTop

263

pszMainURL

257

pszMiddleH1End

263

pszMiddleH1Start

263

pszPicPath

257

pszPicURL

257

pszRoot

245

pszStartBlock

265

pszStyleSheet

255

pszTOC_H[1..6]

264

pszTOCH[1..6]End

264

pszTOCH[1..6]LeafNode

265

pszTOCH[1..6]Start

264

pszUserSummary

265

pszXEndBlock

264

pszXFile

264

pszXStartBlock

264

Q

qpssr

326

Quattro Pro Spreadsheet reader

326

QuickTime Movie (QT/MOV)

304

R

RAR Archive (RAR)

296

reader

326

rarsr

326

RasterPictureAnchor

229

RasterPictureAnchorEx

229

reader initialization error

273

redacted (hidden) text

107

redistributable files

319

regsvr32.exe

320

resolution presentations

258

revision marks

206

revision tracking information

110

Rich Text Format (RTF)

309

root element

63

root node

153

,

170 ,

174

creating

70

rtfsr

326

S

SA_BaseOnDocument

271

SA_Border

271

SA_NoBorder

271

sample program cnv2xml

35

, 136

cnv2xmloop

35

,

137

Export Demo

35

,

56 –

59 ,

142

metadata

35

,

138

tstxtract

35

,

135

xmlcallback

35

,

141

xmlindex

35

,

138

xmlini

35

, 139

xmlmulti

36

xmlonefile

36

,

141

sample template for C API

53

secured NSF Files

89

secured PST Files

85

servant.exe

322

ServantName

46

SetStyleMapping

241 ,

252

SetStyleMapping()

105 ,

201

SGI RGB

Image

300 ,

336

reader

325

ShutDown()

195 ,

202

,

203 ,

227 ,

252

single file for presentation template

55

single file template

55

single file with TOC template

55

Skype Log

312

Skype log file reader

326

skypesr

326

sosr

326

spreadsheets converting

117 –

119

converting headers and footers

117

converting hidden rows and columns

117

standard metadata

96

StarOffice

312

Calc

308

Impress

306

stderr

200

streams

36

auxiliary output

230

input

189 ,

190

KVInputStream

235

KVOutputStream

175 ,

237

output

191 ,

192

structured access layer

41

style sheets

108

token

330

StyleName

242

styles mapping

104

STYLESHEET_DISABLED

278

sub file external path to

148

, 153 ,

177

extract

61 ,

70

,

148

extract metadata

155 ,

167

get information

61

, 70 ,

153

summary information

41 ,

55 ,

197 ,

243

,

244 ,

285

extracting

96 –

98

token

97

,

330

, 331 ,

332

Sun Raster Image (RS)

300 ,

336

reader

325

supported formats

293 –

312

suppressing graphics

57

,

58 ,

205 ,

213 ,

255

swfsr

326

szExContentElement

124

szExMetaElement

124

szInAttribute

124

szInContentElement

124

szInMetaElement

123

szRoot

123

T

table border

271

table of contents generating

267

token

331

Tagged Image File Format (TIFF)

301 ,

336

reader

325

Tape Archive (TAR)

296

reader

327

tarsr

327

TempFilePath

45

TempFileSizeMark

45

template

53

C sample

53

css

54

index

54

single file

55

single file for presentations

55

single file with TOC

55

template file

55

map styles

105

setting conversion options

53

temporary files out of process

45

terms

36

defined

36

Text Mail (MIME)

303

threads

47 ,

62 ,

200

tnefsr

327

token

36

,

329

332

anchor

329 ,

330 ,

331

base URL

329

character set

329

endnote

330

footer

330

footnote

330

header

330

main URL

330

metadata

330

page number

331

style sheet

330

table of contents

331

user callback

332

zone

207

token buffer

259

Track Changes

110 ,

206

Transport Neutral Encapsulation Format (TNEF)

303

Transport Neutral Encapsulation Format reader

327

Truevision Targa (TGA)

301

,

336

reader

325

tstxtract sample program

35 ,

135

txtcnv

320

U

ulAttributes

234

Unicode reader

327

text

309

Unicode HTML

309

Unicode HTML reader

327

unihtmsr

327

unisr

327

UNIX converting graphics on

109

UNIX Compress

296

reader

325

unzip

327

URL base

329

main

330

user callback function

232

token

332

UserCB()

232

,

332

uudsr

327

UUEncoding (UUE)

296

reader

327

V

ValidateTemplate()

204

vcfsr

327

vector graphics converting

109

VectorPictureAnchor

229

verbose markup

57 ,

58

, 205 ,

255

Verity Document Type Definition

63 ,

64

Visio reader

327

vsdsr

327

W

W3C

63

WaitForConnectionTime

45

WaitForConvert

45

Windows

Animated Cursor (ANI)

301

,

336

Bitmap (BMP) reader

323

writer

323

bitmap (BMP)

301

Icon Cursor

301

icon reader

324

Metafile (WMF) reader

325

writer

325

metafile (WMF)

301

,

336

Video (AVI)

304

Windows Scrap File

296

WinZIP (ZIP)

296

Wireless Markup Language

63

wkssr

327

WML

63

word processing files supported

310

WordPad

312

WordPerfect

6.x to 10.x reader

327

Graphics 1 (WPG)

301 ,

336

Graphics 2 (WPG)

301 ,

336

Graphics reader

325

Linux

310

Macintosh

310

Macintosh reader

327

reader

327

Windows (WO)

310

wosr

327

wp6sr

327

wpmap

322

wpmsr

327

X

XHTML

63 ,

309

detection

351

reader

323

xlsbsr

327

xlssr

327

xlsxsr

327

XML and format ID

123 ,

245

configuration flag

208

configuring custom document type

125

converting

120 –

126

converting using xmlini sample program

140

Expat XML parser

322

extracting elements

125

generic

309

kvxconfig.ini

123 –

126

modifying element extraction settings

121 ,

208

namespace

125

Paper Specification

312

reader

327

root element

121 ,

123

writers

41

XML Export API functions

183 –

224

XML Paper Specification reader

327

XML Style Language Transformation

63

xml_css.ini

54

xml_index.ini

54

,

255

xml1file_pg.ini

55

xml1file.ini

55

xml1filetoc.ini

55

xmlcallback sample program

35

,

141

xmlcnv

320

XMLConfig()

111 ,

114

xmlexport

320

xmlindex sample program

35 ,

138

xmlini sample program

35 ,

53 ,

139

xmlmulti sample program

36

xmlonefile sample program

36 ,

141

xmlsh

322

xmlsr

327

xpssr

327

XSLT

63

XyWrite

312

reader

327

xywsr

327

Y

Yahoo! Instant Messenger

312

Yahoo! Instant Messenger reader

327

yimsr

327

Z

z7zsr

327

Zip archive

296

reader

327

ZIP file extraction

91

zone disable creation of

207

elements

207
