KeyView
XML Export SDK™
C Programming Guide
Version 10.23
Document Revision 0
20 January 2015
Copyright Notice
Notice
This documentation is a proprietary product of Autonomy and is protected by copyright laws and international treaty. Information in this documentation is subject to change without notice and does not represent a commitment on the part of Autonomy. While reasonable efforts have been made to ensure the accuracy of the information contained herein, Autonomy assumes no liability for errors or omissions. No liability is assumed for direct, incidental, or consequential damages resulting from the use of the information contained in this documentation.
The copyrighted software that accompanies this documentation is licensed to the End User for use only in strict accordance with the End User
License Agreement, which the Licensee should read carefully before commencing use of the software. No part of this publication may be reproduced, transmitted, stored in a retrieval system, nor translated into any human or computer language, in any form or by any means, electronic, mechanical, magnetic, optical, chemical, manual or otherwise, without the prior written permission of the copyright owner.
This documentation may use fictitious names for purposes of demonstration; references to actual persons, companies, or organizations are strictly coincidental.
Trademarks and Copyrights
Copyright © 2015 Hewlett-Packard Development Company, L.P. ACI API, Alfresco Connector, Arcpliance, Autonomy Process Automation,
Autonomy Fetch for Siebel eBusiness Applications, Autonomy, Business Objects Connector, Cognos Connector, Confluence Connector,
ControlPoint, DAH, Digital Safe Connector, DIH, DiSH, DLH, Documentum Connector, DOH, EAS Connector, Ektron Connector, Enterprise
AWE, eRoom Connector, Exchange Connector, FatWire Connector, File System Connector for Netware, File System Connector, FileNet
Connector, FileNet P8 Connector, FTP Fetch, HTTP Connector, Hummingbird DM Connector, IAS, IBM Content Manager Connector, IBM
Seedlist Connector, IBM Workplace Fetch, IDOL Server, IDOL, IDOLme, iManage Fetch, IMAP Connector, Import Module, iPlanet Connector,
KeyView, KVS Connector, Legato Connector, LiquidOffice, LiquidPDF, LiveLink Web Content Management Connector, MCMS Connector,
MediClaim, Meridio Connector, Meridio, Moreover Fetch, NNTP Connector, Notes Connector, Objective Connector, OCS Connector, ODBC
Connector, Omni Fetch SDK, Open Text Connector, Oracle Connector, PCDocs Fetch, PLC Connector, POP3 Fetch, Portal-in-a-Box, RecoFlex,
Retina, SAP Fetch, Schlumberger Fetch, SharePoint 2003 Connector, SharePoint 2007 Connector, SharePoint 2010 Connector, SharePoint
Fetch, SpeechPlugin, Stellent Fetch, TeleForm, Tri-CR, Ultraseek, Verity Profiler, Verity, VersiForm, WebDAV Connector, WorkSite Connector, and all related titles and logos are trademarks of Hewlett-Packard Development Company, L.P. and its affiliates, which may be registered in certain jurisdictions.
Microsoft is a registered trademark, and MS-DOS, Windows, Windows 95, Windows NT, SharePoint, and other Microsoft products referenced herein are trademarks of Microsoft Corporation.
UNIX is a registered trademark of The Open Group.
AvantGo is a trademark of AvantGo, Inc.
Epicentric Foundation Server is a trademark of Epicentric, Inc.
Documentum and eRoom are trademarks of Documentum, a division of EMC Corp.
FileNet is a trademark of FileNet Corporation.
Lotus Notes is a trademark of Lotus Development Corporation.
mySAP Enterprise Portal is a trademark of SAP AG.
Oracle is a trademark of Oracle Corporation.
Adobe is a trademark of Adobe Systems Incorporated.
Novell is a trademark of Novell, Inc.
Stellent is a trademark of Stellent, Inc.
All other trademarks are the property of their respective owners.
Notice to Government End Users
If this product is acquired under the terms of a DoD contract: Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of 252.227-7013. Civilian agency contract: Use, reproduction or disclosure is subject to 52.227-19 (a) through
(d) and restrictions set forth in the accompanying end user agreement. Unpublished-rights reserved under the copyright laws of the United States.
Autonomy, Inc., One Market Plaza, Spear Tower, Suite 1900, San Francisco, CA. 94105, US.
Contents
Tables
Figures
About This Document
Part 1 Overview of XML Export
Chapter 1 Introducing XML Export
Chapter 2 Getting Started
Part 2 Use the Export API
Chapter 3 Use the File Extraction API
Chapter 4 Use the XML Export API
Add Configuration Settings for Custom XML Document Types
Chapter 5 Sample Programs
Part 3 C API Reference
Chapter 6 File Extraction API Functions
Chapter 7 File Extraction API Structures
Chapter 8 XML Export API Functions
Chapter 9 XML Export API Callback Functions
XML Export API Structures
Enumerated Types
Appendixes
Appendix A Supported Formats
Appendix B Files Required for Redistribution
Appendix C Export Tokens
Appendix D Character Sets
Appendix E File Format Detection
Change the Percentage of Allowed Non-ASCII Characters
Appendix F File Formats and Extensions
Appendix G Extract and Format Lotus Notes Sub Files
Appendix H Password Protected Files
Index
Tables
Supported Compilers
Supported Compilers for Java and .NET Components
XML Export Installed Directory Structure
Architectural Components
Parameters for Out-of-Process Conversion
Default Mail Metadata List
MSG-specific Metadata List
Document Character Set Can be Determined
Document Character Set Cannot be Determined
Flags for Defining Styles
Supported Microsoft Excel Functions
Hidden data settings
Hidden data settings
Options for the cnv2xml Sample Program
Options for the cnv2xmloop Sample Program
Options for the xmlini Sample Program
Key to Support Tables
Supported Archive Formats
Supported Binary Formats
Supported CAD Formats
Supported Database Formats
Supported Desktop Publishing Formats
Supported Display Formats
Supported Graphic Formats
Supported Mail Formats
Supported Multimedia Formats
Supported Presentation Formats
Supported Spreadsheet Formats
Supported Text and Markup Formats
Supported Word Processing Formats
Export Tokens
Multi-byte and bi-directional support
Code Character Sets
Major Formats
File Classes
Minor Formats
KeyView file formats and extensions
Conditional elements
Control Elements
Data elements
Lotus Notes date and time formats
KeyView date and time formats
Key to support table
Supported password-protected file types
Figures
XML Export Architecture
Export Demo: Launching
Export Demo: Setting Directories
Export Demo: Converting Files
Example Container File Tree Structure
Extracted PST File
Recreated File Hierarchy
Document Character Set Can Be Determined
Document Character Set Cannot Be Determined
About This Document
This guide is for developers who incorporate KeyView XML conversion technology into their custom Web applications using a C development environment. It is intended for readers who are familiar with XML and C.
Documentation Updates
The information in this document is current as of XML Export SDK version 10.23.
The content was last modified 20 January 2015.
You can retrieve the most current product documentation from the HP Autonomy
Knowledge Base on the Customer Support Site.
A document in the Knowledge Base displays a version number in its name, such as IDOL Server 7.5 Administration Guide. The version number applies to the product that the document describes. The document may also have a revision
number in its name, such as IDOL Server 7.5 Administration Guide Revision 6.
The revision number applies to the document and indicates that there were revisions to the document since its original release.
Autonomy recommends that you periodically check the Knowledge Base for revisions to documents for the products your enterprise is using.
To access Autonomy documentation
1. Go to the Autonomy Customer Support site: https://customers.autonomy.com/
2. Click Login.
3. Type the login credentials that you were given, and then click Login.
The Customer Support Site opens.
4. Click Knowledge Base.
The Knowledge Base Search page opens.
5. Search or browse the Knowledge Base.
To search the knowledge base:
a. In the Search box, type a search term or phrase and click Search.
Documents that match the query display in a results list.
To browse the knowledge base:
a. Select one or more of the categories in the Browse list. You can browse by:
Repository. Filters the list by Documentation produced by technical publications, or Solutions to Technical Support cases.
Product Family. Filters the list by product suite or division. For example, you could retrieve documents related to the iManage, IDOL, Virage or KeyView product suites.
Product. Filters the list by product. For example, you could retrieve documents related to IDOL Server, Virage Videologger, or KeyView Filter.
Version. Filters the list by product or component version number.
Type. Filters the list by document type. For example, you could retrieve Guides, Help, Packages (ZIP files), or Release Notes.
Format. Filters the list by document format. For example, you could retrieve documents in PDF or HTML format. Guides are typically provided in both PDF and HTML format.
6. To open a document, click its title in the results list.
To download a PDF version of a guide, open the PDF version, click the
Download icon in the PDF reader, and save the PDF to another location.
To download a documentation ZIP package, click Get Documentation
Package under the document title in the results list. Alternatively, browse to the desired ZIP package by selecting either the Packages document Type or the ZIP document Format from the Browse list.
Autonomy welcomes your comments.
To send feedback on Autonomy documentation
Send e-mail to [email protected]
Provide:
full document title with version and revision number
location: heading, a snippet of text or screen capture
your comments
your contact information in the event we need clarification
Related Documentation
The following documents provide more details on XML Export.
XML Export Release Notes
XML Export SDK Java Programming Guide
Conventions
The following conventions are used in this document.
Notational Conventions
This document uses the following conventions.
Convention: Bold
Usage: User-interface elements such as a menu item or button. For example: Click Cancel to halt the operation.

Convention: Italics
Usage: Document titles and new terms. For example: For more information, see the IDOL Server Administration Guide. An action command is a request, such as a query or indexing instruction, sent to IDOL Server.
Convention: monospace font
Usage: File names, paths, and code. For example: The FileSystemConnector.cfg file is installed in C:\Program Files\FileSystemConnector\.

Convention: monospace bold
Usage: Data typed by the user. For example: Type run at the command prompt. In the User Name field, type Admin.

Convention: monospace italics
Usage: Replaceable strings in file paths and code. For example: user UserName
Command-Line Syntax Conventions
This document uses the following command-line syntax conventions.
Convention: [ optional ]
Usage: Brackets describe optional syntax. For example: [ -create ]

Convention: |
Usage: Bars indicate “either | or” choices. For example: [ option1 ] | [ option2 ] In this example, you must choose between option1 and option2.

Convention: { required }
Usage: Braces describe required syntax in which you have a choice and that at least one choice is required. For example: { [ option1 ] [ option2 ] } In this example, you must choose option1, option2, or both options.
Convention: required
Usage: Absence of braces or brackets indicates required syntax in which there is no choice; you must type the required syntax element.

Convention: variable or <variable>
Usage: Italics specify items to be replaced by actual values. For example: -merge filename1 (In some documents, angle brackets are used to denote these items.)

Convention: ...
Usage: Ellipses indicate repetition of the same pattern. For example: -merge filename1, filename2 [, filename3 ... ] where the ellipses specify filename4, and so on.

The use of punctuation (such as single and double quotes, commas, and periods) indicates actual syntax; it is not part of the syntax definition.
Notices
This document uses the following notices:
CAUTION A caution indicates an action can result in the loss of data.
IMPORTANT An important note provides information that is essential to completing a task.
NOTE A note provides information that emphasizes or supplements important points of the main text. A note supplies information that may apply only in special cases—for example, memory limitations, equipment configurations, or details that apply to specific versions of the software.
TIP A tip provides additional information that makes a task easier or more productive.
Autonomy Customer Support
Autonomy Customer Support provides prompt and accurate support to help you quickly and effectively resolve any issue you may encounter while using
Autonomy products. Support services include access to the Customer Support
Site (CSS) for online answers, expertise-based service by Autonomy support engineers, and software maintenance to ensure you have the most up-to-date technology.
To access the Customer Support Site
go to https://customers.autonomy.com
The Customer Support Site includes:
Knowledge Base: The CSS contains an extensive library of end user documentation, FAQs, and technical articles that is easy to navigate and search.
Case Center: The Case Center is a central location to create, monitor, and manage all your cases that are open with technical support.
Download Center: Products and product updates can be downloaded and requested from the Download Center.
Resource Center: Other helpful resources appropriate for your product.
To contact Autonomy Customer Support by e-mail or phone
go to http://www.autonomy.com/work/services/customer-support
Contact Autonomy
For general information about Autonomy, contact one of the following locations:
Europe and Worldwide
E-mail: [email protected]
Telephone: +44 (0) 1223 448 000
Fax: +44 (0) 1223 448 001
Autonomy Corporation plc
Cambridge Business Park
Cowley Rd.
Cambridge CB4 0WZ
United Kingdom
North and South America
E-mail: [email protected]
Telephone: +1.415.243.9955
Fax: +1.415.243.9984
Autonomy, Inc.
One Market Plaza
Spear Tower, Suite 1900
San Francisco CA 94105
USA
Part 1 Overview of XML Export
This section provides an overview of the Export SDK and describes how to use the C implementation of the API. It contains the following chapters:
Chapter 1, Introducing XML Export
Chapter 2, Getting Started
Chapter 1 Introducing XML Export
This section describes the KeyView Export SDK package. It contains the following topics:
Overview
Features
Platforms, Compilers and Dependencies
Package Contents
License Information
Overview
XML Export is part of the KeyView Export SDK. It enables you to convert virtually any document, spreadsheet, presentation, or graphic into well-formed, valid XML which is validated against a predefined Document Type Definition (DTD). With
XML Export, you control the content, structure, and format of the XML output using either easily customized templates, or the flexible and robust APIs.
The main purpose of XML Export is to apply an XML vocabulary to the data structures in a document so that content and metadata can be indexed and subsequently searched in context.
Data structures in a source document can be:
metadata (title, author, subject, and so on)
document components (headers, footers, footnotes, endnotes, captions, bookmarks, and so on)
tagged text (chapters, sections, bulleted lists, and so on)
table components (sheet names, rows, columns, cell ranges, and so on)
presentation components (notes, slide titles, slide descriptions, and so on)
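To make the idea of applying an XML vocabulary to a piece of metadata more concrete, the following minimal sketch formats one metadata value as a well-formed XML element. It is an illustration only: the function names xml_escape and format_metadata_element are hypothetical and are not part of the Export API, and the element name is supplied by the caller because the element names defined in Verity.dtd are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the Export API): copies src into dst,
 * replacing characters that are special in XML with entity references.
 * Returns 0 on success, -1 if dst is too small. */
int xml_escape(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; ++src) {
        char single[2] = { *src, '\0' };
        const char *rep;
        size_t len;

        switch (*src) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:  rep = single;   break;
        }
        len = strlen(rep);
        if (used + len + 1 > dst_size)   /* keep room for the terminator */
            return -1;
        memcpy(dst + used, rep, len);
        used += len;
    }
    if (used + 1 > dst_size)
        return -1;
    dst[used] = '\0';
    return 0;
}

/* Hypothetical helper: writes <name>escaped value</name> into out.
 * The element name is an assumption supplied by the caller.
 * Returns 0 on success, -1 if a buffer is too small. */
int format_metadata_element(const char *name, const char *value,
                            char *out, size_t out_size)
{
    char escaped[512];
    int n;

    assert(name != NULL && value != NULL && out != NULL);
    if (xml_escape(value, escaped, sizeof escaped) != 0)
        return -1;
    n = snprintf(out, out_size, "<%s>%s</%s>", name, escaped, name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Escaping the value before it is wrapped in tags is what keeps the output well-formed when a title or author string contains characters such as & or <.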
Although viewing is not the main purpose of XML Export, Extensible Stylesheet
Language (XSL) style sheets or Cascading Style Sheets (CSS) can be used to display the XML data.
Export SDK supports a number of programming environments, such as Visual
Basic, Java, and Delphi and runs on all popular operating system platforms including Windows, Solaris, HP-UX, IBM AIX, and Linux.
Export SDK is part of the KeyView suite of products. KeyView provides high-speed text extraction, conversion to Web-ready HTML and well-formed XML, and high-fidelity document viewing.
Features
Export supports over 300 formats in 70 languages.
Convert files either in-process or out of process. Out-of-process conversion ensures the stability and robustness of the calling application if a corrupt document causes an exception or the conversion process to fail.
Files embedded within files can be extracted, using the File Extraction API, and then converted, using the Export API.
Use redirected input/output. You can provide an input stream that is not restricted to file system access.
Dynamically convert word processing, spreadsheet, presentation, and graphics files into well-formed, valid, and 1.0-compliant XML. The XML output is validated against a predefined DTD named the “Verity.dtd.”
Export automatically recognizes the file format being converted and uses the appropriate reader. Your application does not need to rely on filename extensions to determine the file format.
Use callbacks to control such aspects of the conversion process as file naming and the insertion of scripts.
Create heading levels in the output file by either using the structure in the source document or by allowing Export to automatically generate a structure based on document properties, such as font or font attributes.
Manage memory allocation to optimize the speed and performance of the application.
Insert predefined XML markup at specific points in the output stream.
Apply XSL or Cascading Style Sheets (CSS) to improve the fidelity of the output.
Map paragraph and character styles in word processing documents to any markup you specify in the output.
Control the resolution of rasterized vector graphics to optimize storage requirements or image quality.
Select the target format for converted graphics, including GIF, JPEG, CGM,
PNG, WMF, and Java on Windows, and Java and JPEG on Unix and Linux.
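The feature list notes that Export recognizes a file format from the file itself rather than from its filename extension. The following sketch shows the general idea of content-based detection by checking a few widely known file signatures. It is not the detection logic used by Export, and the function name sniff_format and its return labels are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only; this is NOT how Export
 * detects formats internally. Returns a short label for a few well-known
 * leading byte signatures, or "unknown". */
const char *sniff_format(const unsigned char *data, size_t len)
{
    /* PDF files start with "%PDF-". */
    static const unsigned char pdf_sig[] = { '%', 'P', 'D', 'F', '-' };
    /* ZIP local file header (ZIP is also the container for OOXML files). */
    static const unsigned char zip_sig[] = { 'P', 'K', 0x03, 0x04 };
    /* OLE2 compound file header (used by legacy Microsoft Office files). */
    static const unsigned char ole2_sig[] = {
        0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1
    };

    assert(data != NULL || len == 0);

    if (len >= sizeof pdf_sig && memcmp(data, pdf_sig, sizeof pdf_sig) == 0)
        return "pdf";
    if (len >= sizeof zip_sig && memcmp(data, zip_sig, sizeof zip_sig) == 0)
        return "zip";
    if (len >= sizeof ole2_sig && memcmp(data, ole2_sig, sizeof ole2_sig) == 0)
        return "ole2";
    return "unknown";
}
```

A signature check like this only narrows the candidates; distinguishing, for example, a word processing file from a spreadsheet inside the same container requires reading further into the file.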
Platforms, Compilers and Dependencies
This section lists the supported platforms, supported compilers, and software dependencies for the KeyView software.
Supported Platforms
FreeBSD 8.1 x86
HP HP-UX 11i and 11i v2 PA-RISC
Mac OS X Mountain Lion 10.8 or higher on 32- and 64-bit Apple-Intel architecture
Microsoft Windows 2003 Server x86 and x64
Microsoft Windows Vista Business Edition x86 and x64. Other editions of Vista have not been tested, but are likely supported.
Microsoft Windows 2008 Server Enterprise Edition x86 and x64
Microsoft Windows 2008 Server R2
Microsoft Windows XP x86 (Service Pack 2)
Microsoft Windows 7 x86 and x64
Microsoft Windows 8 x86 and x64
Red Hat Enterprise Linux AS 4.0 x86
Red Hat Enterprise Linux AS 4.0 x64
Red Hat Enterprise Linux 5.0 x86 and x64
Red Hat Enterprise Linux 6.0 x86 and x64
Sun Solaris 9.0 and 10 SPARC
Sun Solaris 10 x64
SuSE Linux Enterprise Server 10, 10.1, 11 x86
SuSE Linux Enterprise Server 10, 10.1 x64
SuSE Linux Enterprise Server 11 x64
Supported Compilers
Platform | Architecture | Compiler Name | Compiler Version
Microsoft Windows | x86 | cl | Microsoft 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for x86
Microsoft Windows | x64 | cl | Microsoft C/C++ Optimizing Compiler Version 16.00.30319.01 for x64
Sun Solaris | x86 64-bit | Sun Studio 12 | Sun C 5.9 SunOS_i386 Patch 124868-01 2007/07/12
Sun Solaris | SPARC 64-bit | Sun Studio 11 | Sun C 5.8 Patch 121015-06 2007/10/03
Linux | x86 | gcc / g++ | 3.4.3 (Redhat 4), 4.1.0 (SuSE Linux 10)
Linux | x64 | gcc / g++ | 4.1.0 (Redhat 4), 4.1.0 (SuSE Linux 10)
HP HP-UX | PA-RISC | cc / aCC | aCC: HP ANSI C++ B3910B A.03.70 for 32 bit
Mac OSX | Apple-Intel 32-bit and 64-bit | LLVM | Apple LLVM 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
FreeBSD | BSD x86 | gcc / g++ | 4.2.1 [FreeBSD] 20070719
Component | Compiler
Java components | Java 1.5
.NET components | Microsoft Visual J# 2005 Compiler 8.00.50727.42
Software Dependencies
Some KeyView components require that you have installed specific third-party software:
Java Runtime Environment (JRE) or Java Software Developer Kit (JDK) version 1.5. Required for Java API and graphics conversion in Export SDK.
Outlook 2002 client or later versions. Required when processing Microsoft
Outlook Personal Folders (PST) files using the MAPI-based reader (pstsr).
The native PST reader (pstnsr) does not require an Outlook client.
Lotus Notes or Lotus Domino (minimum requirement is 6.5.1, but version 8.5 is recommended). Required for Lotus Notes database (NSF) file processing.
Microsoft .NET Framework SDK version 2.0, Microsoft .NET Framework version 2.0 Redistributable Package (if programming in .NET environment)
Package Contents
The Export installation contains:
Libraries and executable files necessary for converting source documents into high-quality, well-formed XML (see
“Files Required for Redistribution” on page ).
The include files that define the functions and structures used by the application to establish an interface with Export:
adinfo.h
kvxml.h
kvtypes.h
kvxtract.h
The Java API implemented in the package com.verity.api.export contained in the file KeyView.jar.
Several sample programs that demonstrate Export’s functionality.
Sample images that can be used as navigation buttons and background textures in your output.
Template files that allow you to set conversion options without modifying at the
API level. They can be used to generate a wide range of output, from highly-stylized user-defined XML to stripped-down, text-only output suitable for use with an indexing engine.
The predefined DTD, Verity.dtd, used to validate all XML output.
Sample style sheets: wp.xsl (for word processing documents), ss.xsl (for spreadsheets), and pg.xsl (for presentation graphics).
License Information
During installation, the installation program validates the organization name and license key you enter and generates the install/OS/bin/kv.lic file, where install is the directory in which you installed KeyView, and OS is the operating system. This file is opened and validated when the KeyView API is used.
The kv.lic file contains the organization name and the 28-digit license key you specified during installation. The contents of a kv.lic file look similar to the following:
Company Name
XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX
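As a rough illustration of the two-line layout shown above, the following sketch splits license text into an organization name and a key. The function names copy_line and parse_license_text are hypothetical and are not part of the KeyView API. The sketch assumes exactly the layout shown, with the organization name on the first line and the key on the second, and it does not check whether the key is valid.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies one line (without its line terminator) from
 * *cursor into out and advances *cursor past the terminator.
 * Returns 0 on success, -1 if the line does not fit in out. */
int copy_line(const char **cursor, char *out, size_t out_size)
{
    const char *start;
    size_t len;

    assert(cursor != NULL && *cursor != NULL && out != NULL);
    start = *cursor;
    len = strcspn(start, "\r\n");       /* length up to end of line */
    if (len + 1 > out_size)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';

    start += len;
    if (*start == '\r')                 /* accept CRLF as well as LF */
        ++start;
    if (*start == '\n')
        ++start;
    *cursor = start;
    return 0;
}

/* Hypothetical helper: reads the organization name (line 1) and the
 * license key (line 2) from the text of a kv.lic file.
 * Returns 0 on success, -1 if either line is missing or too long. */
int parse_license_text(const char *text,
                       char *org, size_t org_size,
                       char *key, size_t key_size)
{
    const char *cursor = text;

    assert(text != NULL && org != NULL && key != NULL);
    if (copy_line(&cursor, org, org_size) != 0)
        return -1;
    if (copy_line(&cursor, key, key_size) != 0)
        return -1;
    return (org[0] != '\0' && key[0] != '\0') ? 0 : -1;
}
```

A caller would typically read the file into a buffer first and pass the buffer to parse_license_text.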
The license key controls whether the following are enabled:
full version of the KeyView SDK
trial version of the KeyView SDK
language detection and advanced document readers—The following components are considered advanced features, and are licensed separately:
Microsoft Outlook Personal Folders (PST) reader (pstsr and pstnsr)
Lotus Notes database (NSF) reader (nsfsr)
Mailbox (MBX) reader (mbxsr)
Character set detection library (kvlangdetect)
If you change the license key at any time, you must update the licensing information in the kv.lic file. See
“Update License Information” on page .
Enable Advanced Document Readers
To enable advanced readers in one of the KeyView SDKs, you must obtain an appropriate license key from Autonomy and update the installed license key with the new information as described in “Update License Information”.
If you are enabling the MBX reader in an existing installation of Export, in addition to updating the license key, change the parameter 208=eml to 208=mbx in the formats_e.ini file.
Update License Information
If you currently have an evaluation version of KeyView and have purchased a full version of the SDK, or you are adding a document reader (for example, the PST reader), you must update the license information that was installed with the original version of the KeyView SDK.
If you installed a full version of KeyView, but did not enter licensing information at the time of installation, you must also update the license information.
To update the information, do one of the following:
Manually update the license information that is stored in the text file named kv.lic.
Re-install the product and enter the new license information when prompted.
To update the KeyView license information:
1. Open the license key file, kv.lic, in a text editor. The file is in the install\OS\bin directory, where install is the directory in which you installed KeyView, and OS is the operating system. The file contains the following text:
COMPANY NAME
XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX
2. Replace the text COMPANY NAME with the company name that appears at the top of the License Key Sheet provided by Autonomy. Enter the text exactly as it appears in the document.
3. Replace the characters XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX with the appropriate license key from the License Key Sheet provided by Autonomy.
The license key is listed in the Key column in the Standalone Products table. The key is a string containing 31 characters (four groups of seven characters separated by hyphens), for example, 2TQD22D-2M6FV66-2KPF23S-2GEM5AB. Enter the characters exactly as they appear in the document, and do not include a leading or trailing space.
4. The finished kv.lic file looks similar to the following:
Autonomy
24QD22D-2M6FV66-2KPF23S-2G8M59B
5. Save the kv.lic file.
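Before saving, it can help to confirm that the key was typed in the expected shape. The following is a minimal sketch, not part of the SDK: the function name demo_license_key_shape_ok is hypothetical, and it assumes each group contains only letters and digits, as in the example keys above. It checks the layout only; KeyView itself decides whether a key is valid.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if key has the shape XXXXXXX-XXXXXXX-XXXXXXX-XXXXXXX:
 * four groups of seven alphanumeric characters separated by hyphens
 * (31 characters in total, no leading or trailing spaces).
 * Assumption: groups hold only letters and digits. */
int demo_license_key_shape_ok(const char *key)
{
    size_t i;

    if (key == NULL || strlen(key) != 31)
        return 0;

    for (i = 0; i < 31; i++) {
        if (i % 8 == 7) {            /* positions 7, 15, 23 hold hyphens */
            if (key[i] != '-')
                return 0;
        } else if (!isalnum((unsigned char)key[i])) {
            return 0;
        }
    }
    return 1;
}
```

A key with a leading space or a six-character first group fails the length check, which matches the instruction to enter the characters exactly as they appear on the License Key Sheet.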
Directory Structure
The following table describes the directories created during the XML Export installation. The variable install is the pathname of the Export installation directory (for example, /usr/autonomy/KeyviewExportSDK on UNIX, or C:\Program Files\Autonomy\KeyviewExportSDK on Windows). On UNIX, the XML Export directory is named /xmlexpt.
The variable OS is the operating system for which the SDK is installed. For example, the bin directory on a standard 32-bit Windows installation would be located at C:\Program Files\Autonomy\KeyviewExportSDK\WINDOWS\bin.
Directory
    Contents
install\OS\bin
    Contains the libraries, executables for the sample programs Export Demo and cnv2xml, the Java program (kvraster.class), the Java applet (kvvector.jar), the format detection file (formats_e.ini), the license key file (kv.lic), and a number of other supporting files.
install\javaapi\ini
    Contains the template files used with the Java API.
install\javaapi\javadoc
    Contains the Javadoc for the Java API.
install\javaapi\sample
    Contains the source files and sample programs for the Java API.
install\testdocs
    Contains sample word processing, spreadsheet, and presentation graphics files that can be used to test XML Export’s options. You may also find this directory useful when testing your own applications.
install\XML Export\guide
    Contains the XML Export C Programming Guide and XML Export Java Programming Guide in HTML and PDF format.
install\XML Export\include
    Contains the header files (adinfo.h, kvxml.h, and kvtypes.h) for the C API.
install\XML Export\programs\bin
    Contains the executable files for the sample Visual Basic program called Export Demo.
install\XML Export\programs\cnv2xml
    Contains the C source code files for a sample program that creates a single XML file. The executable for this sample program is in the bin directory.
install\XML Export\programs\cnv2xmloop
    Contains the C source code for a sample program that creates a single XML file out of process.
install\XML Export\programs\ExportDemo
    Contains the source code for a sample Visual Basic program. The executable for this sample program is in the bin directory. Export Demo is available through the Start menu.
install\XML Export\programs\ini
    Contains the template files used to set the conversion options in the C API.
install\XML Export\programs\metadata
    Contains the C source code and supporting files for a sample program that creates a valid XML file containing only the document’s metadata.
install\XML Export\programs\pdfini
    Contains the template file used to extract custom metadata from PDF documents.
install\XML Export\programs\tempout
    The default output directory for converted files. Contains the KeyView DTD, sample style sheets, and character entity files. These files are required for viewing the converted XML files.
install\XML Export\programs\tstxtract
    Contains the C source code and supporting files for a sample program that demonstrates the File Extraction interface.
install\XML Export\programs\xmlcallback
    Contains the C source code and supporting files for a sample program that demonstrates how user callbacks can dynamically shape the XML conversion.
install\XML Export\programs\xmlindex
    Contains the C source code and supporting files for a sample program that produces text-only XML.
install\XML Export\programs\xmlini
    Contains the C source code and supporting files for a sample program that uses template files to set the conversion options.
install\XML Export\programs\xmlmulti
    Contains the C source code and supporting files for a sample program that creates multiple XML files from a source document. The main file contains the table of contents. Each H1 heading is contained within its own file.
install\XML Export\programs\xmlonefile
    Contains the C source code and supporting files for a sample program that converts a source document into a single, formatted XML file.
install\XML Export\rel_notes
    Contains the XML Export Release Notes in HTML and PDF format.
Definition of Terms
The following are specialized terms used throughout the guide.
anchor
    XML markup that defines both anchors and hyperlinks. An anchor is a named place in a document to which other documents can form a link. Anchors use the XML anchor tags (<a xmlns:xlink= xlink href=> </a>) to facilitate navigation within a document. The major browsers do not currently support linking in XML documents.
block
    All source document content (including sub-headings) associated with Heading Level 1. Export identifies and/or generates blocks from the input stream for the implementation of your XML markup.
block chunk or chunk
    All source document content associated with Heading Levels 2 through 6. Chunks are subdivisions of blocks. You may supply specific XML markup for the different levels of block chunks.
callback
    A function optionally supplied by your application and called from within the Export API. For example, callbacks allow your application to monitor the progress of the conversion process dynamically.
stream
    Transmission of a file’s content between memory and disk in a continuous flow.
token
    The vehicle for conveying specific types of information to and from the API during the conversion process. Tokens are placeholders for markup that appears in the output. See “Export Tokens”.
CHAPTER 2
Getting Started
This section provides an overview of XML Export and describes how to use the C implementation of the API. It contains the following topics:
Use the C-Language Implementation of the API
Use the Verity Document Type Definition (DTD)
Architectural Overview
The general architecture of the KeyView XML conversion technology is the same across all supported platforms and is illustrated in Figure 1.
Figure 1 XML Export Architecture
Each component is described in the following table.
Component
    Description
Developer’s Application
    The developer’s application interfaces directly with the XML Export API through either a C-language or Java implementation.
File Extraction API
    The File Extraction API opens a file and extracts the file’s sub files so that they are available for conversion. See “Use the File Extraction API”.
XML Export API
    The XML Export API exposes the functionality of XML Export and controls all other XML Export modules during the conversion process.
Format Detection Module
    The format detection module determines the file type of the source file, which enables the XML Export interface to load the appropriate structured access layer module and document reader. See “File Format
Structured Access Layer
    The structured access layer contains three modules: one for word processing, one for spreadsheets, and one for presentations and graphics. Information from the format detection module determines which access layer module operates at this stage of the conversion. The structured access layer performs the following:
    1. Loads the appropriate document reader.
    2. Processes the data stream from the document reader.
    3. Determines table of contents entries.
    4. Sends the stream to the appropriate XML writer.
    5. Accepts the XML stream from the XML writer.
    6. Generates the XML output file with a table of contents, metadata, and the document’s contents, and sends it to the XML Export interface.
Document Reader
    Each document reader reads a specific file format and sends a text stream of the document to the structured access layer. Word processing readers return a token stream to the structured access layer. A token stream contains the document contents and messages (tokens) that precede the content and identify the type of information that follows them. Each reader is loaded as required by the structured access layer. See “Document Readers and Writers” for a complete list of document readers.
XML Writers
    Each XML writer accepts a text stream or token stream from the structured access layer and generates an equivalent XML stream that is sent back to the structured access layer. The structured access layer then generates the output file. See “Document Readers and Writers” for a list of format writers.
Memory Abstraction
All dynamic memory allocations in Export modules are abstracted through a C interface. This memory allocation interface is defined in the KVMemoryStream structure in kvtypes.h. See “KVMemoryStream”. You may override all memory allocations by providing a C structure containing pointers to functions identical in nature to their standard ANSI C counterparts. The xmlcallback sample program demonstrates Export memory management features.
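The general pattern of overriding allocation through a structure of function pointers can be sketched as follows. This is an illustration only: DemoAllocator, DemoAllocStats, and the demo_ functions are hypothetical stand-ins and are not the SDK's KVMemoryStream, whose real member names and signatures are defined in kvtypes.h. The sketch forwards to malloc() and free() and counts the calls, which is one reason an application might want to supply its own allocator.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an allocator table. It is NOT the SDK's
 * KVMemoryStream; see kvtypes.h for the real members and signatures. */
typedef struct DemoAllocator {
    void  *user_data;
    void *(*alloc_fn)(struct DemoAllocator *self, size_t size);
    void  (*free_fn)(struct DemoAllocator *self, void *ptr);
} DemoAllocator;

/* Per-allocator statistics reached through user_data. */
typedef struct DemoAllocStats {
    size_t alloc_calls;
    size_t free_calls;
} DemoAllocStats;

/* Counts the call, then forwards to the standard malloc(). */
static void *demo_counting_alloc(DemoAllocator *self, size_t size)
{
    ((DemoAllocStats *)self->user_data)->alloc_calls++;
    return malloc(size);
}

/* Counts the call, then forwards to the standard free(). */
static void demo_counting_free(DemoAllocator *self, void *ptr)
{
    ((DemoAllocStats *)self->user_data)->free_calls++;
    free(ptr);
}

/* Fills in an allocator table that counts calls and uses malloc/free. */
void demo_allocator_init(DemoAllocator *a, DemoAllocStats *stats)
{
    stats->alloc_calls = 0;
    stats->free_calls = 0;
    a->user_data = stats;
    a->alloc_fn = demo_counting_alloc;
    a->free_fn = demo_counting_free;
}
```

Passing the structure itself to each function lets the callbacks reach per-instance state without global variables, which matters when several threads each own a separate allocator.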
Enhance Performance
KeyView is designed for optimal performance out of the box. However, there are some parameters that you can adjust to improve system performance according to your needs.
File Caching
To reduce the frequency of I/O operations, and consequently improve performance, the KeyView readers load file data into memory. The readers then read the data from the cache rather than the physical disk. You can configure the amount of memory used for file caching through the formats_e.ini file.
Generally, when you increase the memory, performance improves.
By default, KeyView uses a maximum of 1MB of memory for each thread, assuming a thread contains only one instance of pContext that is returned from the session initialization (see “fpInit()”). If the file data is larger than 1MB, up to 1MB of data is cached and the data beyond 1MB is read from disk. The minimum amount of memory that can be used for file caching is 64KB.
To determine a reasonable value, divide the maximum amount of memory you want KeyView to use for file caching by the total number of threads. For example, if you want KeyView to use a maximum of 50MB of memory and have 10 threads, set the value to 5MB.
To modify the memory allocated for file caching, change the value for the following parameter in the [DiskCache] section of the formats_e.ini file:
DiskCacheSize=1024
The value is in kilobytes. If this parameter is not set or is set to 0 (zero), the minimum value of 64KB is used.
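The sizing rule above (total budget divided by thread count, expressed in kilobytes, and never below 64KB) can be sketched as a small helper. The function name demo_disk_cache_size_kb is hypothetical and not part of the SDK, and the sketch assumes 1MB equals 1024KB, consistent with the default DiskCacheSize=1024 corresponding to 1MB.

```c
/* Suggests a DiskCacheSize value in kilobytes: the total caching budget
 * divided by the number of threads, never below the 64 KB minimum.
 * Assumption: 1 MB is treated as 1024 KB. */
unsigned long demo_disk_cache_size_kb(unsigned long budget_mb,
                                      unsigned long threads)
{
    unsigned long per_thread_kb;

    if (threads == 0)
        threads = 1;               /* treat "no threads" as one thread */

    per_thread_kb = (budget_mb * 1024UL) / threads;
    return per_thread_kb < 64UL ? 64UL : per_thread_kb;
}
```

For the example of a 50MB budget shared by 10 threads, the helper returns 5120, which corresponds to DiskCacheSize=5120 (5MB per thread).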
The formats_e.ini file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.
Convert Files Out of Process
Export can run independently from the calling application. This is called out of process. Out-of-process conversions protect the stability of the calling application in the rare case when a malformed document causes Export to fail. You can also run Export in the same process as the calling application. This is called in-process. However, it is strongly recommended you convert documents out of process whenever possible.
The Export out-of-process framework uses a client-server architecture. The calling application sends an out-of-process conversion request to the Service Request Broker in the main Export process. The Broker then creates, monitors, and manages a Servant process for the request; each request is handled by one independent Servant process. Data is exchanged between the application thread and the Servant through TCP/IP sockets. The source data is sent to the Servant process as a data stream or file, converted in the Servant, and then returned to the application thread. At that point, the application can either terminate the Servant process or send more data for conversion.
Multiple conversion requests can be sent from multiple threads in the calling application simultaneously. All requests sent from one thread are processed by the Servant mapped to that thread; in other words, each thread can have only one Servant to process its conversion requests.
Any standard conversion errors generated by the Servant are sent to the application.
NOTE Currently, the main Export process and Servant processes must run on the same host.
The following are requirements for running Export out of process:
Internet Protocol (TCP/IP) must be installed
Multi-threaded processing must be supported on the operating system platform
The user application must be built with a multi-threaded runtime library
The following functions run in-process or out of process:
NOTE When converting out of process, these functions must be called after the call to start an out-of-process session and before the call to end an out-of-process session.
Other Export API functions and the File Extraction functions always run in-process.
Configure Out-of-Process Conversions
Although most components of the out-of-process conversion are transparent, the following parameters are configurable:
File-size threshold/temporary file location
Conversion time-out
Listener port numbers and time-out
Connection time-out and retry
Servant process name
These parameters are defined internally, but you can override the default by defining the parameter in the formats_e.ini file. The formats_e.ini file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.
To set the parameters, add the following section to the formats_e.ini file:
[KVExportOOPOptions]
TempFileSizeMark=
TempFilePath=
WaitForConvert=
WaitForConnectionTime=
ListenerPortList=
ListenerTimeout=
ConnectRetryInterval=
ConnectRetry=
ServantName=
Each parameter is described in the following table. The default values for these parameters are set to ensure reasonable performance on most systems. If you are processing a large number of files, or running Export on a slow machine, you may need to increase some of the time-out and retry values.
Parameter
    Description
TempFileSizeMark (unit = megabytes; default = 10)
    The file-size threshold. If the input file received by the Servant is larger than this value, temporary files are created to store the data. The directory in which the temporary files are stored is defined by the TempFilePath parameter. If the file received is smaller than this value, the data is stored in memory in the Servant. This only applies when the input is a stream.
TempFilePath (type = file path; default = current working directory)
    The directory in which temporary files are stored. Temporary files are created when the input file surpasses the file-size threshold (TempFileSizeMark). If the Servant cannot access the file path, an error is generated. This only applies when converting in stream mode.
WaitForConvert (unit = seconds; default = 1800; range = 30~3600)
    The length of time to wait for a Servant to convert a file. If the conversion is not completed within the specified time, the error code “Wait for child process failed” is generated.
WaitForConnectionTime (unit = seconds; default = 180; range = 15~600)
    The length of time to wait for the Servant to connect to the application thread after the application has sent a conversion request to the Broker. If the Servant does not connect within the specified time, the error code “Wait for child process failed” is generated. If there are many Servant processes running simultaneously, this value may need to be increased.
ListenerPortList (type = integer; default = 9985, 9986, 9987, 9988, 9989)
    The TCP/IP port number(s) used for communication between the calling application and the Servant. You can specify a single port number or a series of numbers (enter the numbers separated by commas).
ListenerTimeout (unit = seconds; default = 10; range = 5~30)
    The length of time to wait for the Servant listener thread to get a process ID from the Servant after the connection is established. If the ID is not obtained within the specified time, the error code “Wait for child process failed” is generated. During this time, no other Servant can connect with the application.
ConnectRetryInterval (unit = microseconds; default = 100000, that is, 0.1 seconds; range = 50000~500000)
    The length of time to wait after a Servant has failed to connect to the application before it retries the connection. A Servant may be unable to connect because the application is waiting for another Servant to send a process ID. To calculate the total retry interval, the value set here is added to the platform-specific TCP retry value (on Windows, this is 1 second).
ConnectRetry (type = integer; default = 120; range = 30~600)
    The number of attempts the Servant makes to connect to the calling application. This value and the total retry interval determine the total delay time. The total delay is calculated as follows:
    (ConnectRetryInterval + platform-specific_TCP_retry_value) * ConnectRetry
    For example, if the ConnectRetryInterval is set to 2 seconds, and the Export process is running on Windows (the default TCP retry value on Windows is 1 second), the total delay would be:
    (2 + 1) * 120 = 360
    The Servant would attempt to connect to the application every 3 seconds for 120 attempts for a total of 360 seconds.
ServantName (type = string; default = servant)
    The name of the Servant process. To move the Servant to another location, enter a fully qualified path.
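The worked example for ConnectRetry (a 2-second interval, a 1-second Windows TCP retry value, and 120 attempts giving 360 seconds) implies that the interval and the TCP retry value are summed before being multiplied by the number of attempts. A minimal sketch of that arithmetic follows; the function name demo_total_connect_delay is hypothetical, and all times are taken in seconds for readability even though the parameter itself is specified in microseconds.

```c
/* Total time, in seconds, that a Servant keeps trying to connect:
 * (retry interval + platform TCP retry value) * number of attempts.
 * Inputs are in seconds; convert ConnectRetryInterval from microseconds
 * before calling. */
double demo_total_connect_delay(double retry_interval_s,
                                double tcp_retry_s,
                                unsigned int connect_retry)
{
    return (retry_interval_s + tcp_retry_s) * (double)connect_retry;
}
```

Calling demo_total_connect_delay(2.0, 1.0, 120) reproduces the 360-second figure from the example.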
Run Export Out of Process—Overview
To convert files out of process
1. If required, set parameters for the out-of-process conversion in the formats_e.ini file.
2. Initialize an Export session.
3. If you are using streams, create an input stream.
4. Define the conversion options.
5. Initialize an out-of-process session.
6. Convert the input and/or call other functions that can run out of process.
7. Shut down the out-of-process session.
8. Repeat the per-file steps above for each additional file.
9. Terminate the out-of-process session and the Servant process.
10. Shut down the Export session.
Recommendations
To ensure multi-threaded conversions are thread-safe, you must create a unique context pointer for every thread by calling fpInit() . In addition, threads must not share context pointers, and the same context pointer must be used for all API calls in the same thread. Creating a context pointer for every thread does not affect performance because the context pointer uses minimal resources.
All functions that can run out of process must be called within the out-of-process session, that is, after the call to initialize the out-of-process session and before the call to end the out-of-process session.
When terminating an out-of-process session, persist the Servant process by setting the Boolean flag bKeepServantAlive in the KVXMLEndOOPSession() function or endOOPSession method. If the Servant process remains active, subsequent conversion requests are processed more quickly because the Servant process is already prepared to receive data. Only terminate the Servant when there are no more out-of-process requests.
To recover from a failure in the Servant process, start a new out-of-process session. This creates a new Servant process for the next conversion.
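The keep-alive recommendation can be illustrated with a small model of the session loop. Everything in this sketch is hypothetical: DemoOopState and the demo_ functions are stand-ins that only count how often a Servant would be created, and they do not call or reproduce any SDK function. The point is the control flow: one session per file, with the Servant persisted after every file except the last.

```c
#include <stddef.h>

/* Hypothetical model of out-of-process sessions; not SDK code. */
typedef struct DemoOopState {
    int servant_alive;     /* 1 while a Servant process exists */
    int servants_created;  /* how many Servant processes were started */
    int files_converted;
} DemoOopState;

/* Starts a session, creating a Servant only if none was persisted. */
static void demo_start_session(DemoOopState *s)
{
    if (!s->servant_alive) {
        s->servant_alive = 1;
        s->servants_created++;
    }
}

/* Stands in for the single conversion call allowed per session. */
static void demo_convert(DemoOopState *s)
{
    s->files_converted++;
}

/* Ends a session; the Servant survives only if keep_servant_alive is set. */
static void demo_end_session(DemoOopState *s, int keep_servant_alive)
{
    if (!keep_servant_alive)
        s->servant_alive = 0;
}

/* One session per file; the Servant is kept alive until the last file. */
void demo_convert_all(DemoOopState *s, size_t file_count)
{
    size_t i;

    for (i = 0; i < file_count; i++) {
        int is_last = (i + 1 == file_count);

        demo_start_session(s);
        demo_convert(s);
        demo_end_session(s, !is_last);
    }
}
```

In this model, converting three files creates a single Servant; ending every session without the keep-alive flag would instead create one Servant per file.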
Run Export Out of Process in the C API
The cnv2xmloop sample program demonstrates how to run Export out of process.
To convert files out of process in the C API
1. If required, set parameters for the out-of-process conversion in the formats_e.ini file. See “Configure Out-of-Process Conversions”.
2. Declare instances of the following types and assign values to the members as required:
KVXMLTemplateEx
KVXMLOptionsEx
KVXMLHeadingInfo
KVXMLTOCOptions
See “XML Export API Structures” for more information.
3. Load the KVXML library and obtain the KVXMLInterface entry point by calling KVXMLGetInterface().
4. Initialize an Export session by calling fpInit().
5. If you are using streams for the input and output source, follow these steps; otherwise, proceed to the next step.
a. Create an input stream (KVInputStream) by calling fpFileToInputStreamCreate(). See “fpFileToInputStreamCreate()”.
b. Create an output stream (KVOutputStream) by calling fpFileToOutputStreamCreate(). See “fpFileToOutputStreamCreate()”.
6. Set up an out-of-process session by calling KVXMLStartOOPSession(). See “KVXMLStartOOPSession()”. This function performs the following:
Initializes the out-of-process session.
Specifies the input stream or file. If you are using an input file, set pFileName to the filename, and set pInputStream to NULL. If you are using an input stream, set pInputStream to point to KVInputStream, and set pFileName to NULL.
Sets conversion options in the KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions data structures.
Creates a Servant process.
Establishes a communication channel between the application thread and the Servant.
Sends the data to the Servant.
See the sample code in “Example—KVXMLStartOOPSession”, and “KVXMLStartOOPSession()”.
7. Convert the input and generate the output files by calling KVXMLConvertFile() or fpConvertStream(). The structures KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions are defined in the call to KVXMLStartOOPSession(), and should be NULL in the conversion call. A conversion function can only be called once in a single out-of-process session.
8. Terminate the out-of-process session by calling KVXMLEndOOPSession(). The Servant ends the current conversion session, and releases the source data and session resources. See the sample code in “Example—KVXMLEndOOPSession”, and “KVXMLEndOOPSession()”.
9. If you used streams, free the memory allocated for the input stream and output stream by calling the functions fpFileToInputStreamFree() and fpFileToOutputStreamFree(). See “fpFileToInputStreamFree()”.
10. Repeat the stream, session, and conversion steps above for each additional file.
11. After all files are converted, terminate the out-of-process session and the Servant process by calling KVXMLEndOOPSession() and setting the Boolean to FALSE.
12. After the out-of-process session and Servant are terminated, shut down the Export session by calling fpShutDown().
Example—KVXMLStartOOPSession
The following sample code is from the cnv2xmloop sample program:
/* declare OOP startsession function pointer */
KVXML_START_OOP_SESSION fpKVXMLStartOOPSession;

/* assign OOP startsession function pointer */
fpKVXMLStartOOPSession = (KVXML_START_OOP_SESSION)mpGetProcAddress(hKVXML, "KVXMLStartOOPSession");
if(!fpKVXMLStartOOPSession)
{
    printf("Error assigning KVXMLStartOOPSession pointer\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    mpFreeLibrary(hKVXML);
    return 7;
}

/********START OOP SESSION *****************/
if(!(*fpKVXMLStartOOPSession)(pKVXML,
        &Input,
        NULL,
        &XMLTemplates,    /* Mark-up and related variables */
        &XMLOptions,      /* Options */
        NULL,             /* TOC options */
        &oopServantPID,
        &error,
        0,
        NULL,
        NULL))
{
    printf("Error calling fpKVXMLStartOOPSession \n");
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 9;
}
Example—KVXMLEndOOPSession
The following sample code is from the cnv2xmloop sample program:
/* declare endsession function pointer */
KVXML_END_OOP_SESSION fpKVXMLEndOOPSession;

/* assign OOP endsession function pointer */
fpKVXMLEndOOPSession = (KVXML_END_OOP_SESSION)mpGetProcAddress(hKVXML, "KVXMLEndOOPSession");
if(!fpKVXMLEndOOPSession)
{
    printf("Error assigning KVXMLEndOOPSession pointer\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
    mpFreeLibrary(hKVXML);
    return 8;
}

/********END OOP SESSION, DO NOT KEEP SERVANT ALIVE *********/
if(!(*fpKVXMLEndOOPSession)(pKVXML,
        FALSE,
        &error,
        0,
        NULL,
        NULL))
{
    printf("Error calling fpKVXMLEndOOPSession \n");
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 10;
}
Convert Files
KeyView Export SDK enables you to convert many different types of documents to XML. Converting is the process of extracting the text from a document without the application-specific markup, and applying XML markup. However, the conversion process can also include the following:
Extracting sub files: exposes all sub files for conversion.
Setting conversion options: determines the content, structure, and appearance of the XML output. See “Set Conversion Options”.
Extracting the file’s format: detects a file’s format, and reports the information to the API, which in turn reports the information to the developer’s application. See “Extract File Format Information”.
Extracting metadata: extracts selected metadata (document properties) from a file.
Converting character sets: controls the character set of both the input and the output text. See “Convert Character Sets”.
Implementing callbacks: controls the conversion while it is in progress. See “XML Export API Callback Functions”.
You can use one of the following methods to convert documents:
Use the Export Demo sample program. This Visual Basic program demonstrates most of the Export SDK's capabilities and is the easiest way to get started. See “Use the Export Demo Program”.
Use the C-language implementation of the API from your C or C++ application. See “Use the C-Language Implementation of the API”.
Use the C sample programs.
NOTE It is strongly recommended you convert documents out of process. During out-of-process conversion, Export runs independently from the calling application. Out-of-process conversions protect the stability of the calling application in the rare case when a malformed document causes Export to fail. See “Convert Files Out of Process”.
Sub File Extraction
To convert a file, you must first determine whether the source file contains any sub files (attachments, embedded objects, and so on). A file that contains sub files is called a container file. Compressed files (such as Zip), mail messages with attachments (such as Microsoft Outlook Express), mail stores (such as Microsoft
Outlook Personal Folders), and compound documents with embedded OLE objects (such as a Microsoft Word document with an embedded Excel chart) are examples of container files.
If the file is a container file, the container must be opened and its sub files extracted using the File Extraction API. The extraction process is done repeatedly until all sub files are extracted and exposed for conversion. Once a sub file is extracted, you can call the XML Export APIs to convert the file.
If a file is not a container, you should pass it directly to the XML Export API for conversion without extraction.
See “Use the File Extraction API” for more information.
Convert Outlook Email without Using the Extraction API
It is strongly recommended you convert all container files, including Microsoft Outlook files, using the File Extraction API. However, you can convert Outlook email messages (MSG) directly using the Export API and the MSG reader (msgsr).
NOTE The MSG reader only extracts the message body of an MSG file. Attachments are not extracted.
To convert MSG files using the MSG reader, add the following to the formats_e.ini file (TRUE is case-sensitive):
[ContainerOptions]
bConvertMSG=TRUE
Set Conversion Options
Conversion options are parameters that determine the content, structure, and appearance of the XML output. For example, you can specify the markup inserted at the beginning and end of specific XML blocks, whether a heading is included in the table of contents, the output character set, or the resolution at which graphics are converted. The conversion options can be set either in the API or in the template files. Regardless of the method used to set the options, the values are ultimately passed to the API and used to populate the following data structures:
The conversion options are described in “XML Export API Structures”.
Set Conversion Options Using the API
The conversion options are set using any of the following functions:
Set Conversion Options Using the Template Files
XML Export includes templates in the form of initialization files (.ini). The templates provide a quick and easy way to modify the conversion options without programming at the API level. However, the template files do not give you complete control of the conversion process. To control some features, you must use the API directly.
The template files can be fully customized using a text editor. For example, to change the output character set from the default KVCS_UTF8 to KVCS_SJIS in the xml1file.ini template, you would change the eOutputCharSet value as follows:
[KVXMLOptions]
eOutputCharSet=KVCS_SJIS
bForceOutputCharSet=TRUE
To create valid XML, a template file must contain two structures: KVXMLTemplateEx and KVXMLOptionsEx.
NOTE If you enter markup in the template files that is not compliant with XML standards, XML Export inserts the markup into the output file unchanged. This may result in a malformed XML file.
An application must then read the template file and write the data to the appropriate Export structures. In the C sample program xmlini, a template file is supplied as a command-line argument.
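Reading a template file comes down to splitting each option line into a name and a value before copying the value into the matching structure member. The following is a minimal sketch of that first step only; demo_split_option is a hypothetical helper, not the parsing code used by the xmlini sample program, and it does not handle section headers such as [KVXMLOptions], comments, or surrounding whitespace.

```c
#include <string.h>

/* Splits a template line such as "eOutputCharSet=KVCS_SJIS" into key and
 * value. Returns 1 on success, 0 if the line has no '=', has an empty
 * key, or a part does not fit in its buffer. */
int demo_split_option(const char *line,
                      char *key, size_t key_size,
                      char *value, size_t value_size)
{
    const char *eq = strchr(line, '=');
    size_t key_len, value_len;

    if (eq == NULL)
        return 0;

    key_len = (size_t)(eq - line);
    value_len = strcspn(eq + 1, "\r\n");   /* stop at end of line */
    if (key_len == 0 || key_len >= key_size || value_len >= value_size)
        return 0;

    memcpy(key, line, key_len);
    key[key_len] = '\0';
    memcpy(value, eq + 1, value_len);
    value[value_len] = '\0';
    return 1;
}
```

An application would then compare the key against the option names it supports and convert the value text to the type of the corresponding structure member.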
Templates
The template files for the C API implementation are in the directory install\xmlexport\programs\ini, where install is the pathname of the Export installation directory. The following templates are provided:
Template: Cascading style sheet (xml_css.ini)
Description: This template writes style sheet information to an external CSS file. This makes the XML output significantly smaller because the information is not stored within the output file. See “Use Style Sheets” and “Use Style Sheets with xmlini” for more information on using an external CSS file.

Template: Index (xml_index.ini)
Description: Converts a source document into a single, largely unformatted XML file that is appropriate for use with an indexing engine.
Template: Single file (xml1file.ini)
Description:
Creates a single XML file.
Does not define an XSL style sheet. A default XSL style sheet that is appropriate to the source document type is used. The defaults supplied are wp.xsl (for word processing documents), ss.xsl (for spreadsheets), and pg.xsl (for presentations).
Forces the output character set to UTF-8.
Maintains the source document’s fonts and styles.
Does not create a table of contents.

Template: Single file for presentations (xml1file_pg.ini)
Description:
This template is designed specifically for presentation formats.
Creates a single XML file.
Defines an XSL style sheet for presentations (pg.xsl).
Forces the output character set to UTF-8.
Since XML Export only extracts textual components from presentations, the bRasterizeFiles member of KVXMLOptions is set to FALSE. See “KVXMLOptions”.
Only the szMainTop, szMainBottom, and szUserSummary parameters of the KVXMLTemplate structure are relevant to presentations and are set in the presentations template.
A template file for presentations must not include any other parameters in the KVXMLTemplate structure.

Template: Single file with table of contents (xml1filetoc.ini)
Description:
Creates a single XML file.
Creates a table of contents at the top of the XML document.
Uses the Verity.dtd.
Uses an XSL style sheet (wp.xsl).
Forces the output character set to UTF-8.
Lists all metadata (Title, Subject, Author, Comments, Created, Modified, Last Saved By, and Revision Number).
Uses the name of the worksheets for spreadsheets.
Uses the slide titles for presentations. If no titles are available in the source document, it uses “slide 1,” “slide 2,” “slide 3,” and so on.
Use the Export Demo Program
The easiest way to get started with XML Export is to become familiar with its capabilities through the Visual Basic sample program, Export Demo. The source code for the program is in the directory install\xmlexport\programs\ExportDemo, where install is the pathname of the Export installation directory.
Export Demo is for Windows only, and requires Internet Explorer 4.01 with Service Pack 1 or higher.
The output options for output files are pre-defined in Export Demo and cannot be changed in the user interface. Export Demo uses a small sample of the options available in the XML Export API.
You can use the sample documents in install \testdocs to experiment with converting different file formats.
To launch the sample program, select Export Demo from Start | Programs | Autonomy | XML Export. The following dialog appears:
Figure 2 Export Demo: Launching
NOTE HTML conversion using HTML Export is available in Export Demo if you have HTML Export installed. If you do not have HTML Export installed, the HTML button is disabled.
Change Input/Output Directories
If XML Export is installed in the default directory, the output and input directories are automatically set. The default location for source files is the directory install\testdocs. The default location for output files is the directory install\xmlexport\programs\tempout.
If XML Export is installed in a directory other than the default, you are prompted to select an output and input directory when you first start up Export Demo.
To change the default directories for the source and output files
1. Select Options | Set Directories. The following dialog appears:
Figure 3 Export Demo: Setting Directories
2. From the tree view, select the drive letter and directory for the source or output files.
3. In Change Location, select which files are in the directory, either Source or XML.
4. Click Change. The Current Locations fields are updated with the new selection.
5. Follow the same procedure for the other file types. When you are finished, click OK.
Set Configuration Options
With XML Export, you can configure options prior to the document conversion using the KVXMLConfig() function. Export Demo demonstrates this function, and allows you to control the following options:
Generating output with verbose markup and without images.
Including position information in the markup generated for a PDF document.
Suppress Images
Export Demo provides an option to generate output with verbose markup and without images. For more information, see “KVXMLConfig()”.
To specify that images are suppressed in the XML output, select Options | XML Config | Suppress Images.
Using PDF Position Information
Export Demo provides an option to include position information in the markup generated for a PDF document. For more information, see “KVXMLConfig()”.
To specify that PDF position information be included in the XML output, select Options | XML Config | Enable Position Token.
Convert Files
To convert a single file:
1. Select Options | Convert | Single file.
2. Select the document from the file list, and click XML in the Convert file to pane.
To convert files in a directory:
1. Select Options | Convert | Entire directory.
2. Click XML in the Convert directory to pane.
To view a converted file, double-click the output file in the Output Files pane or select the output file and click View.
The converted file is displayed in the view pane.
Figure 4 Export Demo: Converting Files
To view the original document, select the document from the file list, and click Open. If you have an application on your system associated with the file, the file is displayed in that application.
To delete output files, select the file in the Output Files pane and click Delete.
Use the C-Language Implementation of the API
The C-language implementation of the XML Export API is divided into the following function suites:
File Extraction API Functions: Open and extract sub files in a container file. They also extract metadata and file format information, and control character set conversion on extraction.
XML Export API Functions: Extract format information (metadata, character set, and format), create an input/output stream from a file, and open, convert, and close the stream.
XML Export API Callback Functions: Control the conversion while it is in progress.
Input/Output Operations
In the XML Export API, the source input and target output can be either a physical file accessed through a file path, or a stream created from a data source. A stream is a C structure containing pointers to functions similar in nature to their standard ANSI C counterparts. This structure is passed to Export functions in place of the standard input source. The input stream is defined by the structure KVInputStream in kvtypes.h. The output stream is defined by the structure KVOutputStream in kvtypes.h.
You can create an input stream using the function fpFileToInputStreamCreate(), and an output stream using the function fpFileToOutputStreamCreate(). These functions assign C equivalent I/O functions to fpOpen(), fpRead(), fpSeek(), fpTell(), and fpClose(). See “fpFileToInputStreamCreate()” and “fpFileToOutputStreamCreate()”.
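To make the idea of "a structure containing pointers to functions" concrete, here is a minimal sketch of a stream backed by a memory buffer. The DemoStream and MemState types and the mem_* functions are hypothetical stand-ins invented for this example; they do not reproduce the real KVInputStream layout, which is defined in kvtypes.h and should be consulted before writing a custom stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>

/* Hypothetical stream type: NOT the real KVInputStream definition. */
typedef struct DemoStream DemoStream;
struct DemoStream {
    void  *state;                                   /* private data of the source */
    int    (*open)(DemoStream *s);
    size_t (*read)(DemoStream *s, void *buf, size_t n);
    int    (*seek)(DemoStream *s, long offset, int whence);
    long   (*tell)(DemoStream *s);
    int    (*close)(DemoStream *s);
};

/* Private state for a stream that reads from a memory buffer. */
typedef struct { const unsigned char *data; size_t size; size_t pos; } MemState;

static int mem_open(DemoStream *s) { ((MemState *)s->state)->pos = 0; return 1; }

static size_t mem_read(DemoStream *s, void *buf, size_t n)
{
    MemState *m = s->state;
    size_t left = m->size - m->pos;
    if (n > left)
        n = left;                       /* short read at end of data */
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return n;
}

static int mem_seek(DemoStream *s, long offset, int whence)
{
    MemState *m = s->state;
    long base = whence == SEEK_SET ? 0
              : whence == SEEK_CUR ? (long)m->pos : (long)m->size;
    long target = base + offset;
    if (target < 0 || (size_t)target > m->size)
        return 0;                       /* out of range */
    m->pos = (size_t)target;
    return 1;
}

static long mem_tell(DemoStream *s) { return (long)((MemState *)s->state)->pos; }
static int mem_close(DemoStream *s) { (void)s; return 1; }

/* Bind the memory-backed functions to a stream structure. */
static DemoStream mem_stream_create(MemState *m)
{
    DemoStream s = { m, mem_open, mem_read, mem_seek, mem_tell, mem_close };
    assert(m != NULL);
    return s;
}
```

The consumer of such a structure only ever calls through the function pointers, so the same conversion code can read from a file, a memory buffer, or a network source without changes.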
Convert Files
To use the C-language implementation of the API:
1. Develop the XML markup and tokens to be assigned to the required members of a declared instance of KVXMLTemplate.
If you use markup in the structure that is not compliant with XML standards, XML Export inserts the markup into the output file unchanged. This may result in a malformed XML file.
2. Declare instances of the following types and assign values to the members as required:
KVXMLTemplateEx
KVXMLOptionsEx
KVXMLHeadingInfo
KVXMLTOCOptions
See “XML Export API Structures” for more information.
3. Load the KVXML library and obtain the KVXMLInterface entry point by calling KVXMLGetInterface().
4. Initialize an Export session by calling fpInit(). The function’s return value, pContext, is passed as the first parameter to all other Export functions.
5. Pass the context pointer from fpInit() and the address of a structure containing pointers to the File Extraction API functions in the call to KVGetExtractInterface().
6. If you are using streams for the input and output source, follow these steps:
a. Create an input stream (KVInputStream) by calling fpFileToInputStreamCreate(), or using code similar to the example code in the sample programs. See “fpFileToInputStreamCreate()”.
b. Create an output stream (KVOutputStream) by calling fpFileToOutputStreamCreate(), or using code similar to the example code in the sample programs. See “fpFileToOutputStreamCreate()”.
7. Declare the input stream or filename in the KVOpenFileArg structure.
8. Open the source file by calling fpOpenFile() and passing the KVOpenFileArg structure. This call defines the parameters necessary to open a file for extraction.
9. Determine whether the source file is a container file (contains sub files) by calling fpGetMainFileInfo().
10. If the call to fpGetMainFileInfo() determined the source file is a container file, proceed to Step 11; otherwise, proceed to Step 14.
11. Determine whether the sub file is itself a container (contains sub files) by calling fpGetSubFileInfo(). See “fpGetSubFileInfo()”.
12. Extract the sub file by calling fpExtractSubFile(). See “fpExtractSubFile()”.
13. If the call to fpGetSubFileInfo() determined the sub file is a container file, repeat Step 11 through Step 12 until all sub files are extracted; otherwise, proceed to Step 14.
14. Set up an out-of-process session by calling KVXMLStartOOPSession(). See “KVXMLStartOOPSession()”.
15. Convert the input and generate the output files by calling KVXMLConvertFile() or fpConvertStream(). The structures KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions are defined in the call to KVXMLStartOOPSession(), and should be NULL in the conversion call.
A conversion function can only be called once in a single out-of-process session. See “fpConvertStream()” or “KVXMLConvertFile()”.
If you are using callbacks, they are called while the conversion process is underway. If required, alternate paths and filenames can be specified for output files, including using the table of contents entries for the filenames.
16. If you are converting additional files, terminate the out-of-process session by calling KVXMLEndOOPSession() and setting the Boolean to TRUE. The Servant ends the current conversion session, and releases the source data and session resources.
If you are not converting additional files, terminate the out-of-process session and the Servant process by calling KVXMLEndOOPSession() and setting the Boolean to FALSE. See “KVXMLEndOOPSession()”.
17. Close the file by calling fpCloseFile().
18. If you used streams, free the memory allocated for the input stream and output stream by calling the functions fpFileToInputStreamFree() and fpFileToOutputStreamFree(). See “fpFileToInputStreamFree()”.
19. Repeat Step 6 through Step 18 for additional source files.
20. Shut down the Export session by calling fpShutDown().
Multi-threaded Conversions
To ensure multi-threaded conversions are thread-safe, you must create a unique context pointer for every thread by initializing the Export session using fpInit().
In addition, threads must not share context pointers, and the same context pointer must be used for all API calls in the same thread. Creating a context pointer for every thread does not affect performance because the context pointer uses minimal resources.
For example, your code should have the following logic for one thread:
fpInit()
KVGetExtractInterface()
fpFileToInputStreamCreate()
fpFileToOutputStreamCreate()
fpOpenFile()
fpGetMainFileInfo()
fpGetSubFileInfo()
fpExtractSubFile() /* container file */
fpGetSubFileMetadata()
KVXMLStartOOPSession()
fpConvertStream()
KVXMLEndOOPSession(bKeepServantAlive TRUE)
fpCloseFile()
fpFileToInputStreamFree()
fpFileToOutputStreamFree()
set input/output file
fpOpenFile()
fpGetMainFileInfo() /* not a container file */
KVXMLStartOOPSession()
KVXMLConvertFile()
KVXMLEndOOPSession(bKeepServantAlive TRUE)
fpCloseFile()
...
fpShutDown()
Use the Verity Document Type Definition (DTD)
XML Export produces well-formed, valid XML documents. Document validity is based on a Document Type Definition (DTD) called the Verity.dtd. The Verity.dtd is in the default output directory tempout. If the DTD is in a different directory, the full path must be specified in pszVerityDTDPath.
The elements in the Verity.dtd are based on those defined in the W3C XHTML 1.0 specification and the attributes are based on those defined in the W3C CSS 2 specification.
The root element of each document is “VerityXMLExport.” Character entities are imported by using the three XHTML DTDs defined at the beginning of the Verity.dtd.
<!-- Character entities -->
<!ENTITY % HTMLlat1x SYSTEM "HTMLlat1x.ent">
%HTMLlat1x;
<!ENTITY % HTMLspecialx SYSTEM "HTMLspecialx.ent">
%HTMLspecialx;
<!ENTITY % HTMLsymbolx SYSTEM "HTMLsymbolx.ent">
%HTMLsymbolx;
Use XML Style Language Transformation (XSLT)
XML Export is designed to generate XML documents based on the Verity DTD.
You can convert the XML produced by XML Export to other XML vocabularies, such as Wireless Markup Language (WML), using XSLT.
Add Elements and Attributes to the DTD
XML Export can only generate XML that conforms to the Verity DTD. You can create your own DTD based on the Verity DTD. You cannot rename the Verity DTD, so make sure you back up the original Verity DTD to another name before making changes.
If you create your own DTD and add elements or attributes that are not defined in the original Verity DTD, you must ensure the new markup is defined in the XML Export API classes. You can define the markup by entering the markup directly in the styles, or populating the styles using the template files. See “Map Styles” for more information on mapping styles to user-defined markup.
Move the DTD
The default output directory for the Verity DTD is programs\tempout. If you move the Verity DTD to another output directory, you must set the string value of pszVerityDTDPath to the new location. This path is added to the document type declaration in the XML file.
PART 2
Use the Export API
This section explains how to perform some basic tasks using the File Extraction and Export APIs, and describes the sample programs. It contains the following chapters:
CHAPTER 3
Use the File Extraction API
This section describes how to extract sub files from a container file using the File Extraction API. It contains the following topics.
Extract Sub Files from Outlook Files
Extract Sub Files from Outlook Express Files
Extract Sub Files from Mailbox Files
Extract Sub Files from Outlook Personal Folders Files
Extract Sub Files from Lotus Domino XML Language Files
Extract Sub Files from Lotus Notes Database Files
Extract Sub Files from PDF Files
Extract Sub Files from ZIP Files
Default Filenames for Extracted Sub Files
Introduction
To convert a file, you must first determine whether the file contains any sub files (attachments, embedded OLE objects, and so on). A file that contains sub files is called a container file. A container file has a main file (parent) and sub files (children) embedded in the main file.
The following are examples of container files:
Archive files such as ZIP, TAR, and RAR.
Mail messages such as Outlook (MSG) and Outlook Express (EML).
Mail stores such as Microsoft Outlook Personal Folders (PST), Mailbox (MBX), and Lotus Notes database (NSF).
PDF files containing file attachments.
Compound documents with embedded OLE objects such as a Microsoft Word document with an embedded Excel chart.
NOTE “Supported Formats” indicates which formats are treated as container files and are supported by the File Extraction API.
The sub files may also be container files, creating a file hierarchy of multiple levels. For example, suppose an MSG file (the root parent) contains three attachments:
a Microsoft Word document containing an embedded Microsoft Excel spreadsheet.
an AutoCAD drawing file (DWG).
an EML file with an attached Zip file, which in turn contains four archived files.
Figure 5 shows the file’s hierarchy.
Figure 5 Example Container File Tree Structure
NOTE The parent MSG file contains four first-level children. The body text of a message file, although not a standalone file in the container, is considered a child of the parent file.
Extract Sub Files
To convert all files in a container file, the container must be opened and its sub files extracted using the File Extraction API. The extraction process is done repeatedly until all sub files are extracted and exposed for conversion. Once a sub file is extracted, you can call Export API functions to convert the file.
If you require a container file, including sub files, to be converted to a single file, you must extract all files from the container, convert the files, and then append each converted output to its parent.
To extract sub files, follow this general procedure:
1. Pass the context pointer from fpInit() and the address of a structure containing pointers to the File Extraction API functions in the call to KVGetExtractInterface(). See “KVGetExtractInterface()”.
2. Declare the input stream or filename in the KVOpenFileArg structure.
3. Open the source file by calling fpOpenFile() and passing the KVOpenFileArg structure. This call defines the parameters necessary to open a file for extraction. See “fpOpenFile()”.
4. Determine whether the source file is a container file (contains sub files) by calling fpGetMainFileInfo(). See “fpGetMainFileInfo()”.
5. If the call to fpGetMainFileInfo() determined the source file is a container file, proceed to Step 6; otherwise, convert the file.
6. Determine whether the sub file is itself a container (contains sub files) by calling fpGetSubFileInfo(). See “fpGetSubFileInfo()”.
7. Extract the sub file by calling fpExtractSubFile(). See “fpExtractSubFile()”.
8. If the call to fpGetSubFileInfo() determined the sub file is a container file, repeat Step 6 through Step 7 until all sub files are extracted and the lowest level of sub files is reached; otherwise, convert the file.
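The repeat-until-lowest-level logic in this procedure is naturally recursive. The sketch below shows only the shape of that recursion over an in-memory tree: the DemoFile type and extract_leaves() function are hypothetical and stand in for the real sequence of fpGetSubFileInfo() and fpExtractSubFile() calls, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a file that may contain sub files. */
typedef struct DemoFile {
    const char *name;
    const struct DemoFile *children;   /* sub files, or NULL if none */
    size_t childCount;                 /* 0 means "not a container"  */
} DemoFile;

/* Descend into every container until the lowest level is reached.
 * Returns the number of non-container files that are ready to convert. */
static size_t extract_leaves(const DemoFile *file, int depth)
{
    size_t i, leaves = 0;

    assert(file != NULL);
    if (file->childCount == 0) {
        /* Not a container: this is where the conversion call would go. */
        printf("%*sconvert %s\n", depth * 2, "", file->name);
        return 1;
    }

    /* Container: extract each sub file, then examine it in turn. */
    printf("%*sextract %s\n", depth * 2, "", file->name);
    for (i = 0; i < file->childCount; i++)
        leaves += extract_leaves(&file->children[i], depth + 1);
    return leaves;
}
```

In a real application the recursion depth is bounded by the nesting depth of the container, so deeply nested archives may call for an explicit work queue instead of recursion.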
Recreate a File’s Hierarchy
When a container file is extracted, any relationships between the sub files in the container are not maintained. However, the File Extraction interface provides information that enables you to recreate the hierarchy. The hierarchy can be used to create a directory structure in a file system, or to categorize documents according to their relationship to each other. For example, if you use KeyView to generate text for a search engine, the hierarchical information enables your users to search for a document based on the document’s parent or sibling. In addition, when the document is returned to the user, the parent and sibling documents can be returned as recommendations.
The information needed to recreate a file’s hierarchy is provided in the call to fpGetSubFileInfo(). See “fpGetSubFileInfo()”. The members KVSubFileInfo->parentIndex and KVSubFileInfo->childArray provide information about a sub file’s parent and children. Since you can only retrieve the first-level children in the sub file, you must call fpGetSubFileInfo() repeatedly until information for the leaf-node children is extracted.
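Once the parent index of every sub file has been collected, rebuilding a path is a matter of walking the parent links up to the root. The following sketch assumes the indexes have already been copied into a simple array; the DemoSubFile type, the use of -1 to mark the root, and the build_path() helper are all assumptions made for illustration and are not part of the File Extraction API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 64   /* arbitrary guard against cycles in bad data */

/* Hypothetical record holding what was gathered for one sub file. */
typedef struct DemoSubFile {
    const char *name;
    int parentIndex;   /* index of the parent; -1 marks the root (assumption) */
} DemoSubFile;

/* Write a "root/folder/file" style path for sub file `index` into `out`.
 * Returns 1 on success, 0 on a bad index, a cycle, or a too-small buffer. */
static int build_path(const DemoSubFile *files, int count, int index,
                      char *out, size_t outsz)
{
    int chain[MAX_DEPTH];
    int depth = 0, i;
    size_t used = 0;

    assert(files != NULL && out != NULL && outsz > 0);
    if (index < 0)
        return 0;

    /* Walk up from the sub file to the root, recording each index. */
    for (i = index; i >= 0; i = files[i].parentIndex) {
        if (i >= count || depth == MAX_DEPTH)
            return 0;
        chain[depth++] = i;
    }

    /* Emit the names from the root down to the sub file. */
    out[0] = '\0';
    while (depth-- > 0) {
        int n = snprintf(out + used, outsz - used, "%s%s",
                         used ? "/" : "", files[chain[depth]].name);
        if (n < 0 || (size_t)n >= outsz - used)
            return 0;
        used += (size_t)n;
    }
    return 1;
}
```

The same walk can be used to create a directory structure on disk or to record parent and sibling relationships alongside each converted document.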
Create a Root Node
Because of their structure, some container files do not contain a sub file or folder which acts as a root directory on which the hierarchy can be based. For example, sub files in a Zip archive can be extracted, but none of the sub files represent the root of the hierarchy. In this case, an artificial root node must be created at the top of the file hierarchy as a point of reference for each child, and ultimately to recreate the relationships. This artificial root node is an internal object, and is extracted to disk as a directory called root. Its index number is 0.
To create the root node, set openFlag to KVOpenFileFlag_CreateRootNode in the call to fpOpenFile(). See “fpOpenFile()”. When a root node is created, the value of numSubFiles in KVMainFileInfo includes the root node. For example, when you call fpGetMainFileInfo() on a Microsoft Word document with three embedded OLE objects and the root node is disabled, numSubFiles is 3. If you create a root node, numSubFiles is 4.
Recreate a File’s Hierarchy—Example
For example, suppose you extract a PST file containing seven sub files with a root node enabled. The call to fpGetMainFileInfo() returns the number of sub files as 8 (seven sub files and one root node). Figure 6 shows the structure and the available hierarchy information after the sub files are extracted:
Figure 6 Extracted PST File
The parentIndex specifies the index number of a sub file’s parent. The childArray specifies an array of a sub file’s children. With this information, you can recreate the hierarchy shown in Figure 7.
Figure 7 Recreated File Hierarchy
Extract Mail Metadata
You can extract metadata, such as subject, sender, and recipient, from MSG, EML, MBX, PST, and NSF files, by calling the fpGetSubFileMetaData() function. You can extract a pre-defined set of metadata fields and/or individual fields that are unique to a file format.
Default Metadata Set
KeyView internally defines a set of common mail metadata fields that can be extracted as a group from mail formats. This default metadata set is listed in the following table. When you retrieve all metadata for a file (that is, pass NULL for the array of metadata), the complete set of default metadata, not all available metadata in the file, is returned.
Field Name (string to specify): Description
From: The display name and e-mail address of the sender.
Sent: The time the message was sent.
To: The display names and email addresses of the recipients.
Cc: The display names and email addresses of recipients who receive copies of the email.
Bcc: The display names and email addresses of recipients who received blind copies of the email.
Subject: The text in the subject line of the message.
Priority: The priority applied to the message.
Because mail formats use different terms for the same fields, the format’s reader maps the default field name to the appropriate format-specific name. For example, when retrieving the default metadata set, the NSF field Importance is mapped to the name Priority and is returned.
You can also extract the default field names individually by passing the field name (such as From, To, and Subject); however, in this case, the string is not mapped to the format-specific name. For example, if you pass Priority in the call, you will retrieve the contents of the Priority field from an MBX file, but will not retrieve the contents of the Importance field from an NSF file.
NOTE You cannot pass the field names in the default metadata set individually for PST files. However, you can pass either the MAPI tag number or the MAPI tag name as integers. See “Microsoft Personal Folders File (PST) Metadata”.
Extract the Default Metadata Set
To extract the default metadata set, call the fpGetSubFileMetaData() function, and pass 0 for metaNameCount and NULL for metaNameArray. See “fpGetSubFileMetaData()”.
KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;

KVStructInit(&metaArg);
metaArg.index = subFileIndex;
/* A count of 0 and a NULL array request the default metadata set. */
metaArg.metaNameCount = 0;
metaArg.metaNameArray = NULL;

error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
...
extractInterface->fpFreeStruct(pFile, pMetaData);
pMetaData = NULL;
Microsoft Outlook (MSG) Metadata
In addition to the default metadata set, the metadata fields listed in Table 7 can be extracted for MSG files. The field name must be passed to metaNameArray in the call to the fpGetSubFileMetaData() function.
Field Name (string to specify): Description
AttachFileName: An attachment's long filename and extension, excluding path.
ConversationTopic: The topic of the first message in a conversation thread. A conversation thread is a series of messages and replies. This is the first message’s subject with any prefix removed.
CreationTime: The time the message or attachment was created. This value is displayed in the Sent field in the message’s Properties dialog in Outlook.
InternetMessageID: The identifier for messages that come in over the Internet. This is the MAPI property PR_INTERNET_MESSAGE_ID. This property is not in the MAPI headers or MAPI documentation.
LastModificationTime: The time the message or attachment was last modified. This value is displayed in the Modified field in the message’s Properties dialog in Outlook.
Location: The physical location of the event specified in the Outlook calendar entry.
MessageID: The message transfer system (MTS) identifier for the message transfer agent (MTA). This value is displayed on the Message ID tab in the message’s Properties dialog in Outlook.
Received: The date and time a message was delivered. This value is displayed in the Received field in the message’s Properties dialog in Outlook.
Sender: The name and e-mail address of the message sender. This value is a concatenation of two MAPI properties in the following format: "PR_SENDER_NAME" <PR_SENDER_EMAIL_ADDRESS>. The Sender value may be the same as or different from the default metadata From value, depending on which MAPI properties exist in the MSG file.
Sensitivity: The value indicating the message sender's opinion of the sensitivity of a message. For example, Personal, Private, or Confidential. This value is displayed in the Sensitivity field in the message’s Properties dialog in Outlook.
TransportMsgHeaders: Contains transport-specific message envelope information. This value corresponds to the MAPI property PR_TRANSPORT_MESSAGE_HEADERS.
StartDate: Contains an appointment start date. This value corresponds to the PR_START_DATE MAPI property.
EndDate: Contains an appointment end date. This value corresponds to the PR_END_DATE MAPI property.
Extract MSG-Specific Metadata
To extract specific metadata fields from an MSG file, call the fpGetSubFileMetaData() function, and pass the field name defined in Table 7 to metaNameArray (the string is not case sensitive). See “fpGetSubFileMetaData()”.
For example, the following code extracts the contents of the ConversationTopic and MessageID fields:
KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pname[2];

KVStructInit(&metaArg);

/* Request two fields by name; the strings are not case sensitive. */
names[0].type = KVMetaNameType_String;
names[0].name.sname = "conversationtopic";
names[1].type = KVMetaNameType_String;
names[1].name.sname = "MessageID";
pname[0] = &names[0];
pname[1] = &names[1];

metaArg.metaNameCount = 2;
metaArg.metaNameArray = pname;
metaArg.index = subFileIndex;

error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
...
extractInterface->fpFreeStruct(pFile, pMetaData);
pMetaData = NULL;
Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata
In addition to the default metadata set, you can extract any metadata field that exists in the header of an EML or MBX file by passing the field’s name. If the name is a valid field in the file, the contents of the field are returned. For example, to retrieve the name of the last mail server that received the message before it was delivered, you can pass the string “Received”.
Extract EML- or MBX-Specific Metadata
To extract specific metadata fields from an EML or MBX file, call the fpGetSubFileMetaData() function, and pass the metadata name to metaNameArray (the string is not case sensitive). See “fpGetSubFileMetaData()”.
For example, the following code extracts the contents of the Received and Mime-version fields:
KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pname[2];

KVStructInit(&metaArg);

/* Request two header fields by name; the strings are not case sensitive. */
names[0].type = KVMetaNameType_String;
names[0].name.sname = "Received";
names[1].type = KVMetaNameType_String;
names[1].name.sname = "Mime-version";
pname[0] = &names[0];
pname[1] = &names[1];

metaArg.metaNameCount = 2;
metaArg.metaNameArray = pname;
metaArg.index = subFileIndex;

error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
...
extractInterface->fpFreeStruct(pFile, pMetaData);
pMetaData = NULL;
Lotus Notes Database (NSF) Metadata
In addition to the default metadata set, you can extract any Lotus field name that exists in an NSF file by passing the field’s name. (You can extract fields from mail NSF files and non-mail NSF files.) If the name is a valid field in the file, the field is returned. For example, to retrieve the date a document in an NSF file was last accessed, you would pass the string “$LastAccessedDB”.
NOTE A complete list of NSF fields is provided in the Lotus Notes file stdnames.h. This header file is available in the Lotus API Toolkit.
Extract NSF-Specific Metadata
To extract specific metadata fields from an NSF file, call the fpGetSubFileMetaData() function, and pass the metadata name to metaNameArray (the string is not case sensitive). See “fpGetSubFileMetaData()”.
For example, the following code extracts the contents of the Description and Categories fields:
KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pname[2];

KVStructInit(&metaArg);

/* Request two Lotus fields by name; the strings are not case sensitive. */
names[0].type = KVMetaNameType_String;
names[0].name.sname = "description";
names[1].type = KVMetaNameType_String;
names[1].name.sname = "Categories";
pname[0] = &names[0];
pname[1] = &names[1];

metaArg.metaNameCount = 2;
metaArg.metaNameArray = pname;
metaArg.index = subFileIndex;

error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
...
extractInterface->fpFreeStruct(pFile, pMetaData);
pMetaData = NULL;
Microsoft Personal Folders File (PST) Metadata
In addition to the default metadata set, you can extract Messaging Application
Programming Interface (MAPI) properties from a PST file. These properties describe all elements of an Outlook item in a PST file (such as subject, sender, recipient, and message text). Since the properties are stored in the PST file itself, they can be retrieved before the contents of the PST are extracted. This enables you to determine whether an Outlook item should be extracted based on its attributes. Some MAPI properties are also stored for Outlook attachments that are
not mail messages (such as an attached Microsoft Word document or Lotus 1-2-3 file).
NOTE Since all elements of a message (except non-mail attachments) are represented by MAPI properties, you can extract all components of a sub file, including the header and message text, by calling the fpGetSubFileMetaData() function.
MAPI Properties
Each MAPI property is identified by a property tag, which is a constant containing the property type and a unique identifier. For example, the property that indicates whether a message has attachments has the following components:
Property: PR_HASATTACH
Identifier: 0x0E1B
Property type: PT_BOOLEAN (0x000B)
Property tag: 0x0E1B000B
The Microsoft MAPI documentation on the Microsoft Developer Network Web site lists all available MAPI properties, their tags, and types.
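The relationship between identifier, property type, and property tag can be sketched in a few lines of C. This is a minimal illustration only: the EX_-prefixed names and the helper functions are hypothetical and are not part of KeyView or of the Windows headers, where the real definitions live in mapidefs.h and mapitags.h. It assumes the layout shown above, with the identifier in the high 16 bits of the tag and the property type in the low 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (hypothetical names); the real ones are in the MAPI headers. */
#define EX_PT_BOOLEAN   0x000Bu  /* property type PT_BOOLEAN */
#define EX_ID_HASATTACH 0x0E1Bu  /* identifier part of PR_HASATTACH */

/* Combine a 16-bit identifier and a 16-bit type into a 32-bit property tag. */
uint32_t ex_make_prop_tag(uint32_t prop_id, uint32_t prop_type)
{
    assert(prop_id <= 0xFFFFu && prop_type <= 0xFFFFu);
    return (prop_id << 16) | prop_type;
}

/* Recover the identifier (high 16 bits) from a property tag. */
uint32_t ex_prop_id(uint32_t tag)
{
    return tag >> 16;
}

/* Recover the property type (low 16 bits) from a property tag. */
uint32_t ex_prop_type(uint32_t tag)
{
    return tag & 0xFFFFu;
}
```

Splitting a tag this way is one possible way to check that a property has one of the retrievable types before requesting it.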
You can retrieve any MAPI property that is of one of the MAPI property types listed below:
PT_I2
PT_I4
PT_BINARY
PT_BOOLEAN
PT_DOUBLE
PT_FLOAT
PT_LONG
PT_SHORT
PT_STRING8
PT_TSTRING
PT_SYSTIME
PT_UNICODE
NOTE Properties with a PT_TSTRING type have the property type recompiled to either a Unicode string (PT_UNICODE) or to an ANSI string
(PT_STRING8) depending on the operating system’s character set. To retrieve the Unicode property, pass in the Unicode version of the tag. For example, the property tag for PR_SUBJECT is either 0x0037001E for an
ANSI string, or 0x0037001F for a Unicode string.
Extract PST-Specific Metadata
In the call to extract sub file metadata, you can pass either the MAPI tag number (such as 0x0070001E) or the MAPI tag name (such as PR_CONVERSATION_TOPIC). If you specify the MAPI tag name, you must include the Windows header files mapitags.h and mapidefs.h, in which the MAPI tag name is defined as a tag number.
To extract specific MAPI properties from a PST file, call the fpGetSubFileMetaData() function, and pass the property tag to metaNameArray. The tag is passed as an integer. See “fpGetSubFileMetaData()”.
For example, the following code extracts the MAPI properties PR_SUBJECT and PR_ALTERNATE_RECIPIENT:
KVGetSubFileMetaArgRec metaArg;
KVSubFileMetaData pMetaData = NULL;
KVMetaNameRec names[2];
KVMetaName pName[2];
names[0].type = KVMetaNameType_Integer;
names[0].name.iname = PR_SUBJECT;
names[1].type = KVMetaNameType_Integer;
names[1].name.iname = 0x3A010102;
pName[0] = &names[0];
pName[1] = &names[1];
KVStructInit(&metaArg);
metaArg.metaNameCount = 2;
metaArg.metaNameArray = pName;
metaArg.index = SubFileIndex;
error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &pMetaData);
...
extractInterface->fpFreeStruct(pFile, pMetaData);
pMetaData = NULL;
NOTE You must include the Windows header files mapitags.h
and mapidefs.h in which PR_SUBJECT is defined as 0x0037001E.
Exclude Metadata from the Extracted Text File
When a mail message is extracted, its message text and header information (To, From, Sent, and so on) are also extracted. You can prevent the header information from appearing in the text file.
To exclude the header information, set the flag extractFlag to KVExtractionFlag_ExcludeMailHeader in the call to fpExtractSubFile(). See “fpExtractSubFile()”.
Extract Sub Files from Outlook Files
When an Outlook file (MSG) is extracted to disk, its message text and header information (To, From, Sent, and so on) are extracted to a text file. (If you do not want the header information to appear in the text file, see “Exclude Metadata from the Extracted Text File”.) If the Outlook file contains a non-mail attachment, the attachment is extracted in its native format to a sub directory. If the Outlook file contains a mail attachment, the attachment’s message text is extracted to a sub directory.
Extract Sub Files from Outlook Express Files
When an Outlook Express (EML) file is extracted to disk, its message text and header information (To, From, Sent, and so on) are extracted to a text file. (If you do not want the header information to appear in the text file, see “Exclude Metadata from the Extracted Text File”.) If the Outlook Express file contains a non-mail attachment, the attachment is extracted in its native format to the same directory as the message text file. If the Outlook Express file contains a mail attachment, the complete attachment (including message text and attachments), message text file, and non-mail attachment(s) are extracted to the same directory as the main message.
NOTE When the MBX reader (mbxsr) is enabled, it is used to filter MBX and EML files. If the MBX reader is not enabled, the
EML reader (emlsr) is used.
Extract Sub Files from Mailbox Files
A Mailbox (MBX) file is a collection of individual emails that comply with RFC 822 and RFC 2045-2049 (MIME) and are divided by message separators. Many mail applications export to an MBX format, such as Eudora Email and Mozilla Thunderbird.
When an MBX file is extracted to disk, the message text and header information (To, From, Sent, and so on) from each mail file are extracted to text files. (If you do not want the header information to appear in the text file, see “Exclude Metadata from the Extracted Text File”.)
In Eudora MBX files, attachments are inserted as a link and are stored externally from the message. These attachments are not extracted, but the path to the attachment is returned in the call to the fpGetSubFileInfo() function (see “fpGetSubFileInfo()”). You can write code to retrieve the attachment based on the returned path.
For MBX files from other clients, KeyView extracts attachments when they are embedded in the message.
NOTE The Mailbox (MBX) reader is an advanced feature and is sold and licensed separately. To enable this reader in a KeyView SDK, you must obtain the appropriate license key from Autonomy. See “Update License Information” for information on adding a new license key to an existing installation.
Extract Sub Files from Outlook Personal Folders Files
KeyView can extract Outlook items such as messages, appointments, contacts, tasks, notes, and journal entries from a PST file. When a PST file is extracted to disk, the text and header information (To, From, Sent, and so on) from each Outlook item are extracted to a text file. (If you do not want the header information to appear in the text file, see “Exclude Metadata from the Extracted Text File”.)
You can also extract messages from PST files as MSG files, including all their attachments, by setting the KVExtractionFlag_SaveAsMSG flag in the KVExtractSubFileArg structure when calling fpExtractSubFile(). See “KVExtractSubFileArg”.
If an Outlook item contains a non-mail attachment, the attachment is extracted in its native format to a sub directory. If an Outlook item contains an Outlook attachment, the attached item’s text and attachment(s) are extracted to a sub directory.
NOTE The Microsoft Outlook Personal Folders (PST) reader is an advanced feature and is sold and licensed separately. To enable this reader in a KeyView SDK, you must obtain the appropriate license key from Autonomy. See “Update License Information” for information on adding a new license key to an existing installation.
Use the Native or MAPI-based Reader
KeyView accesses PST files in one of two ways:
Indirectly, through Microsoft’s Messaging Application Programming Interface (MAPI), using the MAPI-based reader named pstsr.
Directly, using the native PST reader named pstnsr.
On UNIX, Windows x64, and Windows IA-64, the native reader is always used to process PST files because the MAPI-based reader only runs on Windows x86. On Windows x86, you can specify either reader; however, the MAPI-based reader is used by default. The differences between the two readers are summarized in the following table:
Feature/Requirement | Native Reader (pstnsr) | MAPI-based Reader (pstsr)
All platforms supported | Yes | Windows only
Outlook client required | No | Yes
MAPI properties supported | Yes. All properties defined in mapitags.h. Object properties are not supported. | Yes. All properties defined in mapitags.h. Object properties are not supported.
Password-protection supported | Yes | Yes (using KVCredential structure)
Compressible encryption supported | Yes | Yes
High encryption supported | No | Yes
To specify that the MAPI-based reader is used for PST files, change the PST entry in the formats_e.ini file as follows:
297=pst
To specify that the native reader is used for PST files, change the PST entry in the formats_e.ini file as follows:
297=pstn
NOTE You must ensure the PST you are extracting is not open in the Outlook client and the Outlook process is not running.
Use the Native PST Reader (pstnsr)
The native PST reader accesses PST files directly without relying on the Microsoft interface to the PST format. It runs on both Windows and UNIX and does not require an Outlook client to be installed on the system processing the PST files.
However, the native reader does not support password-protected PST files that use high encryption.
Use the MAPI Reader (pstsr)
The pstsr reader accesses PST files indirectly using Microsoft’s Messaging
Application Programming Interface (MAPI). MAPI is a standard Windows message interface that enables different mail programs and other mail-aware applications (such as word processors and spreadsheets) to exchange messages and attachments with each other. MAPI allows KeyView to open a PST file, traverse the folders and Outlook items, and extract the items inside the PST file.
NOTE When extracting sub files from PST files, information on the distribution list used in an e-mail is extracted to a file called emailname.dist. This applies to the MAPI reader (pstsr) only.
System Requirements
Since MAPI is only supported on Windows platforms, you can only convert PST files on Windows. And since MAPI relies on functionality in Microsoft Outlook, a
Microsoft Outlook client must be installed on the same machine as the application converting PST files, and must be the default e-mail application. KeyView supports the following PST formats and Outlook clients:
Outlook 97 or higher PST files
Outlook 2002 or Outlook 2003 clients
NOTE The Outlook client must be the same version as or newer than the version of Outlook that generated the PST file.
MAPI Attachment Methods
The way in which the contents of a PST message attachment can be accessed is determined by the MAPI attachment method applied to the attachment. For example, if the attachment is an embedded OLE object, then it uses the
ATTACH_OLE attachment method. KeyView can access message attachments that use the following attachment methods:
ATTACH_BY_VALUE
ATTACH_EMBEDDED_MSG
ATTACH_OLE
ATTACH_BY_REFERENCE
ATTACH_BY_REF_ONLY
ATTACH_BY_REF_RESOLVE
Attachments using the ATTACH_BY_VALUE, ATTACH_EMBEDDED_MSG, or
ATTACH_OLE attachment methods are extracted automatically when the PST file is extracted. An “attach by reference” method means the attachment is not in
Outlook, but Outlook contains an absolute path to the attachment. Before you can extract these types of attachments, you must retrieve the path to access the attachment.
To extract “attach by reference” attachments
1. Determine whether the attachment uses an ATTACH_BY_REFERENCE, ATTACH_BY_REF_ONLY, or ATTACH_BY_REF_RESOLVE method by retrieving the MAPI property PR_ATTACH_METHOD.
2. If the attachment uses one of the “attach by reference” methods, get the fully qualified path to the attachment by retrieving the MAPI properties
PR_ATTACH_LONG_PATHNAME or PR_ATTACH_PATHNAME.
3. You can then either copy the files from their original location to the path where the PST file is extracted, or use the Export API functions to convert the attachment.
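The decision made in steps 1 and 2 can be sketched as follows. This is an illustrative helper, not KeyView code: the EX_-prefixed names and the function are hypothetical, the numeric values are assumed to mirror the PR_ATTACH_METHOD constants in the Windows header mapidefs.h, and preferring the long pathname over the short one is an assumption rather than documented behavior. The caller is expected to have already retrieved the property values through fpGetSubFileMetaData().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the attachment-method values in mapidefs.h. */
enum {
    EX_NO_ATTACHMENT         = 0,
    EX_ATTACH_BY_VALUE       = 1,
    EX_ATTACH_BY_REFERENCE   = 2,
    EX_ATTACH_BY_REF_RESOLVE = 3,
    EX_ATTACH_BY_REF_ONLY    = 4,
    EX_ATTACH_EMBEDDED_MSG   = 5,
    EX_ATTACH_OLE            = 6
};

/*
 * Decide how to handle an attachment.
 * Returns  0 if the method is not "attach by reference" (*out_path is NULL),
 *          1 if it is by reference and a usable path was found,
 *         -1 if it is by reference but neither path is available.
 */
int ex_resolve_attachment_path(unsigned long method,
                               const char *long_path,   /* PR_ATTACH_LONG_PATHNAME value */
                               const char *short_path,  /* PR_ATTACH_PATHNAME value */
                               const char **out_path)
{
    assert(out_path != NULL);
    *out_path = NULL;

    /* Step 1: only the three by-reference methods need a path lookup. */
    if (method != EX_ATTACH_BY_REFERENCE &&
        method != EX_ATTACH_BY_REF_RESOLVE &&
        method != EX_ATTACH_BY_REF_ONLY) {
        return 0;
    }

    /* Step 2: prefer the long pathname, fall back to the short one. */
    if (long_path != NULL && long_path[0] != '\0') {
        *out_path = long_path;
    } else if (short_path != NULL && short_path[0] != '\0') {
        *out_path = short_path;
    }
    return (*out_path != NULL) ? 1 : -1;
}
```

When the function returns 1, the returned path can be used for step 3, either to copy the file or to pass it to the Export API.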
Open Secured PST Files
KeyView enables you to specify credentials (user name and password), which are
Detect PST Files While the Outlook Client is Running
If you are running an Outlook client while running the File Extraction API, the
KeyView format detection module (kwad) may not be able to open the PST file to determine the file’s format because Outlook has the file locked. In this case, you may do one of the following:
Close Outlook when using the Extraction API
Detect PST files by extension only and bypass the format detection module.
To enable this option, add the following lines to the formats_e.ini file.
[container_flags]
detectPSTbyExtension=1
NOTE The detectPSTbyExtension option only applies when you are using the MAPI reader (pstsr).
NOTE If you use this option, you must ensure in your code that valid PST files are passed to KeyView because the format detection module will not be available to verify the file type and pass the file to the appropriate reader.
Extract Sub Files from Lotus Domino XML Language Files
When a Lotus Domino XML Language (.DXL) file is extracted, its message text and header information (To, From, Sent, and so on) are extracted to a text file.
NOTE To prevent header information from being extracted, see “Exclude Metadata from the Extracted Text File”.
You can ensure that dates and times extracted from Lotus Domino .DXL files are displayed in a uniform format.
To extract custom date/time formats
In the formats_e.ini file, set the DateTimeFormat option in the [dxlsr] section. For example:
[dxlsr]
DateTimeFormat=%m/%d/%Y %I:%M:%S %p
In this example, dates and times are extracted in the following format:
02/11/2003 11:36:09 AM
The format arguments are the same as those for the strftime() function.
Refer to the following Web page for more information.
http://msdn.microsoft.com/en-us/library/fe06s4ak%28VS.71%29.aspx
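Because the option uses strftime() arguments, a format string can be tried out in a small C program before it is placed in formats_e.ini. The sketch below is illustrative only: the function and macro names are hypothetical, and it assumes the default "C" locale, in which %p yields AM or PM.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* The pattern from the [dxlsr] example above. */
#define EX_DXL_DATETIME_FORMAT "%m/%d/%Y %I:%M:%S %p"

/*
 * Format a calendar date and time with the pattern above.
 * Returns the number of characters written (excluding the terminator),
 * or 0 if the buffer is too small.
 */
size_t ex_format_datetime(char *out, size_t size,
                          int year, int month, int day,
                          int hour, int minute, int second)
{
    struct tm when;

    assert(out != NULL && size > 0);
    memset(&when, 0, sizeof when);
    when.tm_year = year - 1900;  /* struct tm counts years from 1900 */
    when.tm_mon  = month - 1;    /* struct tm counts months from 0 */
    when.tm_mday = day;
    when.tm_hour = hour;
    when.tm_min  = minute;
    when.tm_sec  = second;
    return strftime(out, size, EX_DXL_DATETIME_FORMAT, &when);
}
```

Calling ex_format_datetime() with 11 February 2003, 11:36:09 reproduces the sample output shown above.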
Extract Sub Files from Lotus Notes Database Files
A Lotus Notes database is a single file that contains multiple documents called
notes. Notes include design notes (such as forms, views, folders, navigators, outlines, pages, framesets, agents, and resources), data document notes, profile document notes, access control list notes, and collection (index) notes. KeyView can extract text items, attachments, and OLE objects from data document notes only. Data document notes include emails, journal entries, discussion threads, documents (Microsoft Office and Lotus SmartSuite), and so on.
All components of a note are prefixed by field names such as “SendTo:”,
“Subject:”, and “Body:”. When a note is extracted, the field names are not included in the extracted output; only the field values are extracted.
When a mail message in an NSF file is extracted to disk, the body text and header information, such as the values from the SendTo, From, and DeliveredDate fields, in each message are extracted to a text file. (If you do not want the header information to appear in the message text file, see “Exclude Metadata from the Extracted Text File”.)
NOTE The Lotus Notes Database (NSF) reader is an advanced feature and is sold and licensed separately. To enable this reader in a KeyView SDK, you must obtain the appropriate license key from Autonomy. See “Update License Information” for information on adding a new license key to an existing installation.
System Requirements
The NSF format is proprietary. Therefore, KeyView accesses NSF files indirectly using the Lotus Notes API. Since the NSF reader relies on functionality in Lotus Notes, a Lotus Notes client or Lotus Domino server must be installed and configured on the same machine as the application converting NSF files. On UNIX and Linux, the Lotus Domino server is required. On Windows, the Lotus Notes client or Lotus Domino server is required.
KeyView supports the following Lotus Notes clients and Domino servers:
Lotus Notes 6.5.1
Lotus Domino 6.5.1
KeyView supports NSF files on the same platforms supported by Lotus Notes and
Lotus Domino:
Windows XP x86 (Service Pack 1 and 2)
Windows 2000 x86 (Service Pack 2)
Solaris 8.0 and 9.0 (built on Solaris 8.0)
Red Hat Enterprise Linux AS 3.0 (x86)
SuSE Linux Enterprise Server 8 and 9 (x86)
IBM AIX 5.1, 5L version 5.2
Installation and Configuration
Before KeyView can convert NSF files, you must set up the Lotus Notes client or
Lotus Domino server. Full configuration is not required. The following steps outline the minimal setup for NSF conversion:
Windows
1. Install the Lotus Notes client or Lotus Domino server. You do not need to configure the client or server.
2. Ensure the file notes.ini is in the proper location.
If Lotus Notes is installed, the file should appear in the install\lotus\notes directory, where install is the installation directory.
If only Lotus Domino is installed, the file should appear in the install\lotus\domino directory, where install is the installation directory.
If the file does not exist, create an ASCII file named notes.ini, and add the following text:
[Notes]
3. Add the KeyView bin directory and the install\lotus\notes or
install\lotus\domino directory to the PATH environment variable (the
KeyView bin directory must be first in the path). It is recommended you add the KeyView bin directory because the Lotus Notes or Domino server installation may contain older KeyView OEM libraries.
Solaris
1. Install Lotus Domino server. You do not need to configure the server.
2. Ensure the file notes.ini is in the install/lotus/notes/latest/sunspa directory, where install is the directory where Lotus Notes is installed. If the file does not exist, create an ASCII file named notes.ini, and add the following text:
[Notes]
3. Add the install/lotus/notes/latest/sunspa directory to the PATH environment variable:
setenv PATH install/lotus/notes/latest/sunspa:$PATH
4. Add the install/lotus/notes/latest/sunspa directory and the KeyView bin directory to the LD_LIBRARY_PATH environment variable:
setenv LD_LIBRARY_PATH keyview_bin:install/lotus/notes/latest/sunspa:$LD_LIBRARY_PATH
where keyview_bin is the location of the KeyView bin directory. It is recommended you add the KeyView bin directory because the Lotus Notes installation may contain older KeyView OEM libraries.
AIX 5.x
1. Install the bos.iocp.rte file set if it is not already installed, and reboot the machine. See the Lotus Domino server documentation for more information.
2. Install Lotus Domino server. You do not need to configure the server.
3. Ensure the file notes.ini is in the install/lotus/notes/latest/ibmpow directory, where install is the directory where Lotus Notes is installed. If the file does not exist, create an ASCII file named notes.ini, and add the following text:
[Notes]
4. Add the install/lotus/notes/latest/ibmpow directory to the PATH environment variable:
setenv PATH install/lotus/notes/latest/ibmpow:$PATH
5. Add the install/lotus/notes/latest/ibmpow directory and the KeyView bin directory to the LIBPATH environment variable:
setenv LIBPATH keyview_bin:install/lotus/notes/latest/ibmpow:$LIBPATH
where keyview_bin is the location of the KeyView bin directory. It is recommended you add the KeyView bin directory because the Lotus Notes installation may contain older KeyView OEM libraries.
Linux
1. Install Lotus Domino server. You do not need to configure the server.
2. Ensure the file notes.ini is in the install/lotus/notes/latest/linux directory, where install is the directory where Lotus Notes is installed. If the file does not exist, create an ASCII file named notes.ini, and add the following text:
[Notes]
3. Add the install/lotus/notes/latest/linux directory to the PATH environment variable:
setenv PATH install/lotus/notes/latest/linux:$PATH
4. Add the install/lotus/notes/latest/linux directory and the KeyView bin directory to the LD_LIBRARY_PATH environment variable:
setenv LD_LIBRARY_PATH keyview_bin:install/lotus/notes/latest/linux:$LD_LIBRARY_PATH
where keyview_bin is the location of the KeyView bin directory. It is recommended you add the KeyView bin directory because the Lotus Notes installation may contain older KeyView OEM libraries.
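The ordering used in the UNIX steps above (KeyView bin directory first, then the Lotus directory, then any existing value) can also be assembled programmatically, for example by a launcher that prepares the environment for a child process. The following is a minimal sketch: the function name is hypothetical and is not part of KeyView, and the caller supplies the real directory locations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<keyview_bin>:<notes_dir>[:<existing>]" in out.
 * The KeyView bin directory goes first so that its libraries are found
 * before any older copies shipped with the Lotus installation.
 * Returns 0 on success, or -1 if the buffer is too small.
 */
int ex_build_library_path(char *out, size_t size,
                          const char *keyview_bin,
                          const char *notes_dir,
                          const char *existing)
{
    int written;

    assert(out != NULL && keyview_bin != NULL && notes_dir != NULL);
    if (existing != NULL && strlen(existing) > 0) {
        written = snprintf(out, size, "%s:%s:%s", keyview_bin, notes_dir, existing);
    } else {
        /* No previous value: avoid leaving a trailing colon. */
        written = snprintf(out, size, "%s:%s", keyview_bin, notes_dir);
    }
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

The resulting string would typically be exported in the shell or passed to a child process. Changing the variable inside a process that is already running generally does not affect libraries that process has yet to load.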
Open Secured NSF Files
KeyView enables you to specify credentials (user ID file and password), which are used to open a secured NSF file for extraction. See “Password Protected Files” for more information.
Format Note Sub Files
The KeyView NSF reader uses XML templates to format note sub-files. You can customize the templates as required to approximate the look and feel of the original notes as closely as possible. For more information, see “Format Lotus Notes Sub Files”.
Extract Sub Files from PDF Files
KeyView can extract document-level and page-level attachments from a PDF document. Document-level attachments are added by using the Attach A File tool and may include links to or from the parent document or to other file attachments.
Page-level attachments are added as comments by using various tools.
Page-level or comment attachments display the File Attachment icon or the
Speaker icon on the page where they are located.
When a PDF’s attachments are extracted to disk, the attachments are saved in their native format.
Extract Embedded OLE Objects
Embedded OLE objects can be converted in two ways:
Using the File Extraction API, the OLE object is first extracted from the main file and saved to disk (see “File Extraction API Functions”). It can then be converted by making a separate conversion call.
Using the XML Export API, the main file is converted to XML and the OLE object is converted to a graphics file that is referenced in the XML file (see “XML Export API Functions”).
The File Extraction API can extract embedded OLE objects from the following types of documents:
Lotus Notes (DXL)
Microsoft Excel
Microsoft Word
Microsoft PowerPoint
Microsoft Outlook
Microsoft Visio
Microsoft Project
OASIS Open Document
Rich Text Format (RTF)
When an embedded OLE object is extracted from its parent file, the location where the embedded file appears in the original document is not available. The parent and child are extracted as separate files.
Extract Sub Files from ZIP Files
ZIP files that are not password-protected can be extracted using the general method (see “Extract Sub Files”). However, some ZIP files use password protection, in which case you must use a different method to enter the required credentials. See “Password Protected Files” for more information.
Default Filenames for Extracted Sub Files
When a filename is not specified in the call to fpExtractSubFile() (see “fpExtractSubFile()”), in some cases a default filename is applied to the extracted sub file.
Default Filename for Mail Formats
To avoid naming conflicts and problems with long filenames, KeyView applies its own names to the extracted mail items when a name is not supplied in the call to fpExtractSubFile(). A non-mail attachment retains its original filename and extension.
When the contents of a mail store or the message body of a mail message are extracted, the extracted filenames may include the following:
The first valid eight characters of the original folder name or “Subject” line of the mail message. If the “Subject” line is empty, the characters kvext are used, where ext is the format’s extension. For example, the characters would be “kvmsg” for MSG and “kvnsf” for NSF.
The following special characters are considered invalid and are ignored:
any non-printing character with a value less than 0x1F
angle brackets (< >)
double quote (")
asterisk (*)
back slash (\)
colon (:)
forward slash (/)
pipe (|)
question mark (?)
For notes, the filename is derived from the first 24 characters of the note text.
For contact entries, the filename is derived from the full name of the contact.
The characters _kvn, where n is an integer incremented from 0 for each extracted item.
One of the following extensions:
Type | File Extension
email message | .mail
calendar appointment | .cal
contact entry | .cont
task entry | .task
note | .note
journal entry | .jrnl
distribution list | .dist
posting note | .post
If the type cannot be determined for an MSG or PST file, the file is given
If the type cannot be determined for an NSF file, the file is given a .tmp extension.
The format of a MAIL file is plain text by default, but can be set to RTF with the KVExtractionFlag_GetFormattedBody flag.
For example, an MSG mail message with the subject line RE: Product roadmap containing the Microsoft Excel attachment release_schedule.xls would be extracted as
RE produ_kv0.mail
release_schedule.xls
If an extracted message contains an embedded OLE object or any attachment that does not have a name, the object or attachment is extracted as _kv#.tmp.
Default Filename for Embedded OLE Objects
KeyView can apply a default name to an extracted embedded OLE object when a name is not supplied in the call to fpExtractSubFile(). When an embedded
OLE object is extracted, the extracted filename may include the following:
The first valid eight characters of the main file. The following special characters are considered invalid and are ignored:
any non-printing character with a value less than 0x1F
angle brackets (< >)
double quote (")
asterisk (*)
back slash (\)
colon (:)
forward slash (/)
pipe (|)
question mark (?)
The characters _kvn, where n is an integer incremented from 0 for each extracted object.
If KeyView can determine the embedded OLE is a Microsoft Office document, the original extension is used. If the file type cannot be determined, the file is given a .tmp extension.
For example, suppose a Microsoft Word document (sales_quarterly.doc) contains two embedded OLE objects: a Microsoft Excel file called west_region.xls, and a bitmap created in the Word document. The embedded objects would be extracted as
sales_qu_kv0.xls
sales_qu_kv1.tmp
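The naming rule for embedded OLE objects can be approximated in a few lines of C. This sketch is not KeyView's implementation: the function names are hypothetical, the caller is assumed to pass the extension that KeyView would choose, and the check follows the wording of the guide literally (values less than 0x1F are treated as invalid).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if c is one of the characters listed as invalid, otherwise 0. */
int ex_is_invalid_name_char(unsigned char c)
{
    return c < 0x1F || strchr("<>\"*\\:/|?", c) != NULL;
}

/*
 * Build "<first eight valid characters of main_file>_kv<index><ext>".
 * Returns 0 on success, or -1 if the buffer is too small.
 */
int ex_default_ole_name(char *out, size_t size,
                        const char *main_file, unsigned index, const char *ext)
{
    char prefix[9];
    size_t kept = 0;
    int written;

    assert(out != NULL && main_file != NULL && ext != NULL);

    /* Copy up to eight characters, skipping the invalid ones. */
    for (; *main_file != '\0' && kept < 8; ++main_file) {
        if (!ex_is_invalid_name_char((unsigned char)*main_file)) {
            prefix[kept++] = *main_file;
        }
    }
    prefix[kept] = '\0';

    written = snprintf(out, size, "%s_kv%u%s", prefix, index, ext);
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

Called with sales_quarterly.doc, index 0, and the extension .xls, the helper yields the first name in the example above. With index 1 and .tmp it yields the second.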
CHAPTER 4
Use the XML Export API
This section describes how to perform some basic tasks using the XML Export
API. It contains the following topics:
Extract File Format Information
Display Vector Graphics on UNIX and Linux
Convert Revision Tracking Information
Extract Metadata
When a file format supports metadata, KeyView can extract and process that information. Metadata includes document information fields such as title, author, creation date, and file size. Depending on the file’s format, metadata is referred to in a number of ways: for example, “summary information,” “OLE summary information,” “file information,” and “document properties.”
The metadata in mail formats (MSG and EML) and mail stores (PST, NSF, and MBX) is extracted differently than other formats. For information on extracting metadata from these formats, see “Extract Mail Metadata”.
NOTE KeyView can only extract metadata from a document if metadata is defined in the document, and the document reader can extract metadata for the file format. The section “Supported Formats” lists the file formats for which metadata can be extracted. KeyView does not generate metadata automatically from the document contents.
Extract Metadata Using the API
You can extract the metadata at the API level. The API extracts all valid metadata fields that exist in the file.
To extract metadata using the C API
1. Declare a pointer to the KVSummaryInfoEx structure.
2. Call the fpGetSummaryInfo() function. See “fpGetSummaryInfo()”.
Extract Metadata Using a Template File
When using a template file, KeyView recognizes two types of metadata: standard and non-standard. Standard metadata includes fields such as Title, Author, and Subject. The standard fields are enumerated from 1 to 41 in KVSumType in the header file kvtypes.h. Non-standard metadata includes any field not listed from 1 to 41 in KVSumType, such as user-defined fields (for example, custom property fields in Microsoft Word documents), or fields that are unique to a particular file type (for example, “Artist” or “Genre” fields in MP3 files). Enumerated types 42 and greater are reserved for non-standard metadata.
To extract metadata using a template file
1. Insert metadata tokens in a member of the KVXMLTemplate structure in the template files. This defines the point at which the metadata appears in the XML output.
2. If you are using the $USERSUMMARY or $SUMMARY token, define the szUserSummary member of the KVXMLTemplate structure in the template file. This determines the markup and tokens generated when these metadata tokens are processed.
3. In your application, read the template file and write the data to the KVXMLTemplate structure.
The following tokens can be used in the template files:
$SUMMARYNN Inserts the data from a specified metadata field. NN is a number from 00 through 33 that is enumerated in KVSumType in kvtypes.h.
$SUMMARY Inserts the data from valid metadata fields in the range of 0 to 33 using the markup provided in pszUserSummary.
$USERSUMMARY Inserts the data from every valid non-standard metadata field using the markup provided in pszUserSummary.
$CONTENT Inserts the content of the metadata field specified by the $NAME token.
$NAME Inserts the name of the metadata field, such as “Title,” “Author,” or “Subject.”
Examples
$SUMMARYNN
The following markup displays the contents of the “Title” field at the top of the main XML file:
szMainTop=$SUMMARY01
In KVSumType, 01 is the enumerated value for the “Title” metadata field.
$SUMMARY
The following markup extracts all standard fields, and includes them in the first H1 XML block:
szFirstH1Start=$SUMMARY
szUserSummary=<MetaData name="$NAME" content="$CONTENT" />
This example extracts the field name ($NAME) and field content ($CONTENT) for standard metadata and includes it at the beginning of the first heading level 1 XML block.
The generated XML may look like this:
<MetaData name="CodePage" content="1252" />
<MetaData name="Title" content="My design document" />
<MetaData name="Subject" content="design specifications" />
<MetaData name="Author" content="John Doe" />
<MetaData name="Keywords" content="" />
<MetaData name="Comments" content="" />
<MetaData name="Template" content="Normal.dot" />
<MetaData name="LastAuthor" content="lchapman" />
<MetaData name="RevNumber" content="6" />
<MetaData name="EditTime" content="01/01/1601, 0:08" />
<MetaData name="LastPrinted" content="14/01/2002, 14:06" />
<MetaData name="Create_DTM" content="27/08/2003, 10:31" />
<MetaData name="LastSave_DTM" content="29/08/2003, 14:07" />
<MetaData name="PageCount" content="1" />
<MetaData name="WordCount" content="4062" />
<MetaData name="CharCount" content="23159" />
<MetaData name="AppName" content="Microsoft Word 9.0" />
<MetaData name="Security" content="0" />
<MetaData name="Category" content="software" />
<MetaData name="LineCount" content="192" />
<MetaData name="ParCount" content="46" />
<MetaData name="ScaleCrop" content="FALSE" />
<MetaData name="Manager" content="" />
<MetaData name="Company" content="Autonomy" />
<MetaData name="LinksDirty" content="FALSE" />
$USERSUMMARY
The following markup extracts non-standard fields, and includes them at the bottom of the main XML file:
szMainBottom=$USERSUMMARY
szUserSummary=<MetaData name="$NAME" content="$CONTENT" />
This example extracts the field name ($NAME) and field content ($CONTENT) for non-standard metadata from a document, and includes it at the bottom of the main XML file.
The generated XML may look like this:
<MetaData name="Telephone number" content="444-111-2222" />
<MetaData name="Recorded date" content="07/03/2003, 23:00" />
<MetaData name="Source" content="TRUE" />
<MetaData name="my property" content="reserved" />
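The token substitution described above can be pictured with a small C sketch. It is an illustration only, not Export's implementation: expand_summary_markup() is a hypothetical helper that is not part of the KeyView API, and it assumes that each $NAME and $CONTENT token in the user-summary markup is replaced literally, once per metadata field, with no escaping of the inserted text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the KeyView API: copies `markup` into
 * `out`, replacing each $NAME with `name` and each $CONTENT with `content`.
 * Returns 0 on success, or -1 if the result does not fit in `out`. */
int expand_summary_markup(const char *markup, const char *name,
                          const char *content, char *out, size_t out_size)
{
    size_t used = 0;

    assert(markup != NULL && name != NULL && content != NULL && out != NULL);
    if (out_size == 0) {
        return -1;
    }

    while (*markup != '\0') {
        const char *piece = markup;   /* text to append on this pass */
        size_t piece_len = 1;         /* default: one literal character */
        size_t skip = 1;              /* characters consumed from markup */

        if (strncmp(markup, "$NAME", 5) == 0) {
            piece = name;
            piece_len = strlen(name);
            skip = 5;
        } else if (strncmp(markup, "$CONTENT", 8) == 0) {
            piece = content;
            piece_len = strlen(content);
            skip = 8;
        }

        if (used + piece_len >= out_size) {
            return -1;                /* keep room for the terminator */
        }
        memcpy(out + used, piece, piece_len);
        used += piece_len;
        markup += skip;
    }
    out[used] = '\0';
    return 0;
}
```

Calling the helper once per metadata field with the szUserSummary markup from the examples would yield one MetaData element per field, which matches the shape of the generated XML shown above.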
Extract File Format Information
Export can detect a file’s format, and report the information to the API, which in turn reports the information to the developer’s application. This feature enables you to apply customized conversion settings based on a file’s format.
To extract file format information using the C API
1. Declare a pointer to the KVStreamInfo data structure.
2. Call the fpGetStreamInfo() function.
Convert Character Sets
Export enables you to control the character set of both the input and the output text. You can do this either by setting the source and/or target character set in the API, or by basing the input/output on the character set of the document (if the document character set is stored in the document and can be determined by the document reader).
The character sets are enumerated in KVCharSet in kvtypes.h. Not all character sets can be used to specify the target character set.
Determine the Character Set of the Output Text
To determine the output character set of a converted document, Export considers the following:
• Whether the reader can extract the character set from the document. This depends on whether the file format can provide character set information and whether the document actually contains character set information. If character set information cannot be determined for your document type, you must set the source and/or target character set in the API.
• Whether a source character set is set in the API.
NOTE To set the source character set, you must specify a character set and set the bForceSrcCharSet member of the KVXMLOptions structure to TRUE.
• Whether a target character set is set in the API.
NOTE To set the target character set, you must specify a character set and set the bForceOutputCharSet member of the KVXMLOptions structure to TRUE.
Guidelines for Character Set Conversion
Figure 8 shows how the output character set is determined when the document character set can be determined:
Figure 8 Document Character Set Can Be Determined
Figure 9 shows how the output character set is determined when the document character set cannot be determined:
Figure 9 Document Character Set Cannot Be Determined
Examples of Character Set Conversion
The examples below demonstrate possible configurations for mapping character sets and the expected output for each scenario.
Document Character Set Can be Determined
For the example in Table 8, the document is an RTF file. The document character set can be obtained from this file type. The document character set is Traditional Chinese (BIG5).
Source charset set | Target charset set | Output charset
KVCS_GB | KVCS_UTF8 | KVCS_UTF8. Converts GB (Simplified Chinese) to UTF-8. Output character set is the target character set specified in the API.
KVCS_GB | -- | KVCS_GB. Converts BIG5 to GB (Simplified Chinese). Output character set is the source character set specified in the API.
-- | KVCS_UTF8 | KVCS_UTF8. Converts BIG5 to UTF-8. Output character set is the target character set specified in the API.
-- | -- | KVCS_BIG5. Output character set is the document character set. No conversion.
Document Character Set Cannot be Determined
For the example in Table 9, the document is an ASCII file. The document character set cannot be obtained from this file type. The document character set is KVCS_1251.
Source charset set | Target charset set | Output charset
KVCS_1252 | KVCS_UTF8 | KVCS_UTF8. Converts KVCS_1252 to KVCS_UTF8. Output character set is the target character set specified in the API.
KVCS_1252 | KVCS_UNKNOWN | KVCS_1252. Output character set is the source character set specified in the API because KVCS_UNKNOWN cannot be used. No conversion.
KVCS_1252 | -- | KVCS_1252. Output character set is the source character set specified in the API. No conversion.
-- | KVCS_1252 | KVCS_1252. Converts OS code page to KVCS_1252. Output character set is the target character set specified in the API.
-- | -- | Output character set is OS code page. No conversion.
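The precedence implied by the two sets of examples above can be summarized in a short C sketch. This is an interpretation of the example rows, not Export's actual logic: the SketchCharSet enumeration and both functions are hypothetical stand-ins, and the real enumerators live in KVCharSet in kvtypes.h. The assumed order is: a usable forced target character set wins, then a forced source character set, then the document character set, and finally the OS code page.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values for illustration only; the real enumerators are in
 * KVCharSet in kvtypes.h and are not reproduced here. */
typedef enum {
    CS_UNKNOWN = 0,
    CS_OS_CODE_PAGE,
    CS_1252,
    CS_BIG5,
    CS_GB,
    CS_UTF8
} SketchCharSet;

/* Hypothetical helper: returns the output character set implied by the
 * example rows. `doc` is CS_UNKNOWN when the reader cannot determine the
 * document character set. */
SketchCharSet resolve_output_charset(SketchCharSet doc,
                                     bool force_source, SketchCharSet source,
                                     bool force_target, SketchCharSet target)
{
    if (force_target && target != CS_UNKNOWN) {
        return target;              /* target set in the API */
    }
    if (force_source && source != CS_UNKNOWN) {
        return source;              /* source set in the API */
    }
    if (doc != CS_UNKNOWN) {
        return doc;                 /* document character set */
    }
    return CS_OS_CODE_PAGE;         /* nothing known: OS code page */
}

/* Replays every example row against the sketch. */
void check_against_examples(void)
{
    /* Document character set can be determined (BIG5). */
    assert(resolve_output_charset(CS_BIG5, true, CS_GB, true, CS_UTF8) == CS_UTF8);
    assert(resolve_output_charset(CS_BIG5, true, CS_GB, false, CS_UNKNOWN) == CS_GB);
    assert(resolve_output_charset(CS_BIG5, false, CS_UNKNOWN, true, CS_UTF8) == CS_UTF8);
    assert(resolve_output_charset(CS_BIG5, false, CS_UNKNOWN, false, CS_UNKNOWN) == CS_BIG5);

    /* Document character set cannot be determined. */
    assert(resolve_output_charset(CS_UNKNOWN, true, CS_1252, true, CS_UTF8) == CS_UTF8);
    assert(resolve_output_charset(CS_UNKNOWN, true, CS_1252, true, CS_UNKNOWN) == CS_1252);
    assert(resolve_output_charset(CS_UNKNOWN, true, CS_1252, false, CS_UNKNOWN) == CS_1252);
    assert(resolve_output_charset(CS_UNKNOWN, false, CS_UNKNOWN, true, CS_1252) == CS_1252);
    assert(resolve_output_charset(CS_UNKNOWN, false, CS_UNKNOWN, false, CS_UNKNOWN) == CS_OS_CODE_PAGE);
}
```

The sketch only predicts which character set the output ends up in. Whether a conversion takes place, and from which character set, is described in the example rows themselves.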
Set the Character Set During Conversion
You can convert the character set of a file at the time the file is converted.
To specify the source character set for documents from which the document character set cannot be obtained by the reader
1. Set the eSrcCharSet member of the KVXMLOptions structure to one of the character sets enumerated in KVCharSet in kvtypes.h.
2. Set the bForceSrcCharSet member of the KVXMLOptions structure to TRUE.
To specify the target character set:
1. Set the eOutputCharSet member of the KVXMLOptions structure to one of the character sets enumerated in KVCharSet in kvtypes.h.
2. Set the bForceOutputCharSet member of the KVXMLOptions structure to TRUE. See “KVXMLOptions.”
Set the Character Set During File Extraction from a Container
You can convert the character set of a container sub file at the time the sub file is extracted from the container and before it is converted to XML. This is most often used to set the output character set of a mail message’s body text. For details, see the section on the File Extraction API.
To specify the source character set of a sub file, call the fpExtractSubFile() function, and set the KVExtractSubFileArg->srcCharset argument to any value in the enumerated list in KVCharSet of kvtypes.h. See “fpExtractSubFile().”
To specify the target character set of a sub file, call fpExtractSubFile(), and set the KVExtractSubFileArg->trgCharSet argument to any value in the enumerated list in KVCharSet of kvtypes.h. See “fpExtractSubFile().”
Map Styles
Export can map paragraph and character styles in any word processing format that contains styles (such as Microsoft Word, RTF, or Folio Flat File) to user-defined markup. With this feature, you can redact (hide) text in the source document, delete content, or change the overall structure of the output. You can also embed style sheet styles in the output defined in the XML.
To enable style mapping, you must indicate which paragraph and/or character styles are to be mapped, and define the starting and ending markup to be included in the XML output. For example, if the source Microsoft Word document contains the character style “Recipe,” and the content of the style in Microsoft Word is “Brownies,” you can specify that the starting markup be <recipe> and the ending markup </recipe>. This would result in the output XML containing:
<recipe>Brownies</recipe>
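The effect of a style mapping on a single run of styled text can be sketched in C as follows. This only illustrates the wrapping behavior described above. SketchStyle and wrap_styled_text() are hypothetical names, the structure is a simplified stand-in for KVStyle rather than its real definition, and no flags are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a style mapping entry. */
typedef struct {
    const char *style_name;    /* word processing style, case sensitive */
    const char *markup_start;  /* markup emitted before the content */
    const char *markup_end;    /* markup emitted after the content */
} SketchStyle;

/* Writes markup_start + text + markup_end into `out`.
 * Returns 0 on success, or -1 if `out` is too small. */
int wrap_styled_text(const SketchStyle *style, const char *text,
                     char *out, size_t out_size)
{
    size_t start_len, text_len, end_len;

    assert(style != NULL && text != NULL && out != NULL);

    start_len = strlen(style->markup_start);
    text_len = strlen(text);
    end_len = strlen(style->markup_end);
    if (start_len + text_len + end_len + 1 > out_size) {
        return -1;
    }

    memcpy(out, style->markup_start, start_len);
    memcpy(out + start_len, text, text_len);
    /* Copy the closing markup together with its terminating '\0'. */
    memcpy(out + start_len + text_len, style->markup_end, end_len + 1);
    return 0;
}
```

With a mapping of { "Recipe", "<recipe>", "</recipe>" } and the text "Brownies", the helper produces the recipe element from the example above.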
You can also use style mapping to control the look of the XML output by either using a Cascading Style Sheet (CSS) or defining the style directly in the starting markup. For example, if a Word document contains the paragraph style “Colorful”, you can have markup of the form <div class="rainbow"> inserted at the front of the paragraph and markup of the form </div> inserted at the end of the paragraph. “Rainbow” is a CSS style defined in an externally provided CSS file referenced at the top of the XML output.
If you map styles to elements or attributes that are not defined in the DTD, you must add the new elements or attributes to the DTD. You must also ensure the new markup is defined in the API, either by entering the markup directly in the classes, or populating the classes using the template files.
Use the C API
To map styles using the C API
1. Define the KVStyle structure. See “KVStyle.” The information in this structure includes:
• the markup to be added to the beginning and end of a paragraph or character style.
• the name of the word processing style (for example, “Heading 1”) to which style mapping applies. Style names are case sensitive.
• the flag which defines instructions on how to process the content associated with a paragraph or character style. The flags are defined in kvtypes.h.
2. Call the fpSetStyleMapping() function. See “fpSetStyleMapping().”
Use a Template file
To map styles using a template file
1. Use the KVStyle parameter to specify how many styles are being mapped.
For example, if there are nine mapped heading levels, add the following:
[KVStyle]
NumStyles=9
2. For each style, there must be a [StyleX] entry that contains the markup that appears at the start and end of the defined style. For example, the first heading level is defined as follows:
[Style1]
StyleName=Colorful
MarkUpStart=<div class="colorful">
MarkUpEnd=<!-- end of colorful --></div>
These values are used in StyleName, MarkUpStart, and MarkUpEnd in the KVStyle structure.
3. For each style, define the flag that applies. Flags define instructions on how to process the content associated with a paragraph or character style. They are defined in kvtypes.h and are used in dwflags of the KVStyle structure. The value associated with each flag is a hexadecimal number. You can set an option by entering either the converted decimal value or the flag’s text.
Flags=0
A finished entry in a template file could look like this:
[KVStyle]
NumStyles=3
[Style1]
StyleName=Colorful
MarkUpStart=<div class="Colorful">
MarkUpEnd=<!-- End of Colorful --></div>
Flags=0
[Style2]
StyleName=RedactPara
MarkUpStart=<div class="RedactPara">
MarkUpEnd=<!-- End of RedactPara --></div>
Flags=2048
[Style3]
StyleName=Code
MarkUpStart=<pre>
MarkUpEnd=<!-- End of Code --></pre>
Flags=KVSTYLE_PRE
Flag Description
KVSTYLE_PRE The KVSTYLE_PRE flag specifies that white space should be preserved (treated as characters, not word separators), and that mode changes, such as changes in font size within a paragraph, should be ignored. This allows the tags <pre> and </pre> to be used.
KVSTYLE_HEADING[1-6] The flags KVSTYLE_HEADING[1-6] specify that a given style is to be detected and processed as a heading. Heading flags are exclusive. This means a style cannot be processed as both H1 and H2. By default, Export maps the heading style “Heading 1” to <h1></h1>, and so on, for heading levels 1 through 6. If you use style mappings, the default mapping is overridden. Therefore, you must supply markup for all heading levels. Export uses heading levels to define the overall structure of the XML output.
KVSTYLE_ORDERLIST The KVSTYLE_ORDERLIST flag specifies that the style should be tagged as an ordered list. Currently not implemented.
KVSTYLE_UNORDEREDLIST The KVSTYLE_UNORDERLIST flag specifies that the style should be tagged as an unordered list. Currently not implemented.
KVSTYLE_DELETECONTENT The KVSTYLE_DELETECONTENT flag specifies that the content associated with the style tag should be deleted from the output.
KVSTYLE_ONCONSECUTIVEPARAGRAPHS The KVSTYLE_ONCONSECUTIVEPARAGRAPHS flag specifies that the style should be applied to consecutive paragraphs of the document. If this flag is used, and two or more paragraphs require the same style, the opening and closing tags that normally appear between each paragraph are not generated.
KVSTYLE_REDACT The KVSTYLE_REDACT flag is used to hide sensitive or confidential information in the source document. It specifies that the text associated with the style tag should be replaced in the XML output with a selected character. The default replacement character is “X,” but you can specify a different replacement character by setting cRedact
Use Style Sheets
XML is a content-based metalanguage designed to structure data. XML does not include information about how a document should be displayed in a browser. To view an XML document in a browser, information about how it is displayed must be provided by style sheets. These are coded using either Cascading Style Sheets (CSS) or Extensible Stylesheet Language (XSL).
The style sheet options are enumerated in KVXMLStyleSheetType.
Use Extensible Style Sheet Language (XSL)
You can use XSL style sheets to specify how XML data is displayed in a browser.
Existing XSL style sheets can be used, but unlike CSS, style sheet information cannot be written to an external XSL file during the conversion.
Both CSS and XSL style sheets can be used to format XML documents. However, XSL can also transform XML documents. For example, list items can be transformed to display in alphabetical order, words can be replaced by other words, or empty elements can be replaced by text.
To use an existing XSL style sheet
1. Set eStyleSheetType to XML_XSL to enable XSL style sheet mapping.
2. Set bUseExistingStyleSheet to TRUE to apply a pre-existing style sheet to an XML document. Pre-existing style sheets are not validated.
3. Specify the path and filename of the style sheet file in pszStyleSheet.
If bUseExistingStyleSheet is set to TRUE and pszStyleSheet is not specified, a default XSL style sheet that is appropriate for the source document type is used.
The following are default XSL style sheets:
• wp.xsl (for word processing documents)
• ss.xsl (for spreadsheets)
• pg.xsl (for presentation graphics)
Use Cascading Style Sheets (CSS)
In addition to XSL style sheets, Export can write style sheet information to an external CSS file. The C sample program xmlini provides an example of how to use an existing style sheet, and output formatting data to an external file.
To enable CSS mapping and output the resulting formatting data in an external file
1. Set eStyleSheetType to XML_CSS.
2. Use the KVXMLSetStyleSheet() function to set the path and filename of the external style sheet. See “KVXMLSetStyleSheet().”
To enable CSS mapping and use an existing CSS file:
1. Set eStyleSheetType to XML_CSS.
2. Set bUseExistingStyleSheet to TRUE to specify a pre-existing style sheet for an XML document.
3. Specify the path and filename of the style sheet file in pszStyleSheet.
If bUseExistingStyleSheet is set to TRUE and pszStyleSheet or SetExternalStyleFile is not specified, a CSS style sheet is created.
NOTE Cascading style sheets can only be used with word processing documents.
Display Vector Graphics on UNIX and Linux
Export offers the option of rasterizing vector graphic content from source documents into a variety of graphics formats including JPEG, PNG, WMF, and CGM. This solution is implemented with Windows Graphical Device Interface (GDI) code, and therefore is not portable to other platforms.
The output format of vector graphics is defined by the member eOutputVectorGraphicType of the structure KVXMLOptions, and the options are enumerated in KVXMLGraphicType in kvxml.h.
To display vector graphics in presentation, word processing, and spreadsheet files on UNIX and Linux, Export can convert the files directly to JPEG using a Java program named kvraster.class. This program uses the Java Abstract Windowing Toolkit (AWT). The AWT requires access to an X Server.
NOTE If you are using KeyView 10.5.0.0 or Java 1.6, you do not have to set up an X Server; however, if you are using a version of KeyView lower than 10.4 with a version of Java lower than 1.6, you must set up an X Server.
To set up an X Server, do one of the following
• Run a virtual X Server, such as the Xvfb utility. This utility is included in the X11R6 distribution or can be downloaded from the following site:
http://www.x.org/Downloads.html
For example, to run the Xvfb utility on a 512 Mb, Solaris 2.8 platform, follow these steps:
a. Start Xvfb at root:
/usr/X11R6/bin/Xvfb :1 -screen 0 1152x900x8 &
b. Set the display environment variable:
setenv DISPLAY :1.0
• Make an X display available to the Java runtime using the DISPLAY environment variable. No windows appear on the display. For example, set the DISPLAY environment variable as follows:
setenv DISPLAY computername:0.0
or
setenv DISPLAY ipaddress:0.0
After the X Server is set up, the file can be converted.
To convert the file
1. Add the location of the JRE to the PATH environment variable.
2. Set eOutputVectorGraphicType to JPEG in the template file or directly in the API.
3. Convert the document to XML. The graphics in the document are converted to JPEG and stored in the output directory.
Convert Revision Tracking Information
The revision tracking feature in applications, such as Microsoft Word’s Track Changes, marks changes to a document (typically, strikethrough for deleted text and underline for inserted text) and tracks each change by reviewer name and date.
If revision tracking was enabled when changes were made to a document, Export can be configured to convert the deleted text and graphics and include revision tracking information in the XML output. (The deleted content and revision tracking information is excluded from the XML output by default.)
Content that was added to the document is identified by <ins> tags. Content that was deleted from the document is identified by <del> tags. The <ins> and <del> tags include cite and datetime attributes, which define the name of the reviewer who made the change and the date the change was made, respectively. (The date is in ISO-8601 format: YYYY-MM-DDThh:mm:ss.) The tags also include a title attribute, which allows you to display the author and date information in a browser. These elements are included in the verity.dtd.
The following markup is generated for inserted text:
<ins title="Inserted: JohnD, 2006-04-24T14:47:00" cite="mailto:JohnD" datetime="2006-04-24T14:47:00">This text was added</ins> in a previous version.
The following markup is generated for deleted text:
<del title="Deleted: JohnD, 2006-04-24T14:56:00" cite="mailto:JohnD" datetime="2006-04-24T14:56:00">This text was deleted</del> in a previous version.
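The shape of this markup, including the ISO-8601 timestamp, can be reproduced with the standard C strftime() function. The sketch below is illustrative only and does not show how Export builds the markup: format_ins_markup() is a hypothetical helper, and it assumes the reviewer name and text need no XML escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: formats an <ins> element in the shape shown above.
 * Returns 0 on success, or -1 on a formatting or size problem. */
int format_ins_markup(const char *reviewer, const struct tm *when,
                      const char *text, char *out, size_t out_size)
{
    char stamp[32];
    int written;

    assert(reviewer != NULL && when != NULL && text != NULL && out != NULL);

    /* No escaping is done here, so reject a name that would break the
     * quoted attribute values. */
    if (strchr(reviewer, '"') != NULL) {
        return -1;
    }

    /* ISO-8601 form used by the datetime attribute: YYYY-MM-DDThh:mm:ss */
    if (strftime(stamp, sizeof stamp, "%Y-%m-%dT%H:%M:%S", when) == 0) {
        return -1;
    }

    written = snprintf(out, out_size,
                       "<ins title=\"Inserted: %s, %s\" cite=\"mailto:%s\" "
                       "datetime=\"%s\">%s</ins>",
                       reviewer, stamp, reviewer, stamp, text);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

A consumer of the XML output could use the same format string with a parsing routine to read the datetime attribute back into a struct tm.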
To convert deleted text and graphics and include revision tracking information
1. Call the fpInit() function.
2. Call the fpXMLConfig() function with the following arguments:
Argument Parameter
nType KVCFG_INCLREVISIONMARK
nValue TRUE (non-zero)
pData NULL
For example:
(*fpXMLConfig)(pKVXML, KVCFG_INCLREVISIONMARK, TRUE, NULL);
The xmlini sample program demonstrates this function.
3. Call the fpConvertStream() or KVXMLConvertFile() function.
Convert PDF Files
Export has special configuration options that allow greater control over the conversion of PDF files. These options can improve the accuracy of the XML output.
Convert PDF Files to a Logical Reading Order
The PDF format is primarily designed for presentation and printing of brochures, magazines, forms, reports, and other materials with complex visual designs. Most PDF files do not contain the logical structure of the original document: the correct reading order, for example, and the presence and meaning of significant elements such as headers, footers, columns, tables, and so on.
KeyView can convert a PDF file by either using the file’s internal unstructured paragraph flow, or by applying a structure to the paragraphs to reproduce the logical reading order of the visual page. Logical reading order enables KeyView to output PDF files containing languages that read from right-to-left (such as Hebrew and Arabic) in the correct reading direction.
NOTE The algorithm used to reproduce the reading order of a PDF page is based on common page layouts. The paragraph flow generated for PDFs with unique or complex page designs may not emulate the original reading order exactly.
For example, page design elements such as drop caps, callouts that cross column boundaries, and significant changes in font size, may disrupt the logical flow of the output text.
Logical Reading Order and Paragraph Direction
By default, KeyView produces an unstructured text stream for PDF files. This means PDF paragraphs are extracted in the order in which they are stored in the file, not the order in which they appear on the visual page. For example, a three-column article could be output with the headers and the title at the end of the output file, and the second column extracted before the first column. Although this output does not represent a logical reading order, it accurately reflects the internal structure of the PDF.
You can configure KeyView to produce a structured text stream that flows in a specified direction. This means PDF paragraphs are extracted in the order (logical reading order) and direction (left-to-right or right-to-left) in which they appear on the page.
The following paragraph direction options are available:
Paragraph Direction Option Description
Left-to-right Paragraphs flow logically and read from left to right. This option should be specified when most of your documents are in a language using a left-to-right reading order, such as English or German.
Right-to-left Paragraphs flow logically and read from right to left. This option should be specified when most of your documents are in a language using a right-to-left reading order, such as Hebrew or Arabic.
Dynamic Paragraphs flow logically. The PDF reader determines the paragraph direction for each PDF page, and then sets the direction accordingly. When a paragraph direction is not specified, this option is used.
NOTE Conversions may be slower when logical reading order is enabled. For optimal speed, use an unstructured paragraph flow.
The paragraph direction options control the direction of paragraphs on a page; they do not control the text direction in a paragraph. For example, suppose a PDF file contains English paragraphs in three columns that read from left to right, but 80% of the second paragraph contains Hebrew characters. If the left-to-right logical reading order is enabled, the paragraphs are ordered logically in the output (title paragraph, then paragraph 1, 2, 3, and so on) and flow from the top left of the first column to the bottom right of the third column. However, the text direction of the second paragraph is determined independently of the page by the PDF reader, and is output from right to left.
NOTE Extraction of metadata is not affected by the paragraph direction setting. The characters and words in metadata fields are extracted in the correct reading direction regardless of whether logical reading order is enabled.
Enable Logical Reading Order
You can enable logical reading order using either the API or the formats_e.ini file. Setting the direction in the API overrides the setting in the formats_e.ini file.
Use the C API
To enable PDF logical reading order in the C API
1. Call the fpInit() function.
2. Call the fpXMLConfig() function with the following arguments:
Argument Parameter
nType KVCFG_LOGICALPDF
nValue Set to one of the following flags, which are defined in kvtypes.h:
• LPDF_LTR: Logical reading order and left-to-right paragraph direction.
• LPDF_RTL: Logical reading order and right-to-left paragraph direction.
• LPDF_AUTO: Logical reading order. The PDF reader determines the paragraph direction for each PDF page, and then sets the direction accordingly. When a paragraph direction is not specified, this option is used.
• LPDF_RAW: Unstructured paragraph flow. This is the default behavior. If logical reading order is enabled, and you want to return to an unstructured paragraph flow, set this flag.
pData NULL
For example:
(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_RTL, NULL);
The cnv2xml sample program demonstrates this function.
3. Call the fpConvertStream() or KVXMLConvertFile() function.
Use the formats_e.ini File
The formats_e.ini file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.
To enable logical reading order using the formats_e.ini file
1. Change the PDF reader entry in the [Formats] section of the formats_e.ini file as follows:
[Formats]
200=lpdf
2. Optionally, add the following section to the end of the formats_e.ini file:
[pdf_flags]
pdf_direction=paragraph_direction
where paragraph_direction is one of the following:
Flag Description
LPDF_LTR Left-to-right paragraph direction
LPDF_RTL Right-to-left paragraph direction
LPDF_AUTO The PDF reader determines the paragraph direction for each PDF page, and then sets the direction accordingly. When a paragraph direction is not specified, this option is used.
LPDF_RAW Unstructured paragraph flow. This is the default behavior. If logical reading order is enabled, and you want to return to an unstructured paragraph flow, set this flag.
Control Hyphenation
There are two types of hyphens in a PDF document:
• A soft hyphen is added to a word by a word processor to divide the word across two lines. This is a discretionary hyphen and is used to ensure proper text flow in justified text.
• A hard hyphen is intentionally added to a word regardless of the word’s position in the text flow. It is required by the rules of grammar and/or word usage. For example, compound words, such as “three-week vacation” and “self-confident,” contain hard hyphens.
By default, KeyView maintains the source document’s soft hyphens in the output XML to more accurately represent the source document’s layout. However, if you are using Export to generate text output for an indexing engine or are not concerned with maintaining the document’s layout, it is recommended you remove soft hyphens from the XML output. To remove soft hyphens, you must enable the soft hyphen flag.
NOTE If the soft hyphen flag is enabled, every hyphen at the end of a line is considered a soft hyphen and removed from the XML output. If a hard hyphen appears at the end of a line, it will also be removed. This may result in an intentionally hyphenated word being extracted without a hyphen.
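The consequence described in this note can be seen in a small C sketch of end-of-line hyphen removal. It is not KeyView's implementation. remove_line_end_hyphens() is a hypothetical helper that assumes lines are separated by '\n' and that every hyphen directly before a line break is treated as soft.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes, in place, every hyphen that is directly
 * followed by a line break, together with the line break, so that the
 * two halves of the word are joined. */
void remove_line_end_hyphens(char *text)
{
    size_t len, read_pos = 0, write_pos = 0;

    assert(text != NULL);
    len = strlen(text);

    while (read_pos < len) {
        if (text[read_pos] == '-' && read_pos + 1 < len &&
            text[read_pos + 1] == '\n') {
            read_pos += 2;      /* drop the hyphen and the line break */
        } else {
            text[write_pos++] = text[read_pos++];
        }
    }
    text[write_pos] = '\0';
}
```

Under these assumptions, "docu-" followed by "ment" on the next line becomes "document", but "self-" followed by "confident" becomes "selfconfident", which is the loss of a hard hyphen that the note warns about. A hyphen in the middle of a line, as in "three-week", is left untouched.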
To remove soft hyphens from the XML output
1. Call the fpInit() function.
2. Call the KVXMLConfig() function, with the following arguments:
Argument Parameter
nType KVCFG_DELSOFTHYPHEN
nValue TRUE (non-zero)
pData NULL
For example:
(*fpXMLConfig)(pKVXML, KVCFG_DELSOFTHYPHEN, TRUE, NULL);
3. Call the fpConvertStream() or KVXMLConvertFile() function.
Improve Performance for PDFs with Many Small Images
To improve performance when converting PDF files containing many small pixel images, you can specify in the formats_e.ini file the minimum pixel height and width for images that are converted to JPEG. If an image is smaller than the minimum height and width, KeyView does not generate a JPEG file for the image.
For example, to specify that images 16 pixels in height and width and less are not converted, you would add the following to the [pdf_flags] section of the formats_e.ini file:
[pdf_flags]
process_images_with_min_height=17
process_images_with_min_width=17
Extract Custom Metadata from PDF Files
To extract custom metadata from your PDF files, add the custom metadata names to the pdfsr.ini file provided, and copy the modified file to the \bin directory. You can then extract metadata as you normally would.
The pdfsr.ini file is in the directory samples\pdfini, and has the following structure:
<META>
<TOTAL>total_item_number</TOTAL>,
/metadata_tag_name datatype,
</META>
Parameter Description
total_item_number The total number of metadata tags that are listed.
metadata_tag_name The metadata tag name used in the PDF files.
datatype The data type of the metadata field. Data types are defined in KVSumInfoType.
For example:
<META>
<TOTAL> 4 </TOTAL>
/part_number INT4
/volume INT4
/purchase_date DATETIME
/customer STRING
</META>
Convert Spreadsheet Files
Export has special configuration options that allow greater control over the conversion of spreadsheet files.
Convert Hidden Text in Microsoft Excel Files
Normally, Export does not convert hidden text from a Microsoft Excel spreadsheet because it is assumed the text should not be exposed. You can change this default behavior, and convert text in hidden rows, columns, and sheets by adding the following lines to the formats_e.ini file:
[Options]
gethiddeninfo=1
Convert Headers and Footers in Microsoft Excel 2003 Files
Normally, Export does not convert headers and footers from Microsoft Excel 2003 spreadsheets. You can change this default behavior and convert headers and footers by adding the following lines to the formats_e.ini file:
[Options]
ShowHeaderFooter=1
Specify Date and Time Format on UNIX Systems
System date and time format information is not stored in Microsoft Excel files. On Windows systems, you can specify a locale setting to determine the date and time format. However, on UNIX systems, the date and time format is set to the U.S. short date format by default (mm/dd/yyyy). To change the format, you must use a formats_e.ini option.
To specify the system date and time format on UNIX systems
In the formats_e.ini file, set the SysDateTime option in the [LocaleSetting] section. For example:
SysDateTime=%d/%m/%Y
In this example, dates and times are extracted in the following format:
28/02/2008
The format arguments are the same as those for the strftime() function. Refer to the following Web page for more information:
http://linux.die.net/man/3/strftime
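Because the format arguments follow strftime(), you can preview a SysDateTime value with a few lines of standard C before adding it to the configuration file. The helper below is a hypothetical convenience for that purpose and is not part of the Export API. It only fills in the calendar date, so it suits date-only formats such as the one in the example.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: formats a calendar date with a strftime() pattern.
 * Returns 0 on success, or -1 if the result is empty or does not fit. */
int preview_date_format(const char *format, int year, int month, int day,
                        char *out, size_t out_size)
{
    struct tm when;

    assert(format != NULL && out != NULL);

    memset(&when, 0, sizeof when);
    when.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    when.tm_mon = month - 1;      /* struct tm months run from 0 to 11 */
    when.tm_mday = day;

    return strftime(out, out_size, format, &when) > 0 ? 0 : -1;
}
```

For 28 February 2008, the pattern %d/%m/%Y yields 28/02/2008, and %m/%d/%Y yields 02/28/2008, which corresponds to the U.S. short date default mentioned above.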
Extract Microsoft Excel Formulas
Normally, the actual value of a formula is extracted from an Excel spreadsheet; the formula from which the value is derived is not included in the output. However, KeyView enables you to include the formula as well as the value in the output. For example, if Export is configured to extract the formula and the formula value, the output might look like this:
245 = SUM(B21:B26)
The calculated value from the cell is 245, and the formula from which the value is derived is SUM(B21:B26).
NOTE Depending on the complexity of the formulas, enabling formula extraction may result in slightly slower performance.
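If your application post-processes the extracted text, it may need to separate the value from the formula. The following sketch assumes that the two parts are always separated by " = ", as in the sample output above; the guide does not state this as a guarantee, and a text value that itself contains " = " would be split too early. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split text of the form "value = FORMULA" at the first " = ".
 * Returns 1 and fills both buffers on success; returns 0 if the separator
 * is missing or either buffer is too small.
 * Assumption: " = " is the separator, as in the sample "245 = SUM(B21:B26)". */
int split_value_and_formula(const char *text,
                            char *value, size_t value_size,
                            char *formula, size_t formula_size)
{
    const char *sep;
    size_t value_len, formula_len;

    assert(text != NULL && value != NULL && formula != NULL);

    sep = strstr(text, " = ");
    if (sep == NULL)
        return 0;

    value_len = (size_t)(sep - text);
    formula_len = strlen(sep + 3);
    if (value_len >= value_size || formula_len >= formula_size)
        return 0;

    memcpy(value, text, value_len);
    value[value_len] = '\0';
    memcpy(formula, sep + 3, formula_len + 1);  /* copies the terminator too */
    return 1;
}
```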
To set the extraction option for formulas, add the following lines to the formats_e.ini file:
[Options]
getformulastring=option
where option is one of the following:
Option  Description
0       Extract the formula value only. This is the default. If formula extraction is enabled, and you want to return to the default, set this option.
1       Extract the formula only.
2       Extract the formula and the formula value.
NOTE If a function in a formula is not supported or is invalid, and option 1 or 2 is specified, only the calculated value is extracted. The supported functions are listed below.
When formula extraction is enabled, Export can extract Microsoft Excel formulas that contain the following functions:
=ABS()  =ACOS()  =AND()  =AREAS()  =ASIN()  =ATAN2()
=AVERAGE()  =CELL()  =CHAR()  =CHOOSE()  =CLEAN()  =CODE()
=COLUMN()  =COLUMNS()  =CONCATENATE()  =COS()  =COUNT()  =COUNTA()
=DATE()  =DATEVALUE()  =DAVERAGE()  =DAY()  =DCOUNT()  =DDB()
=DMAX()  =DMIN()  =DOLLAR()  =DSTDEV()  =DSUM()  =DVAR()
=EXACT()  =EXP()  =FACT()  =FALSE()  =FIND()  =FIXED()
=FV()  =GROWTH()  =HLOOKUP()  =HOUR()  =IF()  =INDEX()
=INDIRECT()  =INT()  =IPMT()  =IRR()  =ISBLANK()  =ISERR()
=ISERROR()  =ISNA()  =ISNUMBER()  =ISREF()  =ISTEXT()  =LEFT()
=LEN()  =LINEST()  =LN()  =LOG()  =LOG10()  =LOGEST()
=LOOKUP()  =LOWER()  =MATCH()  =MAX()  =MDETERM()  =MID()
=MIN()  =MINUTE()  =MINVERSE()  =MIRR()  =MMULT()  =MOD()
=MONTH()  =N()  =NA()  =NOT()  =NOW()  =NPER()
=NPV()  =OFFSET()  =OR()  =PI()  =PMT()  =PPMT()
=PRODUCT()  =PROPER()  =PV()  =RATE()  =REPLACE()  =REPT()
=RIGHT()  =ROUND()  =ROW()  =ROWS()  =SEARCH()  =SECOND()
=SIGN()  =SIN()  =SLN()  =SQRT()  =STDEV()  =SUBSTITUTE()
=SUM()  =SYD()  =T()  =TAN()  =TEXT()  =TIME()
=TIMEVALUE()  =TODAY()  =TRANSPOSE()  =TREND()  =TRIM()  =TRUE()
=TYPE()  =UPPER()  =VALUE()  =VAR()  =VLOOKUP()  =WEEKDAY()
=YEAR()
Convert XML Files
Export enables you to extract all or selected content from source XML files (see “Configure Element Extraction for XML Documents”). It detects the following XML formats:
generic XML
Microsoft Office 2003 XML (Word, Excel, and Visio)
StarOffice/OpenOffice XML (text document, presentation, and spreadsheet)
See Appendix E for more information on format detection.
Configure Element Extraction for XML Documents
When converting XML files, you can specify which elements and attributes are extracted according to the file’s format ID or root element. This is useful when you want to extract only relevant text elements, such as abstracts from reports, or a list of authors from an anthology.
A root element is an element in which all other elements are contained. In the XML sample below, book is the root element:
<book>
<title>XML Introduction</title>
<product id="33-657" status="draft">XML Tutorial</product>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>
For example, you could specify that when converting files with the root element book, the element title is extracted as metadata, and only product elements with a status attribute value of draft are extracted. When you extract an element, its child elements are also extracted. For example, if you extract the element chapter from the sample above, the child element para is also extracted.
Export defines default element extraction settings for the following XML formats:
generic XML
Microsoft Office 2003 XML (Word, Excel, and Visio)
StarOffice/OpenOffice XML (text document, presentation, and spreadsheet)
These settings are defined internally and are used when converting these file formats; however, you can modify their values.
In addition to the default extraction settings, you can add custom settings for your own XML document types. If you do not define custom settings for your own XML document types, the settings for generic XML are used.
Modify Element Extraction Settings
You can modify configuration settings for XML documents through either the API or the kvxconfig.ini file.
NOTE You can only use customized element extraction settings when converting files in process. When converting out of process, the default extraction settings are used.
Use the C API
You can use the C API to modify the settings for the standard XML document types or add configuration settings for your own XML document types.
To modify settings
1. Call the fpInit() function.
2. Define the KVXConfigInfo data structure.
3. Call the KVXMLConfig() function with the following arguments:
Argument  Parameter
nType     KVCFG_SETXMLCONFIGINFO
nValue    0
pData     address of the KVXConfigInfo structure
For example:
KVXConfigInfo xinfo;
/* populate xinfo */
(*fpXMLConfig)(pKVXML, KVCFG_SETXMLCONFIGINFO, 0, &xinfo);
4. Repeat steps 2 and 3 until the settings for all the XML document types you want to customize are defined.
5. Call the function fpConvertStream() or KVXMLConvertFile().
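Steps 2 to 4 amount to one configuration call per XML document type. The following sketch shows that loop in isolation so that it can be compiled without the SDK. Everything prefixed demo or DEMO_ is a hypothetical stand-in: the real KVXConfigInfo layout, the value of KVCFG_SETXMLCONFIGINFO, and the return convention of KVXMLConfig() are defined by the SDK headers and are not reproduced here. The field names are only assumed to mirror the kvxconfig.ini options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for KVCFG_SETXMLCONFIGINFO. */
#define DEMO_SETXMLCONFIGINFO 1

/* Hypothetical stand-in for KVXConfigInfo; the real layout is in the SDK. */
typedef struct {
    const char *root;               /* assumed to mirror szRoot */
    const char *in_meta_element;    /* assumed to mirror szInMetaElement */
    const char *in_content_element; /* assumed to mirror szInContentElement */
} DemoXConfigInfo;

/* Same argument shape as the fpXMLConfig call in step 3. */
typedef int (*DemoConfigFn)(void *context, int type, int value, void *data);

/* Pass one settings structure per XML document type (steps 2 to 4).
 * Returns the number of document types accepted before the first failure. */
size_t demo_apply_xml_settings(DemoConfigFn config, void *context,
                               DemoXConfigInfo *settings, size_t count)
{
    size_t i;

    assert(config != NULL && settings != NULL);
    for (i = 0; i < count; i++) {
        /* nType selects the operation, nValue is 0, and pData is the
         * address of the structure, as described in step 3. */
        if (!config(context, DEMO_SETXMLCONFIGINFO, 0, &settings[i]))
            break;
    }
    return i;
}

/* Stub that stands in for the SDK entry point so the sketch can run.
 * It counts the calls that arrive with the expected arguments. */
int demo_config_stub(void *context, int type, int value, void *data)
{
    size_t *calls = context;

    if (type != DEMO_SETXMLCONFIGINFO || value != 0 || data == NULL)
        return 0;
    (*calls)++;
    return 1;
}
```

In a real application, the stub would be replaced by the function pointer obtained from the SDK, and the conversion call in step 5 would follow once every document type has been configured.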
Use an Initialization File
You can use the initialization file to modify the settings for the standard XML document types or add configuration settings for your own XML document types.
To modify settings
1. Modify the kvxconfig.ini file.
2. Use the template file when processing the XML file. See “Modify Element Extraction Settings in the kvxconfig.ini File”.
The sample program (xmlini) demonstrates how to use a template file during the conversion process.
Modify Element Extraction Settings in the kvxconfig.ini File
The kvxconfig.ini file contains default element extraction settings for supported XML formats. The file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system. For example, the following entry defines extraction settings for the Microsoft Visio 2003 XML format:
[config3]
eKVFormat=MS_Visio_XML_Fmt
szRoot=
szInMetaElement=DocumentProperties
szExMetaElement=PreviewPicture
szInContentElement=Text
szExContentElement=
szInAttribute=
The following options are available:
Configuration Option  Description
eKVFormat  The format ID as detected by the KeyView detection module. This determines the file type to which these extraction settings apply. See Appendix E for more information on format ID values.
If you are adding configuration settings for a custom XML document type, this is not defined.
szRoot  The file’s root element. When the format ID is not defined, the root element is used to determine the file type to which these settings apply.
To further qualify the element, specify its namespace. See “Specify an Element’s Namespace and Attribute”.
szInMetaElement  The elements extracted from the file as metadata. All other elements are extracted as text.
Multiple entries must be separated by commas. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
szExMetaElement  The child elements in the included metadata elements that are not extracted from the file as metadata. For example, the default extraction settings for the Visio XML format extract the DocumentProperties element as metadata. This element includes child elements such as Title, Subject, Author, Description, and so on. However, the child element PreviewPicture is defined in szExMetaElement because it is binary data and should not be extracted.
You cannot exclude any metadata elements from the output for StarOffice files. All metadata is extracted regardless of this setting.
Multiple entries must be separated by commas. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
szInContentElement  The elements extracted from the file as content text. Enter an asterisk (*) to extract all elements including child elements.
Multiple entries must be separated by commas. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
szExContentElement  The child elements in the included content elements that are not extracted from the file as content text.
Multiple entries must be separated by commas. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
szInAttribute  The attribute values extracted from the file. If attributes are not defined here, attribute values are not extracted.
Enter the namespace (if used), element name, and attribute name in the following format:
namespace:elementname@attributename
For example:
Autonomy:division@name
Multiple entries must be separated by commas.
Specify an Element’s Namespace and Attribute
To further qualify an element, you can specify that the element must exist in a certain namespace, contain a specific attribute, or both. To define the namespace and attribute of an element, enter the following:
ns_prefix:elemname@attribname=attribvalue
Attribute values containing spaces must be enclosed in quotation marks.
For example, the following entry:
bg:language@id=xml
extracts a language element in the namespace bg that contains the attribute id with the value “xml”. This entry extracts the following element from an XML file:
<bg:language id="xml">XML is a simple, flexible text format derived from SGML</bg:language>
but does not extract:
<bg:language id="sgml">SGML is a system for defining markup languages.</bg:language>
or
<adv:language id="xml">The namespace should be a Uniform Resource Identifier (URI).</adv:language>
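To make the entry syntax concrete, the following sketch splits one entry into its namespace prefix, element name, attribute name, and attribute value. It illustrates the notation only and is not how Export parses the file; the type and function names are hypothetical, and the fixed buffer sizes are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parts of one entry of the form ns_prefix:elemname@attribname=attribvalue.
 * Hypothetical type for illustration; buffer sizes are arbitrary. */
typedef struct {
    char ns[32];      /* namespace prefix, empty if none */
    char element[64]; /* element name */
    char attrib[64];  /* attribute name, empty if none */
    char value[64];   /* attribute value without quotation marks, empty if none */
} ElementSpec;

/* Copy len bytes of src into dst; returns 0 if it does not fit. */
static int copy_part(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len >= dst_size)
        return 0;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 1;
}

/* Returns 1 on success, 0 if the entry is malformed or a part is too long. */
int parse_element_spec(const char *entry, ElementSpec *spec)
{
    const char *colon, *at, *eq, *elem_end, *val;
    size_t val_len;

    assert(entry != NULL && spec != NULL);
    memset(spec, 0, sizeof *spec);

    at = strchr(entry, '@');
    elem_end = at ? at : entry + strlen(entry);

    /* A colon before the '@' separates the namespace prefix. */
    colon = memchr(entry, ':', (size_t)(elem_end - entry));
    if (colon) {
        if (!copy_part(spec->ns, sizeof spec->ns, entry, (size_t)(colon - entry)))
            return 0;
        entry = colon + 1;
    }
    if (elem_end == entry ||
        !copy_part(spec->element, sizeof spec->element, entry,
                   (size_t)(elem_end - entry)))
        return 0;
    if (!at)
        return 1;  /* element only, no attribute qualifier */

    eq = strchr(at + 1, '=');
    if (!copy_part(spec->attrib, sizeof spec->attrib, at + 1,
                   eq ? (size_t)(eq - at - 1) : strlen(at + 1)))
        return 0;
    if (!eq)
        return 1;  /* attribute name only, no value */

    val = eq + 1;
    val_len = strlen(val);
    if (val_len >= 2 && val[0] == '"' && val[val_len - 1] == '"') {
        val++;          /* strip the surrounding quotation marks */
        val_len -= 2;
    }
    return copy_part(spec->value, sizeof spec->value, val, val_len);
}
```

For the entry bg:language@id=xml, the sketch yields the namespace bg, the element language, the attribute id, and the value xml.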
Add Configuration Settings for Custom XML Document Types
You can define element extraction settings for custom XML document types by adding the settings to the kvxconfig.ini file. For example, for files containing the root element autonomyxml, you could add the following section to the end of the initialization file:
[config101]
eKVFormat=
szRoot=autonomyxml
szInMetaElement=dc:title,dc:meta@title,dc:meta@name=title
szExMetaElement=
szInContentElement=autonomy:division@name=dev,autonomy:division@name=export,p@style="Heading 1"
szExContentElement=
szInAttribute=autonomy:division@name
The custom extraction settings must be preceded by a section heading named [configN], where N is an integer starting at 100 and increasing by 1 for each additional file type, as in [config100], [config101], [config102], and so on.
The default extraction settings for the supported XML formats are numbered config0 to config99. Currently only 0 to 6 are used.
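If you generate custom sections programmatically, the numbering rule can be captured in a small helper. The following is a sketch only: the function name and its parameters are hypothetical, it leaves eKVFormat and the exclusion options empty, and it covers just the options shown in the example above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FIRST_CUSTOM_CONFIG 100u  /* custom sections start at [config100] */

/* Format the section for the custom_index-th custom document type
 * (0 gives [config100], 1 gives [config101], and so on).
 * Returns the snprintf() result (the length required), or -1 if a value
 * contains a line break. Hypothetical helper, not an SDK function. */
int format_custom_section(char *out, size_t out_size, unsigned custom_index,
                          const char *root, const char *in_meta,
                          const char *in_content, const char *in_attribute)
{
    assert(out != NULL && root != NULL && in_meta != NULL &&
           in_content != NULL && in_attribute != NULL);

    /* A line break inside a value would corrupt the INI section. */
    if (strchr(root, '\n') || strchr(in_meta, '\n') ||
        strchr(in_content, '\n') || strchr(in_attribute, '\n'))
        return -1;

    return snprintf(out, out_size,
                    "[config%u]\n"
                    "eKVFormat=\n"           /* not defined for custom types */
                    "szRoot=%s\n"
                    "szInMetaElement=%s\n"
                    "szExMetaElement=\n"
                    "szInContentElement=%s\n"
                    "szExContentElement=\n"
                    "szInAttribute=%s\n",
                    FIRST_CUSTOM_CONFIG + custom_index,
                    root, in_meta, in_content, in_attribute);
}
```

A return value that is greater than or equal to the buffer size means that the output was truncated and a larger buffer is needed.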
Since a custom XML document type is not recognized by the KeyView detection module, the format ID is not defined. The file type is identified by the file’s root element only.
If a custom XML document type is not defined in the kvxconfig.ini file or by the KVXMLConfig() function, the default extraction settings for a generic XML document are used.
Show Hidden Data
Microsoft Word, Excel, and PowerPoint documents contain hidden information, some of which is shown by default when exported and some of which is hidden by default. Several options allow you to determine exactly which types of hidden data are exported.
Hidden Data in Microsoft Documents
You can show or hide several types of hidden data from Microsoft Word, Excel, and PowerPoint documents. Each type has a corresponding flag in the configuration API, which you can toggle to determine whether the hidden data is shown. The following table lists each data type, its default behavior, and its corresponding configuration API flag.
Hidden Data Type        Default Behavior    Configuration API Flag
Microsoft Word
Comments [a]            Shown [b]           KVCFG_WP_NOCOMMENTS
Hidden text             Hidden              KVCFG_WP_SHOWHIDDENTEXT
Date field codes        Calculated date     KVCFG_WP_SHOWDATEFIELDCODE
File name field codes   Document file name  KVCFG_WP_SHOWFILENAMEFIELDCODE
Microsoft Excel
Hidden information      Hidden              KVCFG_SS_SHOWHIDDENINFOR
Comments                Hidden              KVCFG_SS_SHOWCOMMENTS
Formulas                Calculated value    KVCFG_SS_SHOWFORMULA
Microsoft PowerPoint
Hidden slides           Shown               KVCFG_PG_HIDEHIDDENSLIDE
Comments                Shown [c]           KVCFG_PG_HIDECOMMENT
Comments slide          Hidden              KVCFG_PG_SHOWCOMMENTSSLIDE [d]
Slide notes [e]         Hidden              KVCFG_PG_SHOWSLIDENOTES
a. Word comment settings can also be toggled with a configuration parameter in the formats_e.ini file. See “Toggle Word Comment Settings in the formats_e.ini File”.
b. Shown by default in Microsoft Word 97 to 2003 documents.
c. Shown by default in Microsoft PowerPoint 97 to 2000 documents.
d. This setting affects PowerPoint 2003 and 2007 only.
e. PowerPoint slide note settings can also be toggled with a configuration parameter in the formats_e.ini file. See “Toggle PowerPoint Slide Note Settings in the formats_e.ini File”.
To toggle the display of any type of hidden data
Use the configuration API and set the third parameter to TRUE or FALSE:
(*fpXMLConfig)(pKVXML, KVCFG_WP_NOCOMMENTS, TRUE, NULL);
In this example, comments are not exported from Word documents.
NOTE The third parameter affects the default behavior. To change the default behavior, set it to TRUE.
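When an application changes several hidden-data defaults at once, a table of flag and value pairs keeps the calls in one place. The sketch below shows that pattern without the SDK. The DEMO_ constants are hypothetical stand-ins for the KVCFG_ flags (their real values come from the SDK headers), and the stub replaces the function reached through fpXMLConfig; its return convention is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real KVCFG_* values are in the SDK headers. */
enum {
    DEMO_WP_NOCOMMENTS = 0,      /* stands in for KVCFG_WP_NOCOMMENTS */
    DEMO_WP_SHOWHIDDENTEXT = 1,  /* stands in for KVCFG_WP_SHOWHIDDENTEXT */
    DEMO_SS_SHOWFORMULA = 2,     /* stands in for KVCFG_SS_SHOWFORMULA */
    DEMO_FLAG_COUNT = 3
};

/* Same argument shape as the fpXMLConfig call shown above. */
typedef int (*DemoConfigFn)(void *context, int type, int value, void *data);

typedef struct {
    int flag;            /* which hidden-data flag to set */
    int change_default;  /* 1 (TRUE) changes the default behavior, 0 (FALSE) keeps it */
} HiddenDataToggle;

/* Apply each toggle with one configuration call; the fourth argument is
 * NULL, as in the example above. Returns the number of toggles applied
 * before the first failure. */
size_t demo_apply_toggles(DemoConfigFn config, void *context,
                          const HiddenDataToggle *toggles, size_t count)
{
    size_t i;

    assert(config != NULL && toggles != NULL);
    for (i = 0; i < count; i++) {
        if (!config(context, toggles[i].flag, toggles[i].change_default, NULL))
            break;
    }
    return i;
}

/* Stub for the SDK entry point: records the last value passed for each
 * flag in an int array supplied as the context. */
int demo_record_stub(void *context, int type, int value, void *data)
{
    int *state = context;

    (void)data;
    if (type < 0 || type >= DEMO_FLAG_COUNT)
        return 0;
    state[type] = value;
    return 1;
}
```

With the toggles for comments and formulas set to 1, the result corresponds to hiding Word comments and showing Excel formulas, while hidden text keeps its default.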
Toggle Word Comment Settings in the formats_e.ini File
Microsoft Word 97 to 2003 comment settings can also be controlled through a parameter in the formats_e.ini file.
The formats_e.ini file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.
To toggle comment output in formats_e.ini
1. Open the formats_e.ini file in a text editor.
2. Under [Options], add the WP_NOCOMMENTS parameter and set it to 0 to show comments or 1 to hide comments. For example:
[Options]
WP_NOCOMMENTS=1
NOTE The configuration API flag KVCFG_WP_NOCOMMENTS overrides the setting in formats_e.ini.
Toggle PowerPoint Slide Note Settings in the formats_e.ini File
Microsoft PowerPoint slide note settings can also be controlled through a parameter in the formats_e.ini file.
The formats_e.ini file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.
To toggle slide note output in formats_e.ini
1. Open the formats_e.ini file in a text editor.
2. Under [Options], add the ShowSlideNotes parameter and set it to 1 to show slide notes or 0 to hide slide notes. For example:
[Options]
ShowSlideNotes=1
NOTE The configuration API flag KVCFG_PG_SHOWSLIDENOTES overrides the setting in formats_e.ini.
CHAPTER 5
Sample Programs
This section describes the sample programs provided with XML Export.
Introduction
The sample programs demonstrate how to use the C and Visual Basic implementations of XML Export. The sample code is intended to provide a starting point for your own applications or to be used for reference purposes.
The source code and makefile for each program are in the directory
install\xmlexport\programs\program_name where install is the pathname of the Export installation directory, and program_name is the name of the sample program.
C Sample Programs
The C sample programs demonstrate how to use the C implementation of XML Export. The sample code is intended to provide a starting point for your own applications or to be used for reference purposes.
The following C sample programs are provided:
tstxtract
cnv2xml
cnv2xmloop
metadata
xmlindex
xmlini
xmlcallback
xmlonefile
xmlmulti
The source code and makefile for each program are in the directory
install\xmlexport\programs\program_name where install is the pathname of the Export installation directory, and program_name is the name of the sample program.
NOTE The sample programs do not parse white space in filenames. If your filenames contain spaces, use quotation marks around the entire path name. Inserting quotation marks around the filename only does not work.
To compile the C sample programs, use the makefiles provided in the sample programs directories. Ensure the XML Export include directory is specified in the include path of the project. Once the executables are compiled and built, they must be placed in the same directory as the XML Export libraries.
Compile the Visual Basic Sample Program
To compile Export Demo, use the Visual Studio project file (demo_vb.vbp) in the directory install\xmlexport\programs\ExportDemo, where install is the pathname of the Export installation directory.
tstxtract
The tstxtract sample program demonstrates the File Extraction API. It opens a file, extracts sub files from the file, and repeats the extraction process until all sub files are extracted. It also demonstrates how to extract the default set of metadata and pass integer or string names to extract specific metadata. After the files are extracted, you can convert the files using one of the conversion sample programs.
The source code for the tstxtract sample program is the same for the Filter and Export SDKs. A flag in the makefile specifies whether the program is compiled for Filter, HTML Export, or XML Export.
To run tstxtract, type the following command line:
tstxtract [options] input_file output_directory bin_directory
where options is one or more of the following:
Option  Description
-c charset  Specify the target character set, for example KVCS_SJIS. See “Coded Character Sets” for a full list of supported character sets.
-cf keyfile1, keyfile2,...  Specify one or more credential files (private keys) to use to decrypt encrypted .EML, .MBX, .PST, or .MSG files.
-l logfile  Specify the path and filename of the log file in which metadata is written.
-lm  Retrieve metadata and write the data to the log file.
-lms metaname1, metaname2,...  Retrieve metadata with string metanames and write the data to the log file for .MSG, .EML, .MBX, and .NSF files.
-lmi metaint1, metaint2,...  Retrieve metadata with integer (hexadecimal) metanames and write the data to the log file for .PST files.
-lma  Retrieve all metadata from an .NSF file and write the data to the log file.
-r  Recursively extract second-level subfiles to the specified output directory. For example, if a .ZIP file contains a Microsoft Word file and the Word file contains an embedded Microsoft Excel file, set the -r option to extract both the Word and Excel files.
If this option is not set, only first-level subfiles are extracted. For the example above, only the Word file would be extracted.
-msg  Extract mail messages in a .PST file as an .MSG file, including all of its attachments. If this flag is not set, the mail message is extracted as text. This applies to PST files on Windows only.
-f  Extract the formatted version of the message body (HTML or RTF) from mail files when possible. If neither an HTML nor RTF version of the message body exists in the mail file, then it is extracted as plain text. If this flag is not set, the message body is extracted as plain text when possible.
-t  Preserve the timestamp of embedded files when possible.
-h  Extract hidden text.
input_file is the full path and filename of the source document.
output_directory is the directory to which the files will be extracted.
bin_directory is the path to the Export bin directory. This is required if you do not run the program from the install\Export SDK\bin directory.
cnv2xml
The cnv2xml sample program creates a single, formatted XML output file. It is called by the Export Demo sample program, but can also be used on its own. This program runs on both Windows and UNIX platforms.
To run cnv2xml, type the following command line:
cnv2xml [options] inputfile outputfile
where
options is one or more of the options listed in Table 14.
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
The following options are available:
Table 14 cnv2xml Sample Program
Option  Description
-c KVCFG_SUPPRESSIMAGES  Specifies that XML output includes verbose markup, but no images. If this option is not set, then embedded images in a document are regenerated as separate files and stored in the output directory.
-c KVCFG_ENABLEPOSITIONINFO  Specifies that a position element is included in the markup for PDF documents. The position element defines the absolute position of the text relative to the bottom left corner of the page, and includes additional information such as font and color.
-c KVCFG_DELSOFTHYPHEN  Specifies that soft hyphens in PDF files are deleted from the converted output. See “Control Hyphenation”.
-pdfltr  Specifies that PDF files are output in a logical reading order, and the paragraph direction is left to right. See “... Files to a Logical Reading Order”.
-pdfrtl  Specifies that PDF files are output in a logical reading order, and the paragraph direction is right to left. See “... Files to a Logical Reading Order”.
-pdfauto  Specifies that PDF files are output in a logical reading order. The PDF reader determines the paragraph direction (left-to-right or right-to-left) for each PDF page, and then sets the direction accordingly. See “... Logical Reading Order”.
-pdfraw  Specifies that PDF files are output in an unstructured paragraph flow. This is the default. If logical reading order is enabled, and you want to return to an unstructured paragraph flow, set this flag. See “... Logical Reading Order”.
cnv2xmloop
The cnv2xmloop sample program creates a single, formatted XML output file, but unlike cnv2xml, it converts the file out of process. This program runs on both Windows and UNIX platforms.
To run cnv2xmloop, type the following command line:
cnv2xmloop [options] inputfile outputfile
where
options is one or more of the options listed in Table 15.
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the XML output file.
The following options are available:
Table 15 cnv2xmloop Sample Program
Option  Description
-c KVCFG_SUPPRESSIMAGES  Specifies that XML output includes verbose markup, but no images. If this option is not set, then embedded images in a document are regenerated as separate files and stored in the output directory.
-c KVCFG_ENABLEPOSITIONINFO  Specifies that a position element is included in the markup for PDF documents. The position element defines the absolute position of the text relative to the bottom left corner of the page, and includes additional information such as font and color.
metadata
The metadata sample program converts a source document into a single XML file that contains only the document metadata (Author, Subject, Title, and so on). This program runs on both Windows and UNIX platforms.
To run metadata, type the following command line:
metadata inputfile outputfile
where
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
xmlindex
The xmlindex sample program produces stripped-down XML output suitable for use with indexing engines. It converts a source document into a single, largely unformatted XML file. This program runs on both Windows and UNIX platforms.
To run xmlindex, type the following command line:
xmlindex inputfile outputfile
where
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
xmlini
The xmlini sample program is used in conjunction with template files to produce well-formed XML documents. For more information, see “... Using the Template Files”. Sample template files are in the directory programs\ini. This program runs on both Windows and UNIX platforms.
To run xmlini, type the following command line:
xmlini [options] inifile inputfile outputfile
where
options is one or more of the options listed in Table 16.
inifile is the full path and filename of a template file.
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
The following options are available:
Table 16 xmlini Sample Program
Option  Description
-s stylesheetfile  Reads style sheet information from an existing style sheet file, or writes the information to an external CSS file. See “Use Style Sheets with xmlini”.
-x xmlconfig_filename  Converts an XML file using customized element extraction settings defined in the kvxconfig.ini file. If you do not enter the full path to the template file, the program looks for the file in the current working directory (install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system).
-rm  If this is set, text and graphics that were deleted from a document with a revision tracking feature enabled are converted and revision tracking information is included in the XML output.
-oop  Runs the conversion out of process.
-fl  Prints a list of converted files in the console.
If the XML file is output to a directory other than the directory programs\tempout, you must update the XML markup so that the browser can find the style sheet and the images used by the template (such as backgrounds or corporate logos). The markup contains relative references to the image files (..\images).
Use Style Sheets with xmlini
The xmlini sample program provides an option that allows XML Export to read Cascading Style Sheet (CSS) or Extensible Stylesheet Language (XSL) style sheet information from an existing style sheet file, or to write CSS information to an external CSS file. If the CSS does not exist, it is created. The style sheet name is referenced in the output XML, for example:
<?xml-stylesheet href="c:\mystyle.css" type="text/css"?>
This type of conversion makes the XML output document significantly smaller and allows you to use the same style sheet for many conversions.
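For reference, the processing instruction shown above has a simple fixed shape. The following sketch builds one for either style sheet type. It only illustrates the markup and is not code from Export; the names are hypothetical, and the type value text/xsl for XSL style sheets is an assumption based on common browser practice rather than something stated in this guide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical enumeration for this sketch only. */
typedef enum { STYLE_CSS, STYLE_XSL } StyleKind;

/* Build an xml-stylesheet processing instruction for href.
 * Returns the snprintf() result (the length required), or -1 if href
 * contains a quotation mark, which would end the attribute early.
 * Assumption: XSL style sheets use type="text/xsl". */
int format_stylesheet_pi(char *out, size_t out_size,
                         const char *href, StyleKind kind)
{
    const char *type = (kind == STYLE_XSL) ? "text/xsl" : "text/css";

    assert(out != NULL && href != NULL);
    if (strchr(href, '"') != NULL)
        return -1;

    return snprintf(out, out_size,
                    "<?xml-stylesheet href=\"%s\" type=\"%s\"?>", href, type);
}
```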
To apply an existing style sheet to a conversion using the xmlini sample program
1. In the template file, set eStyleSheetType to either XML_CSS or XML_XSL. This specifies that the formatting data is stored in either a CSS or an XSL style sheet.
2. At the command prompt, type:
xmlini -s stylesheetname inifile inputfile outputfile
where stylesheetname is the path and filename of the CSS or XSL file.
xmlcallback
The xmlcallback sample program demonstrates how you can control the conversion to generate specialized output while it is in progress. The program employs developer-defined callbacks and memory management functions during conversion. This program runs on Windows platforms only.
To run xmlcallback, type the following command line:
xmlcallback inputfile outputfile
where
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
xmlonefile
The xmlonefile sample program converts a source document into a single, formatted XML file. This program runs on Windows platforms only.
To run xmlonefile, type the following command line:
xmlonefile inputfile outputfile
where
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
xmlmulti
The xmlmulti sample program creates multiple XML files from a source document. The main file contains the table of contents. Each H1 heading is contained within its own file. The main file contains hyperlinks to each H1 block; each H1 file contains navigation to the table of contents, as well as to the previous and next blocks. This program runs on Windows platforms only.
To run xmlmulti, type the following command line:
xmlmulti inputfile outputfile
where
inputfile is the full path and filename of the source document.
outputfile is the full path and filename of the first XML output file.
Export Demo
Export Demo is a Visual Basic program that provides an easy-to-use graphical user interface to the KeyView Export technology. It allows you to select files, convert them to XML, and view the result in a browser object. The output options that control the look of the output files are pre-defined in Export Demo and cannot be changed in the user interface.
Export Demo accesses the Export functionality by returning to the operating system and running a C program named cnv2xml. To adapt the sample program to your needs, modify the GUI using Visual Basic, and the cnv2xml program using C. See “cnv2xml”.
To launch Export Demo, select Export Demo from Start | Programs | Autonomy | Export SDK | XML Export.
The source code for the program is in the directory install\xmlexport\programs\ExportDemo, where install is the pathname of the Export installation directory. Export Demo is for Windows only.
See “Use the Export Demo Program” for more information.
PART 3
C API Reference
This section provides detailed reference information for the
C-language implementation of the File Extraction and Export
APIs. It includes the following chapters:
File Extraction API Structures
XML Export API Callback Functions
CHAPTER 6
File Extraction API Functions
This section describes the functions in the File Extraction API. The File Extraction functions open a container file and extract the container’s sub files so that the sub files are exposed and available for conversion. Sub files may be files within a Zip archive, messages in a mail store, attachments in a mail message, or OLE objects embedded in a compound document. See “Sub File Extraction” for more information.
Each function appears as a function prototype followed by a description of its arguments, its return value, and a discussion of its use. This section contains the following topics.
KVGetExtractInterface()
This function is the entry point to obtain the file extraction functions. It supplies pointers to the file extraction functions and, in out-of-process mode, starts the kvoop.exe server and initializes out-of-process extraction services.
When KVGetExtractInterface() is called, it assigns the function pointers in the structure KVExtractInterface to the functions described in this section.
Syntax
int pascal KVGetExtractInterface (
    void *pContext,
    KVExtractInterface pIextract);
Arguments
pContext Pointer returned from fpInit().
pIextract Pointer to the structure KVExtractInterface, which contains function pointers that KVGetExtractInterface() assigns to all other file extraction functions. See “KVExtractInterface”.
Before initializing the KVExtractInterface structure, use the macro KVStructInit to initialize the KVStructHead structure.
Returns
If the call is successful, the return value is KVERR_Success.
If the call is not successful, the return value is an error code.
Example
fpKVGetExtractInterface =
    (int (pascal *)(void *, KVExtractInterface))myGetProcAddress(hKVExport,
    (char*)"KVGetExtractInterface");
/* Initialize file extraction interface structure using KVStructInit */
KVStructInit(&extractInterface);
/* Retrieve file extraction interface */
error = (*fpKVGetExtractInterface)(pExport, &extractInterface);
fpCloseFile()
This function frees the memory allocated by fpOpenFile() and closes the file.
Syntax
int (pascal *fpCloseFile) (void *pFile);
Arguments
pFile Identifier of the file. This is a file handle returned from fpOpenFile().
Returns
If the file is closed, the return value is KVERR_Success.
If the file is not closed, the return value is an error code.
Example
extractInterface->fpCloseFile(pFile);
pFile = NULL;
fpExtractSubFile()
This function extracts a sub file from a container file to a user-defined path or output stream. This call returns file format information when the file is extracted to a path.
Syntax
int (pascal *fpExtractSubFile) (
    void *pFile,
    KVExtractSubFileArg extractArg,
    KVSubFileExtractInfo *extractInfo);
Arguments
pFile Identifier of the file. This is a file handle returned from fpOpenFile().
extractArg Pointer to the structure KVExtractSubFileArg, which defines the sub file to be extracted. See “KVExtractSubFileArg”.
Before initializing the KVExtractSubFileArg structure, use the macro KVStructInit to initialize the KVStructHead structure.
extractInfo Pointer to the structure KVSubFileExtractInfo, which defines information about the extracted sub file. See “KVSubFileExtractInfo”.
Returns
If the sub file is extracted from the container file, the return value is KVERR_Success.
If the sub file is not extracted from the container file, the return value is an error code.
Discussion
After the file is extracted, call fpFreeStruct() to free the memory allocated by this function. See “fpFreeStruct()”.
If the sub file is embedded in the main file as a link and is stored externally, extractInfo->infoFlag is set to KVSubFileExtractInfoFlag_External. For example, the sub file may be an object that was embedded in a Word document using “Link to File,” or an attachment that is referenced in an MBX message. This type of sub file cannot be extracted. You must write code to access the sub file based on the path in the member extractInfo->filePath or extractInfo->fileName. See “KVSubFileExtractInfo”.
Example
KVSubFileExtractInfo extractInfo = NULL;
KVStructInit(&extractArg);
extractArg.index = index;
extractArg.extractionFlag = KVExtractionFlag_CreateDir |
    KVExtractionFlag_Overwrite;
extractArg.filePath = subFileInfo->subFileName;
/* Extract this sub file */
error = extractInterface->fpExtractSubFile(pFile, &extractArg, &extractInfo);
if ( error )
{
    /* Free the result object and clear the pointer that was freed */
    extractInterface->fpFreeStruct(pFile, extractInfo);
    extractInfo = NULL;
}
fpFreeStruct()
This function frees the memory allocated by fpGetMainFileInfo(), fpGetSubFileInfo(), fpGetSubFileMetaData(), and fpExtractSubFile().
Syntax
int (pascal *fpFreeStruct) (
    void *pFile,
    void *obj);
Arguments
pFile Identifier of the file. This is a file handle returned from fpOpenFile().
obj Pointer to the result object returned by fpGetMainFileInfo(), fpGetSubFileInfo(), fpGetSubFileMetaData(), or fpExtractSubFile().
Returns
If the allocated memory is freed, the return value is KVERR_Success.
Otherwise, the return value is an error code.
Example
The example below frees the memory allocated by fpGetSubFileInfo():
if ( subFileInfo )
{
    extractInterface->fpFreeStruct(pFile, subFileInfo);
    subFileInfo = NULL;
}
fpGetMainFileInfo()
This function determines whether a file is a container file—that is, whether it contains sub files—and should be extracted further.
Syntax
int (pascal *fpGetMainFileInfo) (
    void *pFile,
    KVMainFileInfo *fileInfo);
Arguments
pFile Identifier of the file. This is a file handle returned from fpOpenFile().
fileInfo Pointer to the structure KVMainFileInfo. This structure contains information about the file.
Returns
If the file information is retrieved, the return value is KVERR_Success.
If the file information is not retrieved, the return value is an error code.
Discussion
After the file information is retrieved, call fpFreeStruct() to free the memory allocated by this function.
If the file is a container (fileInfo->numSubFiles is non-zero), call fpGetSubFileInfo() and fpExtractSubFile() for each sub file.
If the file is not a container (fileInfo->numSubFiles is 0) and contains text (fileInfo->infoFlag is set to KVMainFileInfoFlag_HasContent), pass the file directly to the conversion functions. See “XML Export API Functions”.
Example
KVMainFileInfo fileInfo = NULL;
if ( (error = extractInterface->fpGetMainFileInfo(pFile, &fileInfo)) )
{
    /* Free result object allocated in fileInfo */
    extractInterface->fpFreeStruct(pFile, fileInfo);
    fileInfo = NULL;
}
fpGetSubFileInfo()
This function gets information about a sub file in a container file.
Syntax
int (pascal *fpGetSubFileInfo) (
    void *pFile,
    int index,
    KVSubFileInfo *subFileInfo);
Arguments
pFile Identifier of the main file. This is a file handle returned from fpOpenFile(). See “fpOpenFile()”.
index The index number of the sub file for which information will be retrieved.
subFileInfo Pointer to the structure KVSubFileInfo, which defines information about the sub file.
Returns
If the file information is retrieved, the return value is KVERR_Success.
If the file information is not retrieved, the return value is an error code.
Discussion
After the sub file information is retrieved, call fpFreeStruct() to free the memory allocated by this function.
If the root node is not enabled, the first sub file is index 0. If the root node is enabled, the first sub file is index 1. The root node is required to recreate a file’s hierarchy. See “Create a Root Node”.
The members subFileInfo->parentIndex and subFileInfo->childArray enable you to recreate a file’s hierarchy. Because childArray only retrieves the first-level children in the sub file, you must call fpGetSubFileInfo() repeatedly until information for the leaf-node children is extracted. See “Recreate a File’s Hierarchy”.
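The repeated descent through childArray can be pictured with a small, SDK-independent sketch. Everything in it is an assumption made for illustration: MockSubFileInfo is a hypothetical stand-in for the handful of KVSubFileInfo members involved, childCount is an assumed member name, the root is given a parentIndex of -1, and the sample table replaces the results that real code would obtain from fpGetSubFileInfo() (and release with fpFreeStruct()).

```c
#include <stdio.h>

/* Hypothetical stand-in for the KVSubFileInfo members used in this sketch.
   The real structure is defined in kvxtract.h; childCount is an assumed name. */
typedef struct {
    const char *subFileName;
    int parentIndex;        /* -1 marks the root in this sketch (assumption) */
    int childCount;
    const int *childArray;  /* indexes of first-level children only */
} MockSubFileInfo;

/* Sample container: a root holding one message and one folder with an attachment. */
static const int root_children[] = {1, 2};
static const int folder_children[] = {3};
static const MockSubFileInfo sample[] = {
    {"root", -1, 2, root_children},
    {"message.txt", 0, 0, NULL},
    {"folder", 0, 1, folder_children},
    {"folder/attachment.doc", 2, 0, NULL},
};

/* Visit a node and all of its descendants, printing an indented tree.
   Returns the number of nodes visited. Real code would call
   fpGetSubFileInfo() for each index instead of reading a table. */
static int walk_tree(const MockSubFileInfo *table, int index, int depth)
{
    int visited = 1;
    int i;
    printf("%*s%s\n", depth * 2, "", table[index].subFileName);
    for (i = 0; i < table[index].childCount; i++)
        visited += walk_tree(table, table[index].childArray[i], depth + 1);
    return visited;
}

/* Depth of a node, found by following parentIndex back to the root. */
static int node_depth(const MockSubFileInfo *table, int index)
{
    int depth = 0;
    while (table[index].parentIndex >= 0) {
        index = table[index].parentIndex;
        depth++;
    }
    return depth;
}
```

Calling walk_tree(sample, 0, 0) descends level by level, which mirrors why one call per node is needed: each node reports only its direct children.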
If the sub file is embedded in the main file as a link and is stored externally, subFileInfo->infoFlag is set to KVSubFileInfoFlag_External. For example, the sub file may be an object that was embedded in a Word document using “Link to File,” or an attachment that is referenced in an MBX message. This type of sub file cannot be extracted. You must write code to access the sub file based on the path in the member subFileInfo->subFileName.
The KVSubFileInfoFlag_External flag is not set for an OLE object that is embedded as a link in a Microsoft PowerPoint file. KeyView can only detect linked objects in a Microsoft PowerPoint file when the object is extracted. See “fpExtractSubFile()”.
Example
KVSubFileInfo subFileInfo = NULL;
for ( index = 0; index < fileInfo->numSubFiles; index++ )
{
    error = extractInterface->fpGetSubFileInfo(pFile, index, &subFileInfo);
    if ( error )
    {
        extractInterface->fpFreeStruct(pFile, subFileInfo);
        subFileInfo = NULL;
    }
}
fpGetSubFileMetaData()
This function extracts metadata from mail stores, mail messages, and non-mail items in an NSF file. See “Extract Mail Metadata”.
Syntax
int (pascal *fpGetSubFileMetaData) (
    void *pFile,
    KVGetSubFileMetaArg metaArg,
    KVSubFileMetaData *metaData);
Arguments
pFile Identifier of the file. This is a file handle returned from fpOpenFile().
metaArg Pointer to the structure KVGetSubFileMetaArg, which defines metadata tags whose values are retrieved. See “KVGetSubFileMetaArg”.
Before initializing the KVGetSubFileMetaArg structure, use the macro KVStructInit to initialize the KVStructHead structure.
metaData Pointer to the structure KVSubFileMetaData, which contains the retrieved metadata values.
Returns
If the metadata is retrieved, the return value is KVERR_Success.
If the metadata is not retrieved, the return value is an error code.
Discussion
After the metadata is retrieved, call fpFreeStruct() to free the memory allocated by this function.
When you pass in 0 for metaArg->metaNameCount, and NULL for metaArg->metaNameArray, a set of default metadata is retrieved. See “Extract Mail Metadata”.
If a field is repeated in an EML or MBX mail header, the values in each instance of the field are concatenated and returned as one field. The values are separated by a delimiter of five pound signs (#####).
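Because repeated header fields come back as one string joined by #####, an application usually has to split the value again. The following is a minimal sketch of one way to do that with the standard C library only; the function names and the KV_REPEAT_DELIM macro are hypothetical and are not part of the KeyView API.

```c
#include <string.h>

/* Delimiter described above for repeated EML/MBX header fields.
   The macro name is hypothetical. */
#define KV_REPEAT_DELIM "#####"

/* Count the values in a delimited string. A string with no delimiter
   holds exactly one value. */
static int count_repeated_values(const char *joined)
{
    int count = 1;
    const char *p = joined;
    while ((p = strstr(p, KV_REPEAT_DELIM)) != NULL) {
        p += strlen(KV_REPEAT_DELIM);
        count++;
    }
    return count;
}

/* Copy the n-th (0-based) value into out as a NUL-terminated string.
   Returns 1 on success, 0 if there is no such value or out is too small. */
static int get_repeated_value(const char *joined, int n, char *out, size_t outSize)
{
    const size_t delimLen = strlen(KV_REPEAT_DELIM);
    const char *start = joined;
    const char *end;
    size_t len;

    /* Skip past n delimiters to reach the start of the requested value. */
    while (n > 0) {
        start = strstr(start, KV_REPEAT_DELIM);
        if (start == NULL)
            return 0;
        start += delimLen;
        n--;
    }
    end = strstr(start, KV_REPEAT_DELIM);
    len = end ? (size_t)(end - start) : strlen(start);
    if (len + 1 > outSize)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

The sketch assumes the input is a NUL-terminated single-byte string; metadata returned as Unicode would need the equivalent wide-character search.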
Example
KVSubFileMetaData metaData = NULL;
KVStructInit(&metaArg);
/* retrieve all the default metadata elements */
metaArg.metaNameCount = 0;
metaArg.metaNameArray = NULL;
metaArg.index = Index;
error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &metaData);
...
extractInterface->fpFreeStruct(pFile, metaData);
metaData = NULL;
/* retrieve specific metadata fields */
KVMetaName pName[2];
KVMetaNameRec names[2];
names[0].type = KVMetaNameType_Integer;
names[0].name.iname = KVPR_SUBJECT;
names[1].type = KVMetaNameType_Integer;
names[1].name.iname = KVPR_DISPLAY_TO;
pName[0] = &names[0];
pName[1] = &names[1];
metaArg.metaNameCount = 2;
metaArg.metaNameArray = pName;
metaArg.index = Index;
error = extractInterface->fpGetSubFileMetaData(pFile, &metaArg, &metaData);
...
extractInterface->fpFreeStruct(pFile, metaData);
metaData = NULL;
fpOpenFile()
This function opens a file to make the file accessible for sub file extraction or conversion.
Syntax
int (pascal *fpOpenFile) (
    void *pContext,
    KVOpenFileArg openArg,
    void **pFile);
Arguments
pContext Pointer returned from fpInit().
openArg Pointer to the structure KVOpenFileArg. This structure defines the input parameters necessary to open a file for extraction, such as credentials and the default extraction directory. See “KVOpenFileArg”.
Before initializing the KVOpenFileArg structure, use the macro KVStructInit to initialize the KVStructHead structure.
pFile Handle for the opened file. This handle is used in subsequent file extraction calls to identify the source file.
Returns
If the file is opened, the return value is KVERR_Success.
If the file is not opened, the return value is an error code and pFile is NULL.
Discussion
Call fpCloseFile() to free the memory allocated by this function.
Example
KVOpenFileArgRec openArg;
/* Initialize the structure using KVStructInit */
KVStructInit(&openArg);
openArg.extractDir = destDir;
openArg.filePath = srcFile;
/* Open the main file */
if ( (error = extractInterface->fpOpenFile(pExport, &openArg, &pFile)) )
{
    extractInterface->fpCloseFile(pFile);
    pFile = NULL;
}
CHAPTER 7
File Extraction API Structures
This section provides information on the structures used by the File Extraction
API. These structures define the input and output parameters required to extract sub files from a container file, and are defined in kvxtract.h. This section contains the following topics.
KVCredential
This structure contains a count of the number of credential elements, and a pointer to the first element of the array of individual elements. It is initialized by calling fpOpenFile(). It is defined in kvxtract.h.
typedef struct tag_KVCredential
{
    int itemCount;
    KVCredentialComponent *items;
}
KVCredentialRec, *KVCredential;
Member Descriptions
itemCount The number of credentials defined for this file.
items Pointer to the structure KVCredentialComponent. This structure contains the individual credential elements used to open a protected file. See “KVCredentialComponent”.
KVCredentialComponent
This structure contains the value of a credential item. It is defined in kvxtract.h.
typedef struct tag_KVCredentialComponent
{
    KVCredKeyType keytype;
    union
    {
        void *pkey;
        char *skey;
        unsigned int ikey;
    }
    keyobj;
}
KVCredentialComponentRec, *KVCredentialComponent;
Member Descriptions
keytype The type of credential (such as a user name or password). The types are defined by the enumerated type KVCredKeyType.
pkey Pointer to a structure defining credentials. Reserved for future use.
skey Pointer to a string credential key.
ikey An integer credential key.
KVExtractInterface
The members of this structure are pointers to the file extraction functions described in “File Extraction API Functions”. When the function KVGetExtractInterface() is called, the function pointers in this structure are assigned to those functions. The structure is defined in kvxtract.h. See “KVGetExtractInterface()”.
typedef struct tag_KVExtractInterface
{
    KVStructHeader;
    int (pascal *fpOpenFile) (void *pContext, KVOpenFileArg openArg,
        void **pFileHandle);
    int (pascal *fpCloseFile) (void *pFileHandle);
    int (pascal *fpGetMainFileInfo) (void *pFile, KVMainFileInfo
        *MainFileInfo);
    int (pascal *fpGetSubFileInfo) (void *pFile, int index,
        KVSubFileInfo *subFileInfo);
    int (pascal *fpGetSubFileMetaData) (void *pFile,
        KVGetSubFileMetaArg metaArg, KVSubFileMetaData *metaData);
    int (pascal *fpExtractSubFile) (void *pFile,
        KVExtractSubFileArg extractArg, KVSubFileExtractInfo
        *extractInfo);
    int (pascal *fpFreeStruct) (void *pFile, void *obj);
}
KVExtractInterfaceRec, *KVExtractInterface;
Member Descriptions
The member functions are described in “File Extraction API Functions”.
Discussion
Before initializing a File Extraction structure, use the macro KVStructInit to initialize the KVStructHead structure. This sets the revision number of the File Extraction API and supports binary compatibility with future releases.
KVExtractSubFileArg
This structure defines the input parameters required to extract a sub file. See “fpExtractSubFile()”. It is defined in kvxtract.h.
typedef struct tag_KVExtractSubFileArg
{
    KVStructHeader;
    int index;
    KVCharSet srcCharset;
    KVCharSet trgCharset;
    int isMSBLSB;
    DWORD extractionFlag;
    char *filePath;
    char *extractDir;
    KVOutputStream *stream;
}
KVExtractContainerSubFileArgRec, *KVExtractContainerSubFileArg;
Member Descriptions
KVStructHeader The KeyView version of the structure.
index The index number of the sub file to be extracted.
srcCharset Specifies the source character set of the sub file when the file format’s reader cannot determine the character set. The character sets are enumerated in KVCharSet in kvtypes.h.
trgCharset If the file type is KVFileType_Main, this is the target character set of the extracted file. Otherwise, this is ignored. The character sets are enumerated in KVCharSet in kvtypes.h.
isMSBLSB This flag indicates whether the byte order for Unicode text is Big Endian (MSBLSB) or Little Endian (LSBMSB).
extractionFlag A bitwise flag defining additional parameters for file extraction. The following flags are available:
KVExtractionFlag_CreateDir
Indicates whether the directory structure of a sub file should be created. If this is set, the path defined in filePath is created if it does not already exist. If this is not set, the path is not created, and the function returns FALSE.
KVExtractionFlag_Overwrite
If this is set, and the file being extracted has the same name as a file in the target path, the file in the target path is overwritten without warning. If this is not set, and a sub file has the same name as a file in the target path, the error
KVError_OutputFileExists is generated.
KVExtractionFlag_ExcludeMailHeader
If this is set, header information (To, From, Sent, and so on) in a mail file is not included in the extracted data. If this is not set, the extracted data contains header information and the message’s body text. See
Extracted Text File” on page .
KVExtractionFlag_GetFormattedBody
If this is set, the formatted version of the message body (HTML or RTF) is extracted from mail files when possible. If neither an
HTML nor RTF version of the message body exists in the mail file, then it is extracted as plain text. If this flag is not set, the message body is extracted as plain text when possible.
Note: When an HTML or RTF message body is extracted, the message’s mail headers (such as “From,” “To,” and “Subject,”) are extracted, saved in the same format, and added to the beginning of the sub file. This applies to PST (MAPI-based reader), MSG and NSF files only.
KVExtractionFlag_SaveAsMSG
If this is set, the mail message is extracted as an MSG file, including all of its attachments. If this flag is not set, the mail message is extracted as text. This applies to PST files on
Windows only.
Note: In file mode, when the application sets this flag in fpExtractSubFile(), it must also check the KVSubFileExtractInfo structure’s filePath parameter to verify the filename used for extraction. See “fpExtractSubFile()” and “KVSubFileExtractInfo”.
filePath Pointer to the suggested path or filename to which the sub file is extracted. This can be a filename, partial path, or full path. This can be used in conjunction with extractDir to create the full output path.
extractDir Pointer to the directory to which sub files are extracted. This directory must exist. If this is set, the path specified in KVOpenFileArg->extractDir is ignored. This is used in conjunction with filePath to create the full output path.
stream Pointer to an output stream defined by KVOutputStream.
Discussion
The KVSubFileExtractInfoFlag_CharsetConverted flag in the KVSubFileExtractInfo structure indicates whether the character set of the sub file was converted during extraction. See “KVSubFileExtractInfo”.
If the document character set is detected and is also specified in srcCharset, the detected character set is overridden by the specified character set. If the source character set is not detected and is not specified, character set conversion does not occur. The section “Supported Formats” lists the formats for which the source character set can be determined.
The following applies when the output is to a file:
If filePath is a valid full path, filePath is the output path, and the path in extractDir is ignored.
If filePath is a filename or partial path, the target directory specified in either KVExtractSubFileArg->extractDir or KVOpenFileArg->extractDir is used to create the full path.
If filePath is a full path or partial path, and the KVExtractionFlag_CreateDir flag is set, the directory is created if it does not already exist.
If filePath is not specified, a default name and the target directory specified in either KVExtractSubFileArg->extractDir or
KVOpenFileArg->extractDir are used to create a full path.
If both filePath and extractDir are not specified or are invalid, an error is returned.
If filePath is valid, but extractDir is not valid, an error is returned.
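The file-output rules above can be read as a small decision procedure. The sketch below is only an illustration of that reading, not the SDK's implementation: resolve_output_path() and is_full_path() are hypothetical helpers, the full-path test (leading separator or drive letter) and the "/" separator are assumptions, and directory existence and creation are not checked.

```c
#include <stdio.h>
#include <string.h>

/* Treat a path as "full" if it starts with a separator or a drive letter.
   This test is an assumption of the sketch, not the SDK's own check. */
static int is_full_path(const char *p)
{
    if (p == NULL || p[0] == '\0')
        return 0;
    if (p[0] == '/' || p[0] == '\\')
        return 1;
    return ((p[0] >= 'A' && p[0] <= 'Z') || (p[0] >= 'a' && p[0] <= 'z'))
           && p[1] == ':';
}

/* Build the output path from filePath, the per-call extractDir, and the
   extractDir given when the file was opened. defaultName stands in for the
   default name used when filePath is not specified.
   Returns 0 on success, -1 if no usable path can be formed. */
static int resolve_output_path(const char *filePath, const char *argExtractDir,
                               const char *openExtractDir, const char *defaultName,
                               char *out, size_t outSize)
{
    const char *dir;
    const char *name;
    int n;

    /* A full filePath wins outright; extractDir is ignored. */
    if (is_full_path(filePath)) {
        n = snprintf(out, outSize, "%s", filePath);
        return (n >= 0 && (size_t)n < outSize) ? 0 : -1;
    }

    /* The per-call extractDir overrides the one supplied at open time. */
    dir = (argExtractDir && argExtractDir[0]) ? argExtractDir : openExtractDir;
    if (dir == NULL || dir[0] == '\0')
        return -1;              /* no full path and no target directory */

    name = (filePath && filePath[0]) ? filePath : defaultName;
    if (name == NULL || name[0] == '\0')
        return -1;

    n = snprintf(out, outSize, "%s/%s", dir, name);
    return (n >= 0 && (size_t)n < outSize) ? 0 : -1;
}
```

Writing the rules this way makes the precedence explicit: a full filePath first, then the per-call directory, then the directory given to fpOpenFile().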
The following applies when the output is to a stream:
Set filePath and extractDir to NULL.
The file format (docInfo) and extraction file path (filePath) are not returned in KVSubFileExtractInfo. See “KVSubFileExtractInfo”.
The flags KVExtractionFlag_CreateDir and KVExtractionFlag_Overwrite are ignored.
KVGetSubFileMetaArg
This structure defines the metadata tags whose values are retrieved by fpGetSubFileMetaData(). See “fpGetSubFileMetaData()”. It is defined in kvxtract.h.
typedef struct tag_KVGetSubFileMetaArg
{
    KVStructHeader;
    int index;
    int metaNameCount;
    KVMetaName *metaNameArray;
    KVCharSet srcCharset;
    KVCharSet trgCharset;
    int isMSBLSB;
}
KVGetSubFileMetaArgRec, *KVGetSubFileMetaArg;
Member Descriptions
KVStructHeader The KeyView version of the structure.
index The index number of the sub file for which metadata is extracted.
metaNameCount The number of metadata fields to be extracted.
metaNameArray Pointer to the structure KVMetaName containing an array of metadata tags whose values are retrieved.
srcCharset Specifies the source character set of the metadata when the format’s reader cannot determine the character set. The character sets are enumerated in KVCharSet in kvtypes.h.
trgCharset The target character set of the extracted metadata. The character sets are enumerated in KVCharSet in kvtypes.h.
isMSBLSB This flag indicates whether the byte order for Unicode text is Big Endian (MSBLSB) or Little Endian (LSBMSB).
Discussion
If the character set is detected and is also specified in srcCharset, the detected character set is overridden by the specified character set. If the source character set is not detected and is not specified, character set conversion does not occur. The section “Supported Formats” lists the formats for which the source character set can be determined.
To retrieve a pre-defined list of metadata, pass 0 for metaNameCount and
NULL for metaNameArray. The metadata in
KVMainFileInfo
This structure contains information about a main file that is open for extraction. It is initialized by calling fpGetMainFileInfo(). See “fpGetMainFileInfo()”. It is defined in kvxtract.h.
typedef struct tag_KVMainFileInfo
{
    KVStructHeader;
    int numSubFiles;
    ADDOCINFO docInfo;
    KVCharSet charset;
    int isMSBLSB;
    unsigned long infoFlag;
}
KVMainFileInfoRec, *KVMainFileInfo;
Member Descriptions
KVStructHeader The KeyView version of the structure.
numSubFiles The number of sub files in the main file.
docInfo The file’s major format (such as Microsoft Word or Corel Presentation) as defined by the structure ADDOCINFO.
charset The character set of the main file.
isMSBLSB This flag indicates whether the byte order for Unicode text is Big Endian (MSBLSB) or Little Endian (LSBMSB).
infoFlag A bitwise flag providing additional information about the main file. The following flag is available:
KVMainFileInfoFlag_HasContent —The main file contains text that can be converted. Below are some examples of how this flag is used:
For an MSG file without attachments, numSubFiles is 1
(message body text), and this flag is FALSE because the
MSG file itself does not contain text.
For a Zip file with three files, numSubFiles is 3, and this flag is FALSE because a Zip file does not contain text.
For a Microsoft Word file with an embedded OLE object, numSubFiles is 1 (OLE object), and this flag is TRUE (Word file contains text to be converted).
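The three examples above suggest that numSubFiles and the HasContent flag are independent signals: one says whether to extract, the other whether the main file itself should be converted. A minimal sketch of that decision follows. The MOCK_HAS_CONTENT value and the ACTION_ names are hypothetical; real code would test infoFlag against KVMainFileInfoFlag_HasContent from kvxtract.h.

```c
/* Hypothetical bit value used only in this sketch; the real constant is
   KVMainFileInfoFlag_HasContent, defined in kvxtract.h. */
#define MOCK_HAS_CONTENT 0x1UL

/* Actions are bit values so that both can apply to the same file. */
enum {
    ACTION_NONE    = 0,
    ACTION_EXTRACT = 1,   /* call fpGetSubFileInfo()/fpExtractSubFile() per sub file */
    ACTION_CONVERT = 2    /* pass the main file to the conversion functions */
};

/* Decide what to do with a main file from numSubFiles and infoFlag. */
static int plan_actions(int numSubFiles, unsigned long infoFlag)
{
    int actions = ACTION_NONE;
    if (numSubFiles > 0)
        actions |= ACTION_EXTRACT;
    if (infoFlag & MOCK_HAS_CONTENT)
        actions |= ACTION_CONVERT;
    return actions;
}
```

Under these assumptions, the MSG and Zip examples yield ACTION_EXTRACT only, while the Word file with an embedded OLE object yields both actions.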
Discussion
If numSubFiles is 0, the file does not contain sub files and does not need to be extracted further. If the KVMainFileInfoFlag_HasContent flag is set, the file contains body text and can be passed directly to the conversion functions. See “XML Export API Functions”.
If numSubFiles is non-zero, get information on each sub file by calling fpGetSubFileInfo(), and then extract the sub files using fpExtractSubFile(). See “fpGetSubFileInfo()” and “fpExtractSubFile()”.
If openFlag is set to KVOpenFileFlag_CreateRootNode in the call to fpOpenFile(), numSubFiles also includes the root object (index 0), which is created by KeyView for reconstructing the file’s hierarchy.
KVMetadataElem
This structure contains metadata field values extracted from a mail file. It is defined in kvtypes.h.
typedef struct tag_KVMetadataElem
{
    int isDataValid;
    int dataID;
    KVMetadataType dataType;
    char* strType;
    void* data;
    int dataSize;
}
KVMetadataElem;
Member Descriptions
isDataValid Specifies whether the metadata returned from the API is valid data.
dataID The integer name of the extracted metadata field.
dataType The data type of the metadata field. The types are defined in KVMetadataType in kvtypes.h.
strType Pointer to the string name of the metadata field.
data The contents of the metadata field.
If the dataType member is KVMetadata_Int4 or KVMetadata_Bool, this member contains the actual value.
Otherwise, this member is a pointer to the actual value.
KVMetadata_DateTime points to an 8-byte value.
KVMetadata_String and KVMetadata_Unicode point to the beginning of the string containing the text. The strings are NULL terminated.
KVMetadata_Binary points to the first element of a byte array.
dataSize The byte count of data when the type is KVMetadata_Binary, KVMetadata_Unicode or KVMetadata_String.
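The rule that data holds the value itself for integer and Boolean types, but a pointer for everything else, is easy to get wrong. The sketch below shows one way to dispatch on the type. It uses mock types throughout: MockMetadataType and MockMetadataElem are hypothetical stand-ins for KVMetadataType and KVMetadataElem, and only the members needed here are reproduced.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for KVMetadataType and KVMetadataElem (kvtypes.h). */
typedef enum { MOCK_Int4, MOCK_Bool, MOCK_String, MOCK_Binary } MockMetadataType;

typedef struct {
    int isDataValid;
    MockMetadataType dataType;
    void *data;      /* value itself for Int4/Bool, pointer otherwise */
    int dataSize;    /* byte count for string and binary data */
} MockMetadataElem;

/* Render an element as text. Returns 0 on success, -1 if the element is
   invalid or its type is not handled by this sketch. */
static int format_elem(const MockMetadataElem *e, char *out, size_t outSize)
{
    if (!e->isDataValid)
        return -1;
    switch (e->dataType) {
    case MOCK_Int4:
        /* The integer is stored in the pointer-sized member itself. */
        snprintf(out, outSize, "%d", (int)(intptr_t)e->data);
        break;
    case MOCK_Bool:
        snprintf(out, outSize, "%s", (intptr_t)e->data ? "true" : "false");
        break;
    case MOCK_String:
        /* data points to a NUL-terminated string. */
        snprintf(out, outSize, "%s", (const char *)e->data);
        break;
    case MOCK_Binary:
        /* data points to a byte array; only its length is reported here. */
        snprintf(out, outSize, "<%d bytes>", e->dataSize);
        break;
    default:
        return -1;
    }
    return 0;
}
```

Date/time and Unicode values are left out of the sketch because their exact encodings are not described in this section.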
KVMetaName
This structure defines the names of the metadata fields to be extracted from a mail file. It is defined in kvxtract.h.
typedef struct tag_KVMetaName
{
    KVMetaNameType type;
    union
    {
        void *pname;
        int iname;
        char *sname;
    }name;
}
KVMetaNameRec, *KVMetaName;
Member Descriptions
type The type of metadata name (such as integer or string). The types are defined by the enumerated type KVMetaNameType. Note that MAPI property names are of type integer.
pname Pointer to a structure defining the metadata fields to be retrieved.
iname The name of a metadata field of type integer.
sname Pointer to the name of a metadata field of type string.
Discussion
If you specify the MAPI tag name (for example, PR_CONVERSATION_TOPIC), you must include the Windows header files mapitags.h and mapidefs.h in which
PR_CONVERSATION_TOPIC is defined as 0x0070001e.
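For readers unfamiliar with MAPI tags, a value such as 0x0070001e packs two fields: the property identifier in the high 16 bits and the property type in the low 16 bits. The sketch below decodes a tag without needing the Windows headers. The helper names are hypothetical; in mapidefs.h the equivalent macros are PROP_ID and PROP_TYPE.

```c
/* Property identifier: the high 16 bits of a MAPI property tag. */
static unsigned int prop_id(unsigned long tag)
{
    return (unsigned int)((tag >> 16) & 0xFFFFUL);
}

/* Property type: the low 16 bits of a MAPI property tag. */
static unsigned int prop_type(unsigned long tag)
{
    return (unsigned int)(tag & 0xFFFFUL);
}
```

For 0x0070001e this gives identifier 0x0070 and type 0x001e, which mapidefs.h names PT_STRING8. The whole tag is what would be assigned to the iname member when the name type is integer.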
KVOpenFileArg
This structure defines the input arguments necessary to open a file for extraction.
It is initialized by calling fpOpenFile(). See
“fpOpenFile()” on page . It is
defined in kvxtract.h.
typedef struct tag_KVOpenFileArg
{
KVStructHeader;
KVCredential cred;
KVInputStream
char
*stream;
*filePath;
char
DWORD
*extractDir;
openFlag;
DWORD
void
reserved;
*pReserved;
}
KVOpenFileArgRec, *KVOpenFileArg;
Member Descriptions
KVStructHeader
The KeyView version of the structure. See “KVStructHead”.
cred
The credentials required to open a protected PST or NSF file. This is a pointer to the KVCredential structure. Your application can assign multiple credentials to this member to cover multiple formats.
stream
Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source. See “KVInputStream”.
If you are using a file as input, this is NULL.
filePath
Pointer to the full file path to the source file.
If you are using a stream as input, this is NULL.
extractDir
Pointer to the default directory to which sub files are extracted. This directory must exist.
This is used in conjunction with KVExtractSubFileArg->filePath to create the full output path. See “KVExtractSubFileArg”.
openFlag
A bitwise flag defining additional parameters for opening the file. The following flag is available:
KVOpenFileFlag_CreateRootNode: If this flag is set, KeyView creates a root object when extracting this file’s sub files. This root node does not have a parent and is at the highest level of the file’s tree structure. It is used internally to provide a reference point from which all other child nodes are determined, and the file’s hierarchy is created.
If you want to maintain the file’s hierarchy when you extract sub files from a container, you must set this flag. See “Recreate a File’s for more information.
The root node has an index of zero. Although not all container formats require an artificial root node, the root is created for all container formats regardless of whether the file itself contains a root directory or file.
reserved
Reserved for future use. It must be NULL.
pReserved
Reserved for future use. It must be NULL.
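The member rules above (exactly one of stream and filePath is set, and the reserved members are cleared) can be expressed as a small check. This is a sketch under stated assumptions: `DemoOpenFileArg` is a trimmed, hypothetical mirror that omits KVStructHeader and cred, and the numeric value of `DEMO_OPENFLAG_CREATE_ROOT_NODE` is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real types and flag value come from kvxtract.h. */
typedef struct DemoInputStream DemoInputStream;   /* opaque for this sketch */
#define DEMO_OPENFLAG_CREATE_ROOT_NODE 0x1UL       /* assumed value */

typedef struct {
    DemoInputStream *stream;
    char            *filePath;
    char            *extractDir;
    unsigned long    openFlag;
    unsigned long    reserved;
    void            *pReserved;
} DemoOpenFileArg;

/* Fills the argument for file-based input that preserves the hierarchy. */
static void demo_open_arg_for_file(DemoOpenFileArg *arg, char *path, char *outDir)
{
    assert(arg != NULL);
    arg->stream = NULL;          /* file input: stream must be NULL */
    arg->filePath = path;
    arg->extractDir = outDir;    /* must already exist on disk */
    arg->openFlag = DEMO_OPENFLAG_CREATE_ROOT_NODE;
    arg->reserved = 0;
    arg->pReserved = NULL;
}

/* Returns 1 when exactly one input source is set and the reserved
   members are clear, 0 otherwise. */
static int demo_open_arg_is_valid(const DemoOpenFileArg *arg)
{
    int hasStream = arg->stream != NULL;
    int hasFile = arg->filePath != NULL;
    return (hasStream != hasFile) && arg->reserved == 0 && arg->pReserved == NULL;
}
```

Running a check like this before calling the open function catches the common mistake of setting both input sources, or neither.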
KVOutputStream
This structure defines an output stream for the extracted sub file.
typedef struct tag_OutputStream
{
void *pOutputStreamPrivateData;
BOOL (pascal *fpCreate)(struct tag_OutputStream *,TCHAR *);
UINT (pascal *fpWrite) (struct tag_OutputStream *, BYTE *, UINT);
BOOL (pascal *fpSeek) (struct tag_OutputStream *, long, int);
long (pascal *fpTell) (struct tag_OutputStream *);
BOOL (pascal *fpClose) (struct tag_OutputStream *);
}
KVOutputStream;
Member Descriptions
All member functions are equivalent to their counterparts in the ANSI standard library.
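Because the members correspond to standard library calls, one plausible implementation simply forwards each member to stdio. The sketch below assumes that the private data member holds a `FILE *` and that the BOOL members report TRUE on success; neither is confirmed by this guide. The `Demo*` names are hypothetical, and the SDK's `pascal` macro and its TCHAR, BYTE, UINT, and BOOL typedefs are replaced by plain C types so that the example compiles on its own.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in with the same member order as the structure above. */
typedef struct tag_DemoOutputStream {
    void *pOutputStreamPrivateData;
    int          (*fpCreate)(struct tag_DemoOutputStream *, char *);
    unsigned int (*fpWrite)(struct tag_DemoOutputStream *, unsigned char *, unsigned int);
    int          (*fpSeek)(struct tag_DemoOutputStream *, long, int);
    long         (*fpTell)(struct tag_DemoOutputStream *);
    int          (*fpClose)(struct tag_DemoOutputStream *);
} DemoOutputStream;

static int demo_create(DemoOutputStream *s, char *name)
{
    s->pOutputStreamPrivateData = fopen(name, "wb");
    return s->pOutputStreamPrivateData != NULL;
}

static unsigned int demo_write(DemoOutputStream *s, unsigned char *buf, unsigned int n)
{
    return (unsigned int)fwrite(buf, 1, n, (FILE *)s->pOutputStreamPrivateData);
}

static int demo_seek(DemoOutputStream *s, long off, int whence)
{
    /* fseek() returns 0 on success; convert to a TRUE/FALSE style result. */
    return fseek((FILE *)s->pOutputStreamPrivateData, off, whence) == 0;
}

static long demo_tell(DemoOutputStream *s)
{
    return ftell((FILE *)s->pOutputStreamPrivateData);
}

static int demo_close(DemoOutputStream *s)
{
    int ok = fclose((FILE *)s->pOutputStreamPrivateData) == 0;
    s->pOutputStreamPrivateData = NULL;
    return ok;
}

/* Wires the stdio-backed functions into the stream structure. */
static void demo_stream_init(DemoOutputStream *s)
{
    assert(s != NULL);
    s->pOutputStreamPrivateData = NULL;
    s->fpCreate = demo_create;
    s->fpWrite = demo_write;
    s->fpSeek = demo_seek;
    s->fpTell = demo_tell;
    s->fpClose = demo_close;
}
```

The same structure could equally be backed by a memory buffer or a socket; only the five functions and the private data change.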
KVSubFileExtractInfo
This structure contains information about an extracted sub file. It is initialized by calling fpExtractSubFile(). It is defined in kvxtract.h.
typedef struct tag_KVSubFileExtractInfo
{
    KVStructHeader;
    char *filePath;
    char *fileName;
    unsigned long infoFlag;
    ADDOCINFO docInfo;
}
KVSubFileExtractInfoRec, *KVSubFileExtractInfo;
Member Descriptions
KVStructHeader
The KeyView version of the structure.
filePath
The full path to which the sub file was extracted.
If the sub file is embedded in the main file as a link, this is the external path to the sub file.
If you output the data to a stream, the extraction path is not returned.
fileName
The original path and/or filename of the sub file.
If the sub file is embedded in the main file as a link, this is the external path to the sub file.
infoFlag
A bitwise flag providing additional information about the extracted sub file. The following flags are available:
KVSubFileExtractInfoFlag_NeedsExtraction: The file may contain sub files and should be extracted further.
KVSubFileExtractInfoFlag_FileCreated: The file was created on disk.
KVSubFileExtractInfoFlag_CharsetConverted: The sub file’s character set was converted.
KVSubFileExtractInfoFlag_External: The sub file is embedded in the main file as a link and is stored externally. For example, the sub file may be an object that was embedded in a Word document using “Link to File,” or an attachment that is referenced in an MBX message. This type of file cannot be extracted. You must write code to access the sub file based on the path in the member filePath or fileName.
KVSubFileExtractInfoFlag_FolderCreated: A folder was created.
KVSubFileExtractInfoFlag_NonFormattedBodyExtracted: Indicates that a plain text version of the message was extracted due to an error extracting the formatted version of the message.
docInfo
The file’s major format (such as Microsoft Word or Corel Presentation) as defined by the structure ADDOCINFO.
If you output the data to a stream, the file format is not returned.
KVSubFileInfo
This structure contains information about a sub file in a container file. It is initialized by calling fpGetSubFileInfo(). See “fpGetSubFileInfo()”. It is defined in kvxtract.h.
typedef struct tag_KVSubFileInfo
{
KVStructHeader;
char *subFileName;
int subFileType;
long subFileSize;
unsigned long infoFlag;
KVCharSet charset;
int isMSBLSB;
BYTE fileTime[8];
int parentIndex;
int childCount;
int *childArray;
}
KVContainerSubFileInfoRec, *KVSubFileInfo;
Member Descriptions
KVStructHeader
The KeyView version of the structure.
subFileName
The path and/or file name of the sub file.
If the sub file is the body text of a mail file or is an embedded OLE object, KeyView provides a default filename. See Filenames for Extracted Sub Files”.
subFileType
The sub file’s position in the container file’s hierarchy. The following options are available:
KVSubFileType_Main: The sub file is at the top level of the main file. This is the default sub file type. See “Discussion” below.
KVSubFileType_Attachment: The sub file is an attachment in a file.
KVSubFileType_OLE: The sub file is an embedded OLE object in a compound document.
KVSubFileType_Folder: The sub file is a folder or the artificial root node (see “Create a Root Node”).
subFileSize
The size of the sub file in bytes. This information may be useful if you do not want to extract very large files.
This value is approximate and is the maximum size of the sub file. The sub file is usually smaller than this value when it is extracted.
infoFlag
A bitwise flag providing additional information about the sub file. The following flags are available:
KVSubFileInfoFlag_NeedsExtraction: The sub file may contain sub files. It must be extracted further to conclusively determine whether it contains sub files.
KVSubFileInfoFlag_Secure: The sub file is secured and credentials (such as user name and password) are required to extract it. This flag applies to ZIP, RAR, and PDF files only.
KVSubFileInfoFlag_SMIME: The sub file is S/MIME-encrypted and credentials are required to extract it. This applies to .eml and .pst files only.
KVSubFileInfoFlag_External: The sub file is embedded in the main file as a link and is stored externally. For example, the sub file may be an object that was embedded in a Word document using “Link to File,” or an attachment that is referenced in an MBX message. This type of file cannot be extracted. You must write code to access the sub file based on the path in the member subFileName.
KVSubFileInfoFlag_MailItem: When the sub file type is KVSubFileType_Attachment, this indicates the attachment is a mail item. This flag applies to PST, MSG, and NSF files only.
charset
If the sub file is not an attachment, this is the character set of the sub file. If the sub file is an attachment, the character set is KVCS_UNKNOWN.
isMSBLSB
This flag indicates whether the byte order for Unicode text is Big Endian (MSBLSB) or Little Endian (LSBMSB).
fileTime
When the sub file is a mail message, this is the file’s Sent time. Otherwise, it is the last modified time. The file time is not available for the following file types:
EML attachments
OLE objects in a Microsoft Office document
parentIndex
The index number of this file’s parent. For example, this may be the index of a folder in which the sub file is stored, or file to which the sub file is attached. If a file does not have a parent, the parentIndex is -1.
childCount
The number of first-level children in the sub file.
childArray
Pointer to an array of first-level children in the sub file.
Discussion
The KVSubFileType_Main type applies to the following for each file format:
File format     KVSubFileType_Main applies to...
MSG and EML     the message body.
Zip files       a file inside the archive.
PST files       an item that is not an attachment, an OLE object, or a root node.
MBX files       a message in the MBX file.
NSF files       an item that is not an attachment, an OLE object, or a root node.
PDF files       an item that is not an attachment or a root node.
If the flag KVSubFileInfoFlag_NeedsExtraction is set, open the sub file and extract its children. See “fpExtractSubFile()”.
The members parentIndex and childArray provide information about the sub file’s parent and children. This information can be used to recreate the file hierarchy on extraction. Since childArray only retrieves the first-level children in the sub file, you must call fpGetSubFileInfo() repeatedly until information for the leaf-node children is extracted.
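The repeated descent from first-level children down to the leaf nodes can be sketched as a depth-first walk. The example assumes that childArray holds the index numbers of the children, which the member description implies but does not state outright. `DemoSubFileInfo` is a trimmed, hypothetical mirror of the hierarchy members, and the in-memory table stands in for the per-index calls to fpGetSubFileInfo() that a real application would make.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed mirror of the hierarchy members described above. */
typedef struct {
    const char *subFileName;
    int parentIndex;         /* -1 when the node has no parent */
    int childCount;          /* number of first-level children */
    const int *childArray;   /* assumed: indexes of first-level children */
} DemoSubFileInfo;

/* Visits node `index` and all of its descendants depth-first and returns
   the number of nodes visited. `visit` may be NULL. Each lookup into
   `table` models one query for that index. */
static int demo_walk(const DemoSubFileInfo *table, int index, int depth,
                     void (*visit)(const DemoSubFileInfo *, int))
{
    const DemoSubFileInfo *node = &table[index];
    int visited = 1;
    int i;
    if (visit != NULL)
        visit(node, depth);
    for (i = 0; i < node->childCount; i++)
        visited += demo_walk(table, node->childArray[i], depth + 1, visit);
    return visited;
}

/* Counts the ancestors of a node by following parentIndex up to the root. */
static int demo_depth(const DemoSubFileInfo *table, int index)
{
    int depth = 0;
    assert(table != NULL);
    while (table[index].parentIndex != -1) {
        index = table[index].parentIndex;
        depth++;
    }
    return depth;
}
```

Walking down through childArray rebuilds the tree from the root, while following parentIndex upward is enough to rebuild the output path for a single sub file.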
KVSubFileMetaData
This structure contains a count of the number of metadata elements extracted from a mail file, and a pointer to the first element of the array of elements. It is initialized by calling fpGetSubFileMetaData(). See “fpGetSubFileMetaData()”. It is defined in kvxtract.h.
typedef struct tag_KVSubFileMetaData
{
    KVStructHeader;
    int nElem;
    KVMetadataElem **ppElem;
    unsigned long infoFlag;
}
KVSubFileMetaDataRec, *KVSubFileMetaData;
Member Descriptions
KVStructHeader
The KeyView version of the structure. See “KVStructHead”.
nElem
The number of metadata fields contained in the array.
ppElem
Pointer to an array of pointers that are the memory addresses of metadata field values in the structure KVMetadataElem.
infoFlag
A bitwise flag defining additional properties of the extracted metadata. The following flag is available:
KVSubFileMetaInfoFlag_CharsetConverted: Indicates the metadata’s character set was converted.
CHAPTER 8
XML Export API Functions
This section describes the functions in the XML Export API. These functions manage the input and output streams, and perform the document conversion.
Each function appears as a function prototype followed by a description of its arguments, return value, and discussion of its use.
KVXMLGetInterface()
This function is exported by the Export definition file. It supplies function pointers to other Export functions. When KVXMLGetInterface() is called, it assigns the function pointers in the structure KVXMLInterface to other functions described in this chapter. For example, KVXMLInterface.fpInit is assigned to point to KVXMLInit().
Syntax
void pascal KVXMLGetInterface (KVXMLInterface *pInterface);
Arguments
pInterface
Pointer to the structure KVXMLInterface.
Returns
None.
Discussion
One of the initial steps in using the XML Export API is to create an instance of a KVXMLInterface structure and use this function to gain access to other functions.
The functions can be called directly. For example, you can call KVXMLGetSummaryInfo() instead of using fpGetSummaryInfo() in KVXMLInterface. However, it is recommended that you assign the function pointers in KVXMLInterface to the functions for efficiency.
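The pattern described here, a table of function pointers that one exported function fills in, can be illustrated with a toy interface. Everything in the sketch is hypothetical: `DemoInterface` has only two members, and their signatures are simplified and do not match the real KVXMLInterface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-member interface; the real KVXMLInterface has many more
   members with different signatures. */
typedef struct {
    void *(*fpInit)(const char *dir);
    void  (*fpShutDown)(void *ctx);
} DemoInterface;

static int demo_live_sessions = 0;

static void *DemoInit(const char *dir)
{
    (void)dir;
    demo_live_sessions++;
    return &demo_live_sessions;   /* any non-NULL token serves as the context */
}

static void DemoShutDown(void *ctx)
{
    if (ctx != NULL)
        demo_live_sessions--;
}

/* Plays the role of the get-interface call: assigns every pointer in the
   caller's table so later calls go through the table. */
static void DemoGetInterface(DemoInterface *pInterface)
{
    assert(pInterface != NULL);
    pInterface->fpInit = DemoInit;
    pInterface->fpShutDown = DemoShutDown;
}
```

A caller would declare one `DemoInterface`, pass its address to `DemoGetInterface()`, and then invoke `(*api.fpInit)(".")` and `(*api.fpShutDown)(ctx)` through the table, in the same style as the cnv2xml sample code in this chapter.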
fpConvertStream()
This function converts either a source stream or file to an output stream.
Syntax
BOOL pascal fpConvertStream(
    void *pContext,
    void *pCallingContext,
    KVInputStream *pInput,
    KVOutputStream *pOutput,
    KVXMLTemplate *pTemplates,
    KVXMLOptions *pOptions,
    KVXMLTOCOptions *pTOCCreateOptions,
    KVXMLCallbacks *pCallbacks,
    BOOL bIndex,
    KVErrorCode *pError );
Arguments
pContext
Pointer returned from fpInit().
pCallingContext
Pointer passed back to the callback functions.
pInput
Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source for the conversion.
pOutput
Pointer to the developer-assigned instance of KVOutputStream. The structure KVOutputStream defines the output stream to which Export writes the generated XML.
pTemplates
Pointer to the data structure KVXMLTemplate. It defines the overall structure of the output. Individual elements within the structure define the markup written at specific points in the output stream. See “KVXMLTemplate”.
If this pointer is NULL, the default values for the structure are used.
pOptions
Pointer to the data structure KVXMLOptions. It defines the options that control the markup written in response to the general style and attributes (font, color, and so on) of the document.
If this pointer is NULL, the default values for the structure are used.
pTOCCreateOptions
Pointer to the data structure KVXMLTOCOptions. It specifies whether a heading is included in the table of contents.
If this pointer is NULL, the default values for the structure are used.
pCallbacks
Pointer to the data structure KVXMLCallbacks. It is a structure of functions that Export calls for specific, user-defined purposes.
If callbacks are not used, then this can be NULL.
bIndex
Set this to TRUE to generate output with minimal markup and without images. Since the generated output is minimized to textual content, it is suitable for an indexing engine. If bIndex is set to FALSE, embedded images in a document are regenerated as separate files and stored in the output directory.
This can be set through the bIndexOnly member of the structure KVXMLOptions.
To generate output with verbose markup and without images, set the nType argument of the function KVXMLConfig() to KVCFG_SUPPRESSIMAGES. See “KVXMLSetStyleSheet()”.
pError
Pointer to an error code if the call to fpConvertStream() fails.
Returns
If the call is successful, the return value is TRUE.
If the call is unsuccessful, the return value is FALSE.
Discussion
Only pContext, pInput, pOutput, and bIndex are required. All other pointers should be NULL when they are not set.
If pCallbacks is NULL, pOptions->pszDefaultOutputDirectory must be valid, except when bIndex is set to TRUE.
This function runs in-process or out of process.
When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.
When converting out of process, the values for the KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions structures should be set to NULL. These structures are already passed in the call to KVXMLStartOOPSession(). See “KVXMLStartOOPSession()”.
Example
The following sample code is from the cnv2xml sample program:
if(!(*KVXMLInt.fpConvertStream)(
        pKVXML,       /* Pointer returned by fpInit() */
        NULL,         /* Pointer for callback functions */
        &Input,       /* Input stream */
        &Output,      /* Output stream */
        NULL,         /* Mark-up and related variables */
        &XMLOptions,  /* Options */
        NULL,         /* TOC options */
        NULL,         /* Pointer to callback functions */
        FALSE,        /* Index mode */
        &error))      /* Error return value */
{
    printf("Error converting %s to XML %d\n", argv[i - 1], error);
}
else
{
    printf("Conversion of %s to XML completed.\n\n", argv[i - 1]);
}
fpFileToInputStreamCreate()
This function creates an input stream from an input file.
Syntax
BOOL pascal _export fpFileToInputStreamCreate(
void *pContext,
char *pszFileName,
KVInputStream *pInput);
Arguments
pContext
Pointer returned from fpInit().
pszFileName
Pointer to the name of the input file to be converted.
pInput
Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source for the conversion. See “KVInputStream”.
Returns
If the call is successful, the return value is TRUE.
If this call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
After the conversion is complete, call fpFileToInputStreamFree() to free the memory allocated by this function.
Example
The following sample code is from the cnv2xml sample program:
if(!(*KVXMLInt.fpFileToInputStreamCreate)(pKVXML, argv[i++], &Input))
{
    printf("Error creating input stream\n");
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return (5);
}
fpFileToInputStreamFree()
This function frees the memory used to create an input stream.
Syntax
BOOL pascal _export fpFileToInputStreamFree(
void *pContext,
KVInputStream *pInput);
Arguments
pContext
Pointer returned from fpInit().
pInput
Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source for the conversion. See “KVInputStream”.
Returns
If the call is successful, the return value is TRUE.
If this call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
After the conversion is complete, call this function to free the memory allocated by fpFileToInputStreamCreate().
fpFileToOutputStreamCreate()
This function creates an output stream from an output file.
Syntax
BOOL pascal _export fpFileToOutputStreamCreate(
void *pContext,
char *pszFileName,
KVOutputStream *pOutput );
Arguments
pContext
Pointer returned from fpInit().
pszFileName
Pointer to the name of the output file to be created.
pOutput
Pointer to the developer-assigned instance of KVOutputStream. The structure KVOutputStream defines the output stream to which Export writes the generated XML.
Returns
If the call is successful, the return value is TRUE.
If this call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
After the conversion is complete, call fpFileToOutputStreamFree() to free the memory allocated by this function.
Example
The following sample code is from the cnv2xml sample program:
if (!(*KVXMLInt.fpFileToOutputStreamCreate)(pKVXML, argv[i], &Output))
{
    printf("Error creating output stream\n");
    (*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
    (*KVXMLInt.fpShutDown)(pKVXML);
    mpFreeLibrary(hKVXML);
    return 6;
}
fpFileToOutputStreamFree()
This function frees the memory used to create the output stream.
Syntax
BOOL pascal _export fpFileToOutputStreamFree(
void *pContext,
KVOutputStream *pOutput );
Arguments
pContext
Pointer returned from fpInit().
pOutput
Pointer to the developer-assigned instance of KVOutputStream. The structure KVOutputStream defines the output stream to which Export writes the generated XML. See “KVOutputStream”.
Returns
If the call is successful, the return value is TRUE.
If this call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
After the conversion is complete, call this function to free the memory allocated by fpFileToOutputStreamCreate().
fpGetAnchor()
This function gets the filename automatically generated by Export and used for external graphics referenced with <a xmlns:xlink= xlink href=> tags and for heading-level table of contents entries.
Syntax
BOOL pascal fpGetAnchor(
    void *pCallingContext,
    KVXMLAnchorType eAnchorType,
    char *pszAnchor,
    int cbAnchorMax,
    BYTE *pcHTML,
    UINT cbHTML);
Arguments
pCallingContext
Pointer passed back to the callback functions.
eAnchorType
Graphic or block anchor type for the output stream. It must be one of the enumerated types defined in KVXMLAnchorType.
pszAnchor
Pointer to the location in which the new anchor is stored.
cbAnchorMax
Maximum number of bytes to place in pszAnchor.
pcHTML
Pointer to either the markup defining the contents of the table of contents entry, a pointer to the external graphic name, or NULL.
cbHTML
Number of valid bytes in pcHTML.
Returns
If the call is successful, the return value is TRUE.
If this call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
pszAnchor must be assigned. It may be derived from the cbAnchorMax, pcHTML, and cbHTML values that are also provided.
pcHTML may be NULL if the graphic is an internal part of the document.
This function is exposed so that it may be called from the
GetAnchor() callback function to obtain default behavior for anchor types the callback is not set to handle.
fpGetConvertFileList()
This function gets the list of files automatically converted to XML during a call to fpConvertStream() or KVXMLConvertFile() .
Syntax
char ** pascal _export fpGetConvertFileList(
    void *pContext,
    int *pnSize );
Arguments
pContext
Pointer returned from fpInit().
pnSize
Pointer to the number of files generated by the conversion.
Returns
If no files are converted, the return value is a NULL pointer. Otherwise, the return value is a pointer to an array of strings that provides the available path information for each converted file.
Discussion
The array of file path information includes all externally generated files, including graphic files. Note that the main output file is not included in the array, nor in the count of the number of files converted.
The memory used by the array of file path information is freed by the API. The array is not valid after a call to fpShutDown().
This function runs in-process or out of process.
When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.
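The calling convention described above (a NULL return when nothing was generated, a count written through pnSize, and an array that the caller must not free) can be sketched with a stand-in function. `demo_get_convert_file_list()` and its file names are invented for the example and only model the behavior described in this section.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the call: returns an API-owned array and
   writes the number of entries through pnSize. */
static char **demo_get_convert_file_list(int *pnSize)
{
    static char *files[] = { "out/img001.jpg", "out/img002.jpg" };
    assert(pnSize != NULL);
    *pnSize = 2;
    return files;
}

/* Prints each generated file and returns how many were listed. The array
   is owned by the API in this model, so the caller never frees it. */
static int demo_list_generated_files(FILE *dest)
{
    int count = 0;
    int i;
    char **list = demo_get_convert_file_list(&count);
    if (list == NULL)
        return 0;               /* nothing besides the main output file */
    for (i = 0; i < count; i++)
        fprintf(dest, "%s\n", list[i]);
    return count;
}
```

If the paths are needed after the session ends, copy the strings before shutting down, because the array is no longer valid at that point.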
fpGetStreamInfo()
This function extracts file format information and character set from the source document.
Syntax
BOOL pascal _export fpGetStreamInfo (
    void *pContext,
    KVInputStream *pInput,
    KVStreamInfo *pStreamInfo );
Arguments
pContext
Pointer returned from fpInit().
pInput
Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source for the conversion. See “KVInputStream”.
pStreamInfo
Pointer to the developer-assigned instance of KVStreamInfo. The structure KVStreamInfo defines the input stream document type and character set.
You can examine the fields in the structure to determine the appropriate template to use based on the document type.
Returns
If the call is successful, the return value is TRUE.
If this call is unsuccessful, the return value is FALSE.
fpGetSummaryInfo()
Syntax
BOOL pascal _export fpGetSummaryInfo(
void *pContext,
KVInputStream *pInput,
KVSummaryInfoEx *pSummary,
BOOL bFree );
Arguments
pContext
Pointer returned from fpInit().
pInput
Pointer to the developer-assigned instance of KVInputStream. The KVInputStream structure points to the input stream containing the source for the conversion.
pSummary
Points to the developer-assigned instance of KVSummaryInfoEx. See “KVSummaryInfoEx”.
In this structure, nElem provides a count of the number of metadata elements, and pElem points to the first element of the array of individual elements as defined by the structure KVSumInfoElemEx. See “KVSumInfoElemEx”.
bFree
Flag to free or fill the memory allocated to the document metadata.
Returns
If the call is successful, the return value is TRUE. When the document does not contain metadata, but the document reader can extract metadata from the specified format, then this function returns TRUE with nElem set to 0.
If this call is unsuccessful, the return value is FALSE. This function returns FALSE when the document reader does not support metadata extraction for the specified format, or there is an error in extraction. The section “Supported Formats” lists the file formats for which metadata can be determined.
Discussion
For metadata to be extracted by Export, metadata must be defined in the source document, and the document reader must be able to extract metadata for the file format. The section “Supported Formats” lists the file formats for which metadata can be determined. Export does not generate metadata automatically from the document contents.
This function runs in-process or out of process. See “Convert Files Out of
.
This function may be called any time after the call to KVXMLInit().
When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.
Call this function with bFree set to FALSE to fill the KVSummaryInfoEx structure. Its pElem member then points to an array of KVSumInfoElemEx structures, each containing an element of available document metadata.
After processing the information in the structure, call this function with bFree set to TRUE to free the memory allocated to the document metadata.
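The fill, process, free sequence controlled by bFree can be modeled as follows. This is a toy model, not the SDK implementation: `demo_get_summary_info()` and the `Demo*` structures are hypothetical, and the three elements it produces are made up so that the flow can be followed end to end.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the summary structures. */
typedef struct { int id; } DemoSumInfoElem;
typedef struct {
    int nElem;
    DemoSumInfoElem *pElem;
} DemoSummaryInfo;

/* Hypothetical model of the bFree convention: 0 fills the structure,
   nonzero releases what an earlier call allocated. Returns 1 on success. */
static int demo_get_summary_info(DemoSummaryInfo *pSummary, int bFree)
{
    int i;
    assert(pSummary != NULL);
    if (bFree) {
        free(pSummary->pElem);
        pSummary->pElem = NULL;
        pSummary->nElem = 0;
        return 1;
    }
    pSummary->pElem = malloc(3 * sizeof *pSummary->pElem);
    if (pSummary->pElem == NULL)
        return 0;
    pSummary->nElem = 3;
    for (i = 0; i < 3; i++)
        pSummary->pElem[i].id = i + 1;
    return 1;
}

/* Fill, process, release: returns the sum of element ids, or -1 on failure. */
static int demo_sum_ids(void)
{
    DemoSummaryInfo info = { 0, NULL };
    int total = 0;
    int i;
    if (!demo_get_summary_info(&info, 0))
        return -1;
    for (i = 0; i < info.nElem; i++)
        total += info.pElem[i].id;
    demo_get_summary_info(&info, 1);   /* second call frees the metadata */
    return total;
}
```

Pairing every successful fill call with a free call on the same structure keeps the metadata memory from leaking, including on early-exit paths.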
fpInit()
This function initializes an Export session. Its return value, pContext, is passed as the first parameter to the File Extraction interface and all other Export functions.
Syntax
void* pascal _export fpInit(
    KVMemoryStream *pMemAllocator,
    char *pszKeyViewDir,
    char *pszDataFile,
    KVErrorCode *pError,
    DWORD dWord);
Arguments
pMemAllocator
Pointer to a developer-defined memory allocator. If NULL is passed, then the default C run-time memory allocation is used.
pszKeyViewDir
Pointer to the directory where the Export components are located. This is normally the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system.
pszDataFile
Pointer to the directory and filename of the Export data file, formats_e.ini. This file determines whether a format is supported. If a format does not exist in this file, the conversion fails.
The formats_e.ini file is normally stored in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system. See “File Format Detection” for more information.
pError
Pointer to an error code defined in KVErrorCode or KVErrorCodeEx in kvtypes.h.
dWord
Reserved. Must be 0.
Returns
If the call is successful, the return value is a pointer passed to all other functions.
If the call is unsuccessful, the return value is a NULL pointer.
Discussion
If pszKeyViewDir is NULL, the required components cannot be found. Ensure it is valid.
If this function returns NULL, check stderr for the KeyView installation error messages, “KeyView Export SDK License Key has Expired” and “KeyView Export SDK License Key is Invalid”, and pass them to your application. See the Export SDK Installation Instructions for more information on the KeyView license feature.
To ensure multi-threaded conversions are thread-safe, you must create a unique context pointer for every thread by calling fpInit(). In addition, threads must not share context pointers, and the same context pointer must be used for all API calls in the same thread. Creating a context pointer for every thread does not affect performance because the context pointer uses minimal resources.
When the conversion context is no longer required, it should be terminated by calling fpShutDown().
Example
The following sample code is from the cnv2xml sample program:
pKVXML = (*KVXMLInt.fpInit)(NULL, ".", NULL, &error, 0);
if(!pKVXML)
{
    printf("Error initializing KVXML: %d\n", error);
    mpFreeLibrary(hKVXML);
    return 4;
}
fpSetStyleMapping()
This function is used to set the mapping for user-defined styles. Export does not make a distinction between paragraph styles or character styles, but operates under the assumption that each style has a unique name.
Syntax
BOOL pascal _export fpSetStyleMapping(
void *pContext,
KVStyle *pStyles,
int iStyles,
BOOL bCopy);
Arguments
pContext
Pointer returned from fpInit().
pStyles
Pointer to the developer-assigned instance of KVStyle. See “KVStyle”. The KVStyle structure defines the elements of a custom style.
iStyles
Number of elements in the pStyles array.
bCopy
If Export is to allocate memory to copy the pStyles array, set this to TRUE. If pStyles remains valid throughout the conversion process, set this to FALSE.
Returns
If the call is successful, the return value is TRUE.
If this call is unsuccessful, the return value is FALSE.
Discussion
Paragraph styles are presently implemented only for documents in Microsoft Word, RTF, Folio Flat files, WordPro, and WordPerfect 6.x.
This function runs in-process or out of process.
When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.
Once this API function is called, the styles are valid until fpShutDown() is called, or until this function is called again with a new style or NULL.
fpShutDown()
This function terminates an Export session that was initialized by fpInit(), and frees allocated system resources. It is called when the conversion context is no longer required.
Syntax
void pascal _export fpShutDown(KVXMLContext *pContext);
Arguments
pContext
Pointer returned from fpInit().
Returns
None.
Discussion
After this function is called, the pContext pointer must not be passed to any XML
Export API.
fpValidateTemplate()
This function is used to ensure that the markup is well-formed and valid according to the DTD. It is currently not implemented.
KVXMLConfig()
This function is called directly and provides a way to configure options prior to the document conversion. Currently, the function is used for the following configurations:
Generate output without images
Generate output with verbose markup and without images. To generate output with minimal markup (ID and style paragraph attributes) and without images, set the bIndexOnly member of the structure KVXMLOptions.
Enable PDF position information
Include position information in the markup generated for a PDF document.
Configure PDF bookmarks
Specify whether bookmarks in a PDF file are converted to simple XLinks in the
XML output.
Configure Word bookmarks
Disable the conversion of Microsoft Word bookmarks to zone elements.
Designate temporary directory
Specify a directory in which temporary files created during XML conversion processes are stored.
NOTE On Windows systems, there is a 64K size limit to the temp directory. Once the limit is reached, you must either create a new directory or delete the contents of the existing directory; otherwise, you may receive an error message.
Configure XML conversion
Specify the elements and attributes extracted from an XML document based on the file's document type.
Enable PDF logical reading order
Convert paragraphs in PDF files in the order in which they appear on the page and with left-to-right or right-to-left paragraph direction. See “Convert PDF Files to a Logical Reading Order”.
Configure PDF soft hyphens
Specify whether soft hyphens are removed from the XML output. See “Control Hyphenation”.
Enable Revision Marks
Convert text and graphics that were deleted from a document with revision tracking enabled, and include revision tracking information in the XML output. See “Convert Revision Tracking Information”.
Protected file password
Specify the password to use to open a password-protected file for export.
Syntax
KVErrorCode pascal KVXMLConfig(
    void *pContext,
    int nType,
    int nValue,
    void *p );
Arguments
pContext    Pointer returned from fpInit().
nType       The configuration flag. This is a symbolic constant defined in kvtypes.h. The available options are described in “Configuration Flags”.
nValue      Integer value defined for the flags above.
            This is TRUE or FALSE for all flags except KVCFG_LOGICALPDF, KVCFG_SETTEMPDIRECTORY, and KVCFG_SETXMLCONFIGINFO.
            For KVCFG_LOGICALPDF, this is one of the paragraph direction options defined in the LPDF_DIRECTION enumerated type in kvtypes.h.
            For KVCFG_SETTEMPDIRECTORY and KVCFG_SETXMLCONFIGINFO, this is not set.
p           The data for the configuration flag.
            This is NULL for all flags except KVCFG_SETTEMPDIRECTORY, KVCFG_SETXMLCONFIGINFO, and KVCFG_SETPASSWORD.
            For KVCFG_SETTEMPDIRECTORY, this is the path to the directory where temporary files are stored.
            For KVCFG_SETXMLCONFIGINFO, this is a pointer to the KVXConfigInfo structure.
            For KVCFG_SETPASSWORD, this is the source file password.
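The nValue and p rules above can be checked before the call is made. The following is a minimal sketch under stated assumptions: ConfigArgKind and config_args_ok() are hypothetical and not part of the SDK, the flags are grouped by the kind of argument they take instead of by their real values in kvtypes.h, and TRUE and FALSE are assumed to be 1 and 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical grouping of the KVCFG_* flags by the arguments they take. */
typedef enum {
    ARG_BOOLEAN,    /* most flags: nValue is TRUE/FALSE, p is NULL      */
    ARG_DIRECTION,  /* KVCFG_LOGICALPDF: nValue is an LPDF_DIRECTION    */
    ARG_TEMPDIR,    /* KVCFG_SETTEMPDIRECTORY: p is a directory path    */
    ARG_XMLCONFIG,  /* KVCFG_SETXMLCONFIGINFO: p is a KVXConfigInfo *   */
    ARG_PASSWORD    /* KVCFG_SETPASSWORD: nValue is TRUE, p is a string */
} ConfigArgKind;

#define MAX_PASSWORD_CHARS 255u

/* Returns 1 if nValue and p are consistent with the documented rules
   for this kind of flag, 0 otherwise. */
static int config_args_ok(ConfigArgKind kind, int nValue, const void *p)
{
    switch (kind) {
    case ARG_BOOLEAN:
        return (nValue == 0 || nValue == 1) && p == NULL;
    case ARG_DIRECTION:
        /* The direction values live in kvtypes.h and are not checked here. */
        return p == NULL;
    case ARG_TEMPDIR:
        return p != NULL && ((const char *)p)[0] != '\0';
    case ARG_XMLCONFIG:
        return p != NULL;
    case ARG_PASSWORD:
        return nValue == 1 && p != NULL &&
               strlen((const char *)p) <= MAX_PASSWORD_CHARS;
    }
    assert(0 && "unhandled ConfigArgKind");
    return 0;
}
```

A check of this kind catches the most common mistakes, such as passing a path with a Boolean flag or a password longer than 255 characters, before KVXMLConfig() returns an error code.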
Configuration Flags
The following flags are available for the nType argument in KVXMLConfig(). These flags are defined in kvtypes.h.
Flag    Description
KVCFG_SUPPRESSIMAGES
If KVCFG_SUPPRESSIMAGES is set, the XML output includes verbose markup, but no images. If this option is not set, then embedded images in a document are regenerated as separate files and stored in the output directory. To generate output with minimal markup (ID and style paragraph attributes) and without images, set the bIndexOnly member of the structure KVXMLOptions to TRUE. See “KVXMLOptions”.
KVCFG_ENABLEPOSITIONINFO
If KVCFG_ENABLEPOSITIONINFO is set, then a position element is included in the markup for PDF documents. The position element defines the absolute position of the text relative to the bottom left corner of the page, and includes additional information such as font and color.
KVCFG_SUPPRESSTOCPRINTIMAGE
If the flag KVCFG_SUPPRESSTOCPRINTIMAGE is set, then bookmarks in a PDF file are not converted to simple XLinks in the XML output. By default, PDF bookmarks are converted to source and destination anchors. For example,
<a xmlns:xlink="http://www.w3.org/TR/xlink" xlink:href="#bmk1">Highlight File Format</a>
<a xmlns:xlink="http://www.w3.org/TR/xlink" name="bmk1"/><img src="pdf14640.jpg"/>
KVCFG_DISABLEZONE
If the flag KVCFG_DISABLEZONE is set, the conversion of Microsoft Word bookmarks to zone elements (<zone name="xxx">) in the output XML is disabled.
A bookmark in Microsoft Word documents is a name given to a selected area of the document. The bookmark may enclose words, paragraphs, tables, table cells, lists, list items or the entire document. In XML Export, bookmarks are converted to zone elements (<Zone name="xxx">) using the KeyView KVT_ZONE token.
Depending on how bookmarks are defined in the original document, the creation of zone elements may result in malformed XML. In this case, you can disable zone creation to avoid these validity errors. Zone element creation is enabled by default.
KVCFG_SETTEMPDIRECTORY
The flag KVCFG_SETTEMPDIRECTORY enables you to specify the directory in which temporary files created during conversion processes are stored. By default, the system temporary directory is used.
To define a directory for temporary files generated during an out-of-process conversion, set the tempfilepath parameter in the formats_e.ini file. See “Convert Files Out of Process”.
NOTE: On Windows systems, there is a 64K size limit to the temp directory. Once the limit is reached, you must either create a new directory or delete the contents of the existing directory; otherwise, you may receive an error message.
KVCFG_SETXMLCONFIGINFO
The flag KVCFG_SETXMLCONFIGINFO enables you to define which elements and attributes are extracted from XML documents with a specified format ID or root element. This can be used to override the default settings for the supported XML formats (see “Convert XML Files”), or to define settings for custom XML document types.
The settings are defined in the KVXConfigInfo structure. To set custom settings for more than one document type, call the KVXMLConfig() function once for each type.
Element extraction settings can also be modified using the kvxconfig.ini file.
KVCFG_LOGICALPDF
The flag KVCFG_LOGICALPDF converts paragraphs in a PDF file in the order in which they appear on the page (logical reading order). The nValue argument specifies the paragraph direction. See “Convert PDF Files to a Logical Reading Order”.
KVCFG_DELSOFTHYPHEN
If the flag KVCFG_DELSOFTHYPHEN is set, soft hyphens in the source document are removed, and the hyphenated words are joined in the XML output. By default, soft hyphens are maintained. See “Control Hyphenation”.
It is recommended that you remove soft hyphens if you use Export to generate text output for an indexing engine or are not concerned with maintaining the document’s layout. See “fpConvertStream()” or “KVXMLConvertFile()” for more information on running Export in index mode.
KVCFG_INCLREVISIONMARK
If this flag is set to TRUE, text and graphics that were deleted from a document with a revision tracking feature enabled are converted, and revision tracking information is included in the XML output.
To reset the flag and exclude deleted content and revision tracking information from the XML output, set the flag to FALSE. The default is FALSE. See “Convert Revision Tracking Information”.
KVCFG_WP_NOCOMMENTS
Set to TRUE to suppress the export of comment text from Microsoft Word documents. Comment text is exported by default from Microsoft Word 97 to 2003 files.
Comment output can also be toggled by modifying the formats_e.ini file.
KVCFG_WP_SHOWHIDDENTEXT
Set to TRUE to export hidden text from Microsoft Word documents.
KVCFG_WP_SHOWDATEFIELDCODE
Set to TRUE to export date field codes from Microsoft Word documents.
KVCFG_WP_SHOWFILENAMEFIELDCODE
Set to TRUE to export the file name field code from Microsoft Word documents.
KVCFG_SS_SHOWHIDDENINFOR
Set to TRUE to export hidden information from Microsoft Excel files.
KVCFG_SS_SHOWCOMMENTS
Set to TRUE to export comments from Microsoft Excel files.
KVCFG_SS_SHOWFORMULA
Set to TRUE to export formulas from Microsoft Excel files.
KVCFG_PG_HIDEHIDDENSLIDE
Set to TRUE to suppress the export of hidden slides from Microsoft PowerPoint files.
KVCFG_PG_HIDECOMMENT
Set to TRUE to suppress the export of comments from Microsoft PowerPoint files. Comments are exported by default from PowerPoint 97 to 2000 files.
KVCFG_PG_SHOWCOMMENTSSLIDE
Set to TRUE to export comments slides from Microsoft PowerPoint 2003 and 2007 files.
KVCFG_PG_SHOWSLIDNOTES
Set to TRUE to export slide notes from Microsoft PowerPoint files.
Slide note output can also be toggled by modifying the formats_e.ini file.
KVCFG_SETPASSWORD
This flag enables you to define a password used to open a password-protected file for export.
nValue is TRUE.
p is the source file password, which can have a maximum length of 255 characters (the final byte is null).
Returns
The return value is one of the error codes defined in KVErrorCode in kvtypes.h.
Discussion
This function must be called after the call to fpInit() and before the call to fpConvertStream() or KVXMLConvertFile().
This function runs in-process or out of process. See “Convert Files Out of Process”.
When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.
Examples
To generate verbose markup, but no images:
(*fpXMLConfig)(pKVXML, KVCFG_SUPPRESSIMAGES, TRUE, NULL);
To specify that bookmarks in a PDF file are not converted to XLinks in the XML output:
(*fpXMLConfig)(pKVXML, KVCFG_SUPPRESSTOCPRINTIMAGE, TRUE, NULL);
To disable the conversion of zone elements:
(*fpXMLConfig)(pKVXML, KVCFG_DISABLEZONE, TRUE, NULL);
To set a directory for temporary files:
char tmpDir[250];
strcpy(tmpDir, "c:\\temp\\xmlexport");
(*fpXMLConfig)(pKVXML, KVCFG_SETTEMPDIRECTORY, 0, tmpDir);
To specify custom extraction settings for conversion of an XML file:
KVXConfigInfo xinfo; /* populate xinfo */
(*fpXMLConfig)(pKVXML, KVCFG_SETXMLCONFIGINFO, 0, &xinfo);
To specify PDF files are converted to a logical reading order, and the paragraph direction for the PDF output is left to right:
(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_LTR, NULL);
To specify PDF files are converted to a logical reading order, and the paragraph direction for the PDF output is right to left:
(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_RTL, NULL);
To specify PDF files are converted to a logical reading order, and the paragraph direction for the PDF output is determined on the fly for each page:
(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_AUTO, NULL);
To specify soft hyphens are removed from the XML output:
(*fpXMLConfig)(pKVXML, KVCFG_DELSOFTHYPHEN, TRUE, NULL);
To convert text and graphics that are identified by revision marks:
(*fpXMLConfig)(pKVXML, KVCFG_INCLREVISIONMARK, TRUE, NULL);
To toggle hidden data output from Microsoft Word documents, use one of the
KVCFG_WP flags:
(*fpXMLConfig)(pKVXML, KVCFG_WP_NOCOMMENTS, TRUE, NULL);
To toggle hidden data output from Microsoft Excel documents, use one of the
KVCFG_SS flags:
(*fpXMLConfig)(pKVXML, KVCFG_SS_SHOWHIDDENINFOR, TRUE, NULL);
To toggle hidden data output from Microsoft PowerPoint documents, use one of the KVCFG_PG flags:
(*fpXMLConfig)(pKVXML, KVCFG_PG_HIDEHIDDENSLIDE, TRUE, NULL);
To specify a password to open a password-protected file for export:
(*fpXMLConfig)(pKVXML, KVCFG_SETPASSWORD, TRUE, password);
where password is a null-terminated string of 255 or fewer characters.
To include a position element in the markup for PDF documents:
(*fpXMLConfig)(pKVXML, KVCFG_ENABLEPOSITIONINFO, TRUE, NULL);
Using the PDF position element significantly changes the generated markup.
For example, without the option, the XML output from a section of a PDF document looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE VerityXMLExport (View Source for full doctype...)>
- <VerityXMLExport>
- <WP>
- <p id="p1" font-size="33pt">
<img src="ecpe.pdf38760.jpg" height="140px" width="292px" />
Economic Fiscal Update
<font size="18pt" color="#777777">Theand</font>
<font size="14pt" color="#ffffff">October 30, 2002</font>
<font size="29pt" color="#a4a4a4">Overview</font>
</p>
With the option enabled, the same section of the PDF document looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE VerityXMLExport (View Source for full doctype...)>
- <VerityXMLExport>
- <WP>
<Position style="position:absolute;top:534px;left:254px;font-family:'Times New Roman';font-size:33pt;white-space:nowrap;" />
<Position style="position:absolute;top:393px;left:254px;white-space:nowrap;" />
<img src="ecpe.pdf36000.jpg" height="140px" width="292px" />
<Position style="position:absolute;top:308px;left:256px;font-family:'Times New Roman';font-size:33pt;white-space:nowrap;" />
Economic
<Position style="position:absolute;top:346px;left:256px;font-family:'Times New Roman';font-size:33pt;white-space:nowrap;" />
Fiscal Update
<Position style="position:absolute;top:298px;left:281px;font-family:'Times New Roman';font-size:18pt;color:#777777;background-color:#ffffff;white-space:nowrap;" />
The
<Position style="position:absolute;top:336px;left:299px;font-family:'Times New Roman';font-size:18pt;color:#777777;background-color:#ffffff;white-space:nowrap;" />
and
<Position style="position:absolute;top:543px;left:397px;font-family:'Times New Roman';font-size:14pt;color:#ffffff;background-color:#000000;white-space:nowrap;" />
October 30, 2004
<Position style="position:absolute;top:627px;left:382px;font-family:'Times New Roman';font-size:29pt;color:#a4a4a4;background-color:#ffffff;white-space:nowrap;" />
Overview
KVXMLConvertFile()
This function is called directly and converts a source file to an output file.
Syntax
BOOL pascal KVXMLConvertFile (
    void *pContext,
    void *pCallingContext,
    char *pInFileName,
    char *pOutFileName,
    KVXMLTemplate *pTemplates,
    KVXMLOptions *pOptions,
    KVXMLTOCOptions *pTOCCreateOptions,
    KVXMLCallbacks *pCallbacks,
    BOOL bIndex,
    KVErrorCode *pError);
Arguments
pContext    Pointer returned from fpInit().
pCallingContext    Pointer passed back to the callback functions.
pInFileName    Pointer to the input file.
pOutFileName    Pointer to the output file.
pTemplates    Pointer to the data structure KVXMLTemplate. It defines the overall structure of the output. Individual elements within the structure define the markup written at specific points in the output stream. See “KVXMLTemplate”.
    If this pointer is NULL, the default values for the structure are used.
pOptions    Pointer to the data structure KVXMLOptions. It defines the options that control the markup written in response to the general style and attributes (font, color, and so on) of the document. See “KVXMLOptions”.
    If this pointer is NULL, the default values for the structure are used.
pTOCCreateOptions    Pointer to the data structure KVXMLTOCOptions. It specifies whether a heading is included in the table of contents.
    If this pointer is NULL, the default values for the structure are used.
pCallbacks    Pointer to the data structure KVXMLCallbacks. It is a structure of functions that Export calls for specific, user-defined purposes. See “KVXMLCallbacks”.
    If callbacks are not used, then this can be NULL.
bIndex    Set this to TRUE to generate output with minimal markup and without images. Since the generated output is minimized to textual content, it is suitable for an indexing engine. If bIndex is set to FALSE, embedded images in a document are regenerated as separate files and stored in the output directory.
    This can also be set through the bNoPictures member in the template files.
pError    Pointer to an error code if the call to KVXMLConvertFile() fails.
Returns
If the call is successful, the return value is TRUE.
If the call is unsuccessful, the return value is FALSE.
Discussion
Only pContext, pInFileName, pOutFileName, and bIndex are required. All other pointers should be NULL when they are not set.
If pCallbacks is NULL, pOptions->pszDefaultOutputDirectory must be valid, except when bIndex is set to TRUE.
This function runs in-process or out of process. See “Convert Files Out of Process”.
When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.
When converting out of process, the values for the KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions structures should be set to NULL. These structures are already passed in the call to KVXMLStartOOPSession(). See “KVXMLStartOOPSession()”.
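The argument rules in this Discussion can be checked before the conversion call. The following is a minimal sketch, not SDK code: DemoOptions is a stand-in that carries only the pszDefaultOutputDirectory member, convert_args_ok() is a hypothetical helper, and "valid" is assumed here to mean a non-NULL, non-empty string.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with only the member used here; the real KVXMLOptions
   structure has many more members. */
typedef struct {
    const char *pszDefaultOutputDirectory;
} DemoOptions;

/* Returns 1 if the arguments satisfy the documented requirements,
   0 otherwise. bIndex is treated as a Boolean (0 or 1). */
static int convert_args_ok(const void *pContext,
                           const char *pInFileName,
                           const char *pOutFileName,
                           const DemoOptions *pOptions,
                           const void *pCallbacks,
                           int bIndex)
{
    assert(bIndex == 0 || bIndex == 1);

    /* pContext, pInFileName, and pOutFileName are always required. */
    if (pContext == NULL || pInFileName == NULL || pOutFileName == NULL)
        return 0;

    /* No callbacks and not in index mode: a default output directory
       must be supplied through the options structure. */
    if (pCallbacks == NULL && !bIndex) {
        if (pOptions == NULL ||
            pOptions->pszDefaultOutputDirectory == NULL ||
            pOptions->pszDefaultOutputDirectory[0] == '\0')
            return 0;
    }
    return 1;
}
```

The check does not test whether the directory exists or is writable; an application that needs that guarantee would add its own file system test.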
Example
if(!(*KVXMLInt.KVXMLConvertFile)(
        pKVXML,         /* Pointer returned by fpInit() */
        NULL,
        &InputFile,     /* Input file */
        &OutputFile,    /* Output file */
        &XMLTemplates,  /* Mark-up and related variables */
        &XMLOptions,    /* Options */
        NULL,           /* TOC options */
        NULL,           /* Pointer to callback functions */
        FALSE,          /* Index mode */
        &error))        /* Error return value */
{
    printf("Error converting %s to XML %d\n", argv[i - 1], error);
}
else
{
    printf("Conversion of %s to XML completed.\n\n", argv[i - 1]);
}
KVXMLEndOOPSession()
This function terminates the current out-of-process conversion session, and releases the source data and resources related to the session.
Syntax
BOOL pascal KVXMLEndOOPSession(
    void *pContext,
    BOOL bKeepServantAlive,
    KVErrorCodeEx *pError,
    DWORD dwOptions,
    void *pReserved1,
    void *pReserved2 );
Arguments
pContext    Pointer returned from fpInit().
bKeepServantAlive    Set this to TRUE to keep a Servant process active after the Export out-of-process session is terminated. If the Servant remains active, subsequent conversion requests are processed more quickly because the Servant is already prepared to receive data.
    Set this to FALSE to terminate the Export out-of-process session and the associated Servant process.
pError    Pointer to an error code defined in KVErrorCodeEx in kvtypes.h.
dwOptions    Reserved for future use.
pReserved1    Reserved for future use.
pReserved2    Reserved for future use.
Returns
If the call is successful, the return value is TRUE.
If the call is unsuccessful, the return value is FALSE.
Example
The following sample code is from the cnv2xmloop sample program:
/* declare endsession function pointer */
BOOL (pascal *fpKVXMLEndOOPSession)( void *,
                                     BOOL,
                                     KVErrorCode *,
                                     DWORD,
                                     void *,
                                     void *);
/* assign OOP endsession function pointer */
fpKVXMLEndOOPSession = (BOOL (pascal *)( void *,
                                         BOOL,
                                         KVErrorCode *,
                                         DWORD,
                                         void *,
                                         void * ))mpGetProcAddress(hKVXML,
                                                                   "KVXMLEndOOPSession");
if(!fpKVXMLEndOOPSession)
{
printf("Error assigning KVXMLEndOOPSession() pointer\n");
(*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
(*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
mpFreeLibrary(hKVXML);
return 8;
}
/********END OOP SESSION, DO NOT KEEP SERVANT ALIVE *********/
if(!(*fpKVXMLEndOOPSession)(pKVXML,
                            FALSE,
                            &error,
                            0,
                            NULL,
                            NULL))
{
printf("Error calling fpKVXMLEndOOPSession \n");
(*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
(*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
(*KVXMLInt.fpShutDown)(pKVXML);
mpFreeLibrary(hKVXML);
return 10;
}
KVXMLSetStyleSheet()
This function is called directly and is used to specify the full path and filename of an external Style Sheet (XSL or CSS).
Syntax
BOOL pascal KVXMLSetStyleSheet(
    void *pContext,
    char *pszStyleSheetName,
    char *pszRef);
Arguments
pContext    Pointer returned from fpInit().
pszStyleSheetName    Pointer to the full path and filename of the style sheet.
pszRef    Pointer to the URL or filename of the style sheet.
Returns
If the call is successful, the return value is TRUE.
If the call is unsuccessful, the return value is FALSE.
Discussion
When the value for eStyleSheetType in KVXMLOptions is set to XML_XSL or XML_CSS, an external style sheet is referenced by a processing instruction of the form:
<?xml-stylesheet href="pszRef" type="text/xsl"?>
or
<?xml-stylesheet href="pszRef" type="text/css"?>
If the value for pszStyleSheetName includes the output directory, the href only consists of the filename since the XML output resides in the same directory as the style sheet file.
If the value for pszStyleSheetName points to a directory other than the output directory, the href consists of the full path and filename.
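The href rule described above can be illustrated with a short helper. This is a sketch of the documented behavior only, not the SDK's implementation: stylesheet_href() is a hypothetical function, the directory comparison is a plain case-sensitive string match, and both / and \ are treated as path separators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int is_sep(char c)
{
    return c == '/' || c == '\\';
}

/* Returns a pointer into path: the file name alone if path lies
   directly in outputDir, otherwise the whole path. */
static const char *stylesheet_href(const char *path, const char *outputDir)
{
    size_t dirLen;
    const char *name = path;
    const char *p;

    assert(path != NULL && outputDir != NULL);

    /* Find the character after the last separator in path. */
    for (p = path; *p != '\0'; ++p)
        if (is_sep(*p))
            name = p + 1;

    /* Ignore trailing separators on the output directory. */
    dirLen = strlen(outputDir);
    while (dirLen > 0 && is_sep(outputDir[dirLen - 1]))
        --dirLen;

    /* Compare the directory part of path (without its final separator)
       with the output directory. */
    if (name > path &&
        (size_t)(name - path - 1) == dirLen &&
        strncmp(path, outputDir, dirLen) == 0)
        return name;
    return path;
}
```

For example, with an output directory of c:\out, the path c:\out\style.xsl yields style.xsl, while a path in any other directory is returned unchanged.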
Style sheet information cannot be written to an external XSL file. XML Export can only reference an existing XSL style sheet.
When XML_CSS is specified, a CSS file can be created based on pszStyleSheetName.
If the name of the CSS file is not specified by using this function, a CSS style file is created with an automatically-generated filename.
This function runs in-process or out of process. See “Convert Files Out of Process”.
If this function is used to specify the name of the style file, that file is referenced in the processing instruction.
If the CSS file does not exist in the specified location, it is created.
If it exists, but is empty, CSS styles are written to it.
If the CSS file exists and is not empty, the file is not altered. There is no attempt made to validate the file.
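An application can predict which of these three cases applies before it calls the conversion function. The following is a minimal sketch, assuming standard C file I/O: CssAction and css_action_for() are hypothetical names, and a file that cannot be opened for reading is treated as if it does not exist.

```c
#include <assert.h>
#include <stdio.h>

typedef enum {
    CSS_CREATE,        /* file does not exist: Export creates it         */
    CSS_WRITE_STYLES,  /* file exists but is empty: styles are written   */
    CSS_LEAVE_ALONE    /* file exists and has content: it is not altered */
} CssAction;

/* Classifies the CSS file at path according to the three documented cases. */
static CssAction css_action_for(const char *path)
{
    FILE *f;
    int first;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return CSS_CREATE;      /* unreadable is treated as missing */
    first = fgetc(f);           /* EOF on the first read means empty */
    fclose(f);
    return (first == EOF) ? CSS_WRITE_STYLES : CSS_LEAVE_ALONE;
}
```

Because a non-empty file is never altered or validated, a result of CSS_LEAVE_ALONE is a cue to confirm that the existing file really contains the styles the output needs.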
If there are multiple calls made to fpConvertStream() or KVXMLConvertFile(), and the name of the style sheet has been set using KVXMLSetStyleSheet, the filename can be disabled by calling KVXMLSetStyleSheet again with the pszStyleSheetName and pszRef set to NULL. The filename can then be set to a different value by calling KVXMLSetStyleSheet with the new filename prior to the next call to fpConvertStream() or KVXMLConvertFile().
When converting out of process, this function must be called after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See “KVXMLStartOOPSession()” and “KVXMLEndOOPSession()”.
KVXMLStartOOPSession()
This function performs the following:
Initializes the out-of-process session.
Specifies the input stream or file.
Sets conversion options in the KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions data structures.
Creates a Servant process.
Establishes a communication channel between the application thread and the Servant.
Sends the data to the Servant.
Syntax
BOOL pascal KVXMLStartOOPSession(
    void *pContext,
    KVInputStream *pInputStream,
    char *pFileName,
    KVXMLTemplate *pTemplates,
    KVXMLOptions *pOptions,
    KVXMLTOCOptions *pTOCCreateOptions,
    DWORD *pPID,
    KVErrorCode *pError,
    DWORD dwOptions,
    void *pReserved1,
    void *pReserved2 );
Arguments
pContext    Pointer returned from fpInit().
pInputStream    Pointer to the developer-assigned instance of KVInputStream. The structure KVInputStream defines the input stream containing the source for the conversion.
    If pInputStream is defined, then pFileName must be NULL. The input data can be defined as a data stream or file, but not both.
pFileName    Pointer to the file to be converted. The file must exist on the same file system as the Servant.
    If pFileName is defined, then pInputStream must be NULL. The input data can be defined as a data stream or file, but not both.
pTemplates    Pointer to the data structure KVXMLTemplate. It defines the overall structure of the output. Individual elements within the structure define the markup written at specific points in the output stream. See “KVXMLTemplate”.
    If this pointer is NULL, the default values for the structure are used.
pOptions    Pointer to the data structure KVXMLOptions. It defines the options that control the markup written in response to the general style and attributes (font, color, and so on) of the document. See “KVXMLOptions”.
    If this pointer is NULL, the default values for the structure are used.
pTOCCreateOptions    Pointer to the data structure KVXMLTOCOptions. It specifies whether a heading is included in the table of contents.
    If this pointer is NULL, the default values for the structure are used.
pPID    Address of a DWORD into which the Servant process ID is returned.
pError    Pointer to an error code defined in KVErrorCode in kvtypes.h.
dwOptions    Reserved for future use.
pReserved1    Reserved for future use.
pReserved2    Reserved for future use.
Returns
If the call is successful, the return value is TRUE.
If the call is unsuccessful, the return value is FALSE.
Discussion
After the out-of-process session is started successfully, all conversion functions can be called. The data is then processed on the Servant until the session is terminated by a call to KVXMLEndOOPSession().
All functions that can run out of process must be called within the out-of-process session, that is, after the call to KVXMLStartOOPSession(), and before the call to KVXMLEndOOPSession().
The KVXMLConvertFile() and fpGetSummary() functions can only be called once in a single out-of-process session.
Since the KVXMLTemplate, KVXMLOptions, and KVXMLTOCOptions data structures are passed by this function, the same pointers in the call to KVXMLConvertFile() are ignored.
Example
The following sample code is from the cnv2xmloop sample program:
/* declare OOP startsession function pointer */
BOOL (pascal *fpKVXMLStartOOPSession)( void *,
                                       KVInputStream *,
                                       char *,
                                       KVXMLTemplate *,
                                       KVXMLOptions *,
                                       KVXMLTOCOptions *,
                                       DWORD *,
                                       KVErrorCode *,
                                       DWORD,
                                       void *,
                                       void * );
/* assign OOP startsession function pointer */
fpKVXMLStartOOPSession = (BOOL (pascal *)( void *,
                                           KVInputStream *,
                                           char *,
                                           KVXMLTemplate *,
                                           KVXMLOptions *,
                                           KVXMLTOCOptions *,
                                           DWORD *,
                                           KVErrorCode *,
                                           DWORD,
                                           void *,
                                           void * ))mpGetProcAddress(hKVXML,
                                                                     "KVXMLStartOOPSession");
if(!fpKVXMLStartOOPSession)
{
printf("Error assigning KVXMLStartOOPSession() pointer\n");
(*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
(*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
mpFreeLibrary(hKVXML);
return 7;
}
/********START OOP SESSION *****************/
if(!(*fpKVXMLStartOOPSession)(pKVXML,
&Input,
NULL,
&XMLTemplates, /* Mark-up and related variables */
&XMLOptions, /* Options */
NULL, /* TOC options */
&oopServantPID,
&error,
0,
NULL,
NULL))
{
printf("Error calling fpKVXMLStartOOPSession \n");
(*KVXMLInt.fpFileToInputStreamFree)(pKVXML, &Input);
(*KVXMLInt.fpFileToOutputStreamFree)(pKVXML, &Output);
(*KVXMLInt.fpShutDown)(pKVXML);
mpFreeLibrary(hKVXML);
return 9;
}
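The ordering rules from the Discussion can be tracked on the application side. The following is a minimal sketch, not SDK code: OopSession and the oop_* functions are hypothetical bookkeeping helpers that an application might call alongside the real KVXMLStartOOPSession(), KVXMLConfig(), KVXMLConvertFile(), and KVXMLEndOOPSession() calls. They enforce that work happens inside a session, that configuration precedes conversion, and that conversion runs once per session.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-session bookkeeping kept by the application. */
typedef struct {
    int active;     /* 1 between start and end of the session   */
    int converted;  /* 1 once the single conversion has been run */
} OopSession;

/* Call after KVXMLStartOOPSession() succeeds. */
static int oop_start(OopSession *s)
{
    assert(s != NULL);
    if (s->active)
        return 0;               /* a session is already open */
    s->active = 1;
    s->converted = 0;
    return 1;
}

/* Call before KVXMLConfig(): allowed only inside a session and
   before the conversion. */
static int oop_may_configure(const OopSession *s)
{
    assert(s != NULL);
    return s->active && !s->converted;
}

/* Call before KVXMLConvertFile(): allowed once per session. */
static int oop_may_convert(OopSession *s)
{
    assert(s != NULL);
    if (!s->active || s->converted)
        return 0;
    s->converted = 1;
    return 1;
}

/* Call after KVXMLEndOOPSession() succeeds. */
static int oop_end(OopSession *s)
{
    assert(s != NULL);
    if (!s->active)
        return 0;
    s->active = 0;
    return 1;
}
```

To convert a second file, the application ends the session (optionally keeping the Servant alive) and starts a new one, which resets the converted flag.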
CHAPTER 9
XML Export API Callback Functions
This section describes the XML Export API callback functions.
Introduction
The fpConvertStream() and KVXMLConvertFile() functions enable you to specify a callback function. A callback function controls the conversion while it is in progress. For example, you can specify a callback function to report progress during the conversion.
To use the API callback functions, declare one or more instances of the KVXMLCallbacks structure (see “KVXMLCallbacks”). Each member of this instance may then be initialized by assigning a function pointer to the application-defined callback functions, cast to the appropriate function prototype. Each instance of KVXMLCallbacks may define unique callback functions. Alternatively, the functions may be common to all instances of KVXMLCallbacks; these functions take appropriate action, depending on the value of the pointer pCallingContext.
The second parameter (pCallingContext) of the call to fpConvertStream() and KVXMLConvertFile() provides a void pointer used to identify the context of this call. If more than one call to fpConvertStream() or KVXMLConvertFile() is made within a single application, any resulting callbacks are identified by the first parameter of the callback function. This allows the callback function to take any appropriate action, depending on which calling context is returned.
The pCallbacks parameter of the call to fpConvertStream() and KVXMLConvertFile() must be set to the address of the KVXMLCallbacks structure to be used for this call.
For sample code, see the sample program xmlcallback.c. It creates an XML stream and demonstrates the use of the callback functions.
Continue()
When fpConvertStream() or KVXMLConvertFile() is called, control is not returned to the application until the entire document is processed. This callback function provides a means of monitoring progress and terminating the conversion process before the conversion is completed.
Syntax
BOOL (pascal *Continue) (
void *pCallingContext,
int nPercentComplete);
Arguments
pCallingContext    Pointer passed back to the caller-provided callback functions. This pointer, which may be NULL, is specified as the second parameter of the call to fpConvertStream() and KVXMLConvertFile().
nPercentComplete    Approximate percentage of the current conversion that is completed.
    You can monitor the progress of the conversion by checking the value of nPercentComplete, which indicates how many blocks out of the total number of blocks have been processed.
Returns
If the call is successful, the return value is TRUE.
If the call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
There is a callback to this function for every entry that appears in the generated table of contents.
The application is free to execute any required code in the callback function, with the exception of fpShutDown().
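A progress callback of this shape might look like the following. This is a minimal sketch under stated assumptions: ProgressContext and demo_continue() are hypothetical, int stands in for BOOL, and the pascal calling convention is omitted so that the example compiles without the KeyView headers. A real callback must match the prototype shown in Syntax.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical calling context: one instance per conversion, passed as
   pCallingContext so the callback can tell conversions apart. */
typedef struct {
    const char *label;     /* identifies which conversion is reporting   */
    int cancelAtPercent;   /* halt once this value is reached; >100 = never */
    int lastPercent;       /* most recent value reported                 */
} ProgressContext;

/* Shaped like Continue(): returns nonzero (TRUE) to keep converting,
   0 (FALSE) to halt processing. */
static int demo_continue(void *pCallingContext, int nPercentComplete)
{
    ProgressContext *ctx = (ProgressContext *)pCallingContext;

    if (ctx == NULL)
        return 1;          /* pCallingContext may be NULL: nothing to track */

    assert(ctx->label != NULL);
    ctx->lastPercent = nPercentComplete;
    printf("%s: %d%% complete\n", ctx->label, nPercentComplete);
    return nPercentComplete < ctx->cancelAtPercent;
}
```

Because the callback receives only the calling context and a percentage, any state it needs, such as a cancel request from the user interface, has to travel through the structure that pCallingContext points to.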
GetAnchor()
This function gets the filename automatically generated by Export and used for external graphics referenced with <a xmlns:xlink= xlink:href=> tags, heading-level table of contents entries, and external files (such as CSS files and revision summary files).
Syntax
BOOL (pascal *GetAnchor) (
    void *pCallingContext,
    KVXMLAnchorType eAnchorType,
    char *pszAnchor,
    int cbAnchorMax,
    BYTE *pcHTML,
    UINT cbHTML);
Arguments
pCallingContext    Pointer that gets passed back to the caller-provided callback functions. This pointer, which may be NULL, is specified as the second parameter of the call to fpConvertStream().
eAnchorType    The anchor type for the output stream. It must be one of the enumerated types defined in KVXMLAnchorType.
pszAnchor    Pointer to the location where the new anchor is stored.
cbAnchorMax    Maximum number of bytes to place in pszAnchor.
pcHTML    This is either NULL or a pointer to one of the following:
    markup defining the contents of a table of contents entry
    the external graphic filename
    the external filename
cbHTML    Number of valid bytes in pcHTML.
Returns
If the call is successful, the return value is TRUE.
If the call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
If this callback is NULL, default anchor names are generated. The generated names are unique across the document.
This function is called once per block, block chunk, graphic anchor, or extra file. Any required code may be executed here as long as a unique value for pszAnchor is assigned. If this string is not unique, an existing file may be overwritten, producing undesirable results. The callback function should contain the functionality to verify whether files already exist.
If you want to specify graphic anchor names, but use default anchor names for all other anchors, provide the graphic names when eAnchorType is VectorPictureAnchor or RasterPictureAnchor. For all other anchor types, call with the same parameters you were passed.
pszAnchor must be assigned. It may be derived from the cbAnchorMax, pcHTML, and cbHTML values, which are also provided.
pcHTML may be NULL if the graphic is an internal part of the document.
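The following is a minimal sketch of a GetAnchor() callback that assigns a distinct name on every call by using a counter kept in the calling context. The typedefs at the top are stand-ins so that the fragment compiles without the SDK headers. The AnchorState structure, the MyGetAnchor name, the OtherAnchor enumerator, and the generated name patterns are assumptions made for illustration, not part of the SDK. The sketch does not check whether a file with the generated name already exists.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-ins for the SDK headers so the sketch compiles on its own. */
#define pascal
typedef int BOOL;
typedef unsigned int UINT;
typedef unsigned char BYTE;
#define TRUE 1
#define FALSE 0

/* Only the two picture values are named in this guide. OtherAnchor is a
   hypothetical placeholder for every other anchor type. */
typedef enum { OtherAnchor, VectorPictureAnchor, RasterPictureAnchor } KVXMLAnchorType;

/* Hypothetical per-conversion state, passed as pCallingContext. */
typedef struct { unsigned nextId; } AnchorState;

BOOL pascal MyGetAnchor(void *pCallingContext, KVXMLAnchorType eAnchorType,
                        char *pszAnchor, int cbAnchorMax,
                        BYTE *pcHTML, UINT cbHTML)
{
    AnchorState *st = (AnchorState *)pCallingContext;
    const char *prefix;
    int n;

    (void)pcHTML;   /* may be NULL; not needed to build a counter-based name */
    (void)cbHTML;
    if (st == NULL || pszAnchor == NULL || cbAnchorMax <= 0)
        return FALSE;

    prefix = (eAnchorType == VectorPictureAnchor ||
              eAnchorType == RasterPictureAnchor) ? "graphic" : "anchor";

    /* The counter never repeats within one conversion, so names are distinct. */
    n = snprintf(pszAnchor, (size_t)cbAnchorMax, "%s%04u", prefix, st->nextId++);

    /* Fail (halting processing) if the name did not fit in cbAnchorMax bytes. */
    return (n > 0 && n < cbAnchorMax) ? TRUE : FALSE;
}
```

A single counter shared by all anchor types keeps the uniqueness argument simple. An application that writes into a directory that may already contain files would still need its own existence check.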
GetAuxOutput()
This callback function allows the calling application to specify an auxiliary output stream for a block or graphic.
Syntax
BOOL (pascal *GetAuxOutput) (
void *pCallingContext,
KVXMLAnchorType eAnchorType,
char *pszAnchor,
KVOutputStream *pNewOutput);
Arguments
pCallingContext Pointer passed back to the caller-provided callback functions. This pointer, which may be NULL, is specified as the second parameter of the call to fpConvertStream().
eAnchorType Graphic or block anchor as defined by the enumerated types in KVXMLAnchorType.
pszAnchor Pointer to location where a new anchor is stored. pszAnchor is based on the call to GetAnchor().
pNewOutput Pointer to a KVOutputStream structure that may be used to write data to the current block.
Returns
If the call is successful, the return value is TRUE.
If the call is unsuccessful, the return value is FALSE. Processing is halted.
Discussion
If GetAuxOutput() is NULL, the pszDefaultOutputDirectory member of the instance of KVXMLOptions is used as the base storage location for auxiliary output files. If pszDefaultOutputDirectory is also NULL, auxiliary files are placed in the current working directory.
For each pszAnchor provided, create (malloc) an appropriate I/O structure. Assign pNewOutput->pOutputStreamPrivateData to point to that structure. Each remaining member of the KVOutputStream should then be initialized by assigning a function pointer to the additional application-defined functions, cast to the appropriate function prototype for Create(), Write(), Seek(), Tell(), and Close(). Memory allocated to the I/O structure must be tracked and may be freed within the call to Close(). See the callback.c sample program.
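The steps above can be sketched as follows. This is an illustration under stated assumptions, not the callback.c sample: the typedefs are stand-ins for the SDK headers, the AuxFile structure and the Aux* function names are hypothetical, and it is assumed that Seek() and Close() report success as TRUE and that Create() receives the filename to open.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for the SDK headers so the sketch compiles on its own. */
#define pascal
typedef int BOOL;
typedef unsigned int UINT;
typedef unsigned char BYTE;
typedef char TCHAR;
typedef int KVXMLAnchorType;
#define TRUE 1
#define FALSE 0

typedef struct tag_OutputStream {
    void *pOutputStreamPrivateData;
    BOOL (pascal *fpCreate)(struct tag_OutputStream *, TCHAR *);
    UINT (pascal *fpWrite)(struct tag_OutputStream *, BYTE *, UINT);
    BOOL (pascal *fpSeek)(struct tag_OutputStream *, long, int);
    long (pascal *fpTell)(struct tag_OutputStream *);
    BOOL (pascal *fpClose)(struct tag_OutputStream *);
} KVOutputStream;

/* Hypothetical application-defined I/O structure. */
typedef struct { FILE *fp; } AuxFile;

static AuxFile *Aux(KVOutputStream *s) { return (AuxFile *)s->pOutputStreamPrivateData; }

static BOOL pascal AuxCreate(KVOutputStream *s, TCHAR *name)
{
    Aux(s)->fp = fopen(name, "wb");
    return Aux(s)->fp != NULL;
}
static UINT pascal AuxWrite(KVOutputStream *s, BYTE *p, UINT cb)
{
    return Aux(s)->fp ? (UINT)fwrite(p, 1, cb, Aux(s)->fp) : 0;
}
static BOOL pascal AuxSeek(KVOutputStream *s, long off, int whence)
{
    return Aux(s)->fp != NULL && fseek(Aux(s)->fp, off, whence) == 0;
}
static long pascal AuxTell(KVOutputStream *s)
{
    return Aux(s)->fp ? ftell(Aux(s)->fp) : -1L;
}
static BOOL pascal AuxClose(KVOutputStream *s)
{
    AuxFile *a = Aux(s);
    BOOL ok = (a->fp == NULL) || (fclose(a->fp) == 0);
    free(a);                              /* release the tracked I/O structure */
    s->pOutputStreamPrivateData = NULL;
    return ok;
}

BOOL pascal MyGetAuxOutput(void *pCallingContext, KVXMLAnchorType eAnchorType,
                           char *pszAnchor, KVOutputStream *pNewOutput)
{
    AuxFile *a = (AuxFile *)calloc(1, sizeof *a);
    (void)pCallingContext; (void)eAnchorType; (void)pszAnchor;
    if (a == NULL || pNewOutput == NULL) { free(a); return FALSE; }
    pNewOutput->pOutputStreamPrivateData = a;
    pNewOutput->fpCreate = AuxCreate;
    pNewOutput->fpWrite  = AuxWrite;
    pNewOutput->fpSeek   = AuxSeek;
    pNewOutput->fpTell   = AuxTell;
    pNewOutput->fpClose  = AuxClose;
    return TRUE;
}
```

Because the helper functions already match the member prototypes, no casts are needed in this sketch. Freeing the structure inside Close() keeps the allocation and release paired per stream.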
UserCB()
This callback function is triggered by including the $USERCB token in a member of KVXMLTemplate. For example, placing “$USERCB=my_callback ” in pszFirstH1Start results in a callback at the point when pszFirstH1Start is processed. The user callback function is identified by the text assigned to $USERCB, which in this example is my_callback. This identifier is passed to the argument pszUserCBid.
Syntax
BOOL (pascal *UserCB) (
void *pCallingContext,
char *pszUserCBid,
KVOutputStream *pNewOutput,
void *pReserved);
Arguments
pCallingContext Pointer that gets passed back to the caller-provided callback function. This pointer, which may be NULL, is specified as the second parameter of the call to fpConvertStream().
pszUserCBid Pointer to a string that identifies the source of the callback. The identifier must be delimited by a trailing white space. For example, "my_callback ".
pNewOutput Pointer to a KVOutputStream structure that can be used to write data to the current block.
pReserved Reserved for future use.
Returns
If the call is successful, the return value is TRUE.
If the call is unsuccessful, the return value is FALSE. Processing is halted.
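As an illustration, a user callback might compare pszUserCBid against the identifiers it knows and write markup through pNewOutput. The sketch below is written under assumptions: the typedefs are stand-ins for the SDK headers, MyUserCB and the emitted comment text are hypothetical, and Capture/CaptureWrite form a hypothetical in-memory writer included only so that the callback can be tried without the SDK.

```c
#include <string.h>

/* Stand-ins for the SDK headers so the sketch compiles on its own. */
#define pascal
typedef int BOOL;
typedef unsigned int UINT;
typedef unsigned char BYTE;
typedef char TCHAR;
#define TRUE 1
#define FALSE 0

typedef struct tag_OutputStream {
    void *pOutputStreamPrivateData;
    BOOL (pascal *fpCreate)(struct tag_OutputStream *, TCHAR *);
    UINT (pascal *fpWrite)(struct tag_OutputStream *, BYTE *, UINT);
    BOOL (pascal *fpSeek)(struct tag_OutputStream *, long, int);
    long (pascal *fpTell)(struct tag_OutputStream *);
    BOOL (pascal *fpClose)(struct tag_OutputStream *);
} KVOutputStream;

/* Hypothetical in-memory writer, used only to exercise the callback. */
typedef struct { char text[128]; UINT used; } Capture;

UINT pascal CaptureWrite(KVOutputStream *s, BYTE *p, UINT cb)
{
    Capture *c = (Capture *)s->pOutputStreamPrivateData;
    if (c->used + cb >= sizeof c->text)
        return 0;                        /* would overflow: write nothing */
    memcpy(c->text + c->used, p, cb);
    c->used += cb;
    c->text[c->used] = '\0';
    return cb;
}

BOOL pascal MyUserCB(void *pCallingContext, char *pszUserCBid,
                     KVOutputStream *pNewOutput, void *pReserved)
{
    static const char id[] = "my_callback";
    static const char markup[] = "<!-- inserted by my_callback -->";
    size_t n = sizeof id - 1;

    (void)pCallingContext;
    (void)pReserved;
    if (pszUserCBid == NULL)
        return FALSE;

    /* The identifier is delimited by trailing white space, so compare the
       name only and accept a space or the terminator after it. */
    if (strncmp(pszUserCBid, id, n) != 0 ||
        (pszUserCBid[n] != ' ' && pszUserCBid[n] != '\0'))
        return TRUE;                     /* not ours: write nothing, continue */

    if (pNewOutput == NULL || pNewOutput->fpWrite == NULL)
        return FALSE;
    return pNewOutput->fpWrite(pNewOutput, (BYTE *)markup,
                               (UINT)(sizeof markup - 1)) == sizeof markup - 1;
}
```

Returning TRUE for an unrecognized identifier lets the conversion continue. Returning FALSE there instead would halt processing, which may be preferable if an unknown identifier indicates a template error.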
CHAPTER 10
XML Export API Structures
This section provides information on the structures used by the XML Export API.
These structures are defined in kvxml.h, kvtypes.h, and adinfo.h.
ADDOCINFO
This structure provides the format, file class, and version number of the source document. It is defined in adinfo.h, and is initialized by calling the function fpGetStreamInfo().
typedef struct
{
ENdocClass eClass;
ENdocFmt eFormat;
long lVersion;
unsigned long ulAttributes;
} ADDOCINFO, *ADDOCINFOPTR;
Member Descriptions
eClass Source document’s file class (for example, spreadsheet, word processor, or encapsulation format) as defined by the enumerated type ENdocClass.
eFormat Source document’s major format (for example, Microsoft Word XML format or Corel Presentation) as defined by the enumerated type ENdocFmt in adinfo.h. The ENdocFmt type provides a unique ID for each major format.
lVersion Version number of the file format. The number is multiplied by 1,000, so, for example, 1.02 is represented by 1020.
ulAttributes Other attributes of the document as defined by the enumerated type ENdocAttributes.
Discussion
As format detection is enhanced in future releases, new format IDs may be added to the ENdocFmt enumerated type. When using this type, your code should ensure binary compatibility with future releases. For example, if you use an array to access format information based on a format ID, your code should check that the format ID is less than Max_Fmt before accessing the data. This ensures new format codes are detected when you add KeyView binary files from new releases to your existing installation.
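The check described above, together with the version scaling of lVersion, can be sketched as follows. The ENdocFmt enumeration here is a three-value stand-in with hypothetical format names; only Max_Fmt is taken from this guide. FormatName and DecodeVersion are illustrative helper names.

```c
#include <stddef.h>

/* Stand-in for ENdocFmt in adinfo.h. The three format IDs are hypothetical;
   Max_Fmt is the bound that existed when the application was compiled. */
typedef enum { Fmt_A, Fmt_B, Fmt_C, Max_Fmt } ENdocFmt;

/* Table sized with the Max_Fmt value known at compile time. */
static const char *const kFormatNames[Max_Fmt] = {
    "format A", "format B", "format C"
};

/* Takes the ID as an int because a newer KeyView binary may report a value
   that is not in the enumeration this code was compiled against. */
const char *FormatName(int eFormat)
{
    if (eFormat < 0 || eFormat >= (int)Max_Fmt)
        return "unrecognized format";
    return kFormatNames[eFormat];
}

/* lVersion is the version number multiplied by 1,000 (1.02 -> 1020). */
void DecodeVersion(long lVersion, long *pWhole, long *pThousandths)
{
    *pWhole = lVersion / 1000;
    *pThousandths = lVersion % 1000;
}
```

With this guard, an ID introduced by a later release falls into the "unrecognized format" branch instead of indexing past the end of the table.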
KVInputStream
This structure defines an input stream for the XML conversion.
typedef struct tag_InputStream
{
void *pInputStreamPrivateData;
long lcbFilesize;
BOOL (pascal *fpOpen) (struct tag_InputStream *);
UINT (pascal *fpRead) (struct tag_InputStream *, BYTE *, UINT);
BOOL (pascal *fpSeek) (struct tag_InputStream *, long, int);
long (pascal *fpTell) (struct tag_InputStream *);
BOOL (pascal *fpClose)(struct tag_InputStream *);
} KVInputStream;
Member Descriptions
All member functions are equivalent to their counterparts in the ANSI standard library, except fpOpen(), which returns FALSE on failure. On fpOpen(), if the size of the stream is known, assign that value to lcbFilesize. Otherwise, set lcbFilesize to 0.
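A file-backed input stream built on the ANSI library might look like the sketch below. The typedefs are stand-ins for the SDK headers. The FileInput structure and the In*/InitFileInput names are hypothetical, and it is an assumption of this sketch that fpSeek() and fpClose() report success as TRUE.

```c
#include <stdio.h>

/* Stand-ins for the SDK headers so the sketch compiles on its own. */
#define pascal
typedef int BOOL;
typedef unsigned int UINT;
typedef unsigned char BYTE;
#define TRUE 1
#define FALSE 0

typedef struct tag_InputStream {
    void *pInputStreamPrivateData;
    long lcbFilesize;
    BOOL (pascal *fpOpen)(struct tag_InputStream *);
    UINT (pascal *fpRead)(struct tag_InputStream *, BYTE *, UINT);
    BOOL (pascal *fpSeek)(struct tag_InputStream *, long, int);
    long (pascal *fpTell)(struct tag_InputStream *);
    BOOL (pascal *fpClose)(struct tag_InputStream *);
} KVInputStream;

/* Hypothetical private data: the path to open and the open handle. */
typedef struct { const char *path; FILE *fp; } FileInput;

static FileInput *In(KVInputStream *s) { return (FileInput *)s->pInputStreamPrivateData; }

static BOOL pascal InOpen(KVInputStream *s)
{
    FileInput *f = In(s);
    long size;
    f->fp = fopen(f->path, "rb");
    if (f->fp == NULL)
        return FALSE;                    /* fpOpen returns FALSE on failure */
    s->lcbFilesize = 0;                  /* 0 means the size is not known */
    if (fseek(f->fp, 0L, SEEK_END) == 0 && (size = ftell(f->fp)) > 0)
        s->lcbFilesize = size;
    rewind(f->fp);
    return TRUE;
}
static UINT pascal InRead(KVInputStream *s, BYTE *p, UINT cb)
{
    return (UINT)fread(p, 1, cb, In(s)->fp);
}
static BOOL pascal InSeek(KVInputStream *s, long off, int whence)
{
    return fseek(In(s)->fp, off, whence) == 0;
}
static long pascal InTell(KVInputStream *s)
{
    return ftell(In(s)->fp);
}
static BOOL pascal InClose(KVInputStream *s)
{
    BOOL ok = fclose(In(s)->fp) == 0;
    In(s)->fp = NULL;
    return ok;
}

/* Wires the stream members to the functions above. */
void InitFileInput(KVInputStream *s, FileInput *f, const char *path)
{
    f->path = path;
    f->fp = NULL;
    s->pInputStreamPrivateData = f;
    s->lcbFilesize = 0;
    s->fpOpen = InOpen;
    s->fpRead = InRead;
    s->fpSeek = InSeek;
    s->fpTell = InTell;
    s->fpClose = InClose;
}
```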
KVMemoryStream
This structure defines an optional memory allocator to be used by XML Export. It is initialized by calling fpInit().
typedef struct tag_MemoryStream
{
void *pMemoryStreamPrivateData;
void * (pascal *fpMalloc)(struct tag_MemoryStream*,size_t);
void (pascal *fpFree) (struct tag_MemoryStream*, void *);
void * (pascal *fpRealloc)(struct tag_MemoryStream*,void *, size_t);
void * (pascal *fpCalloc)(struct tag_MemoryStream*, size_t, size_t);
}
KVMemoryStream;
Member Descriptions
All member functions are equivalent to their counterparts in the ANSI standard library.
Discussion
fpRealloc() must handle a NULL pointer.
For systems that do not support fpRealloc(), refer to the xmlcallback sample program, which demonstrates how to use the memory management features.
If KVMemoryStream is not provided, then the default C run-time memory allocation is used.
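A minimal allocator that forwards to the C run-time library and handles a NULL pointer in fpRealloc() explicitly might look like this. The typedefs are stand-ins for the SDK headers. The MemStats counter and the Mem*/InitMemoryStream names are assumptions added for illustration; this is not the xmlcallback sample.

```c
#include <stdlib.h>

/* Stand-in for the SDK calling-convention macro. */
#define pascal

typedef struct tag_MemoryStream {
    void *pMemoryStreamPrivateData;
    void *(pascal *fpMalloc)(struct tag_MemoryStream *, size_t);
    void  (pascal *fpFree)(struct tag_MemoryStream *, void *);
    void *(pascal *fpRealloc)(struct tag_MemoryStream *, void *, size_t);
    void *(pascal *fpCalloc)(struct tag_MemoryStream *, size_t, size_t);
} KVMemoryStream;

/* Hypothetical private data: number of blocks currently allocated. */
typedef struct { long live; } MemStats;

static MemStats *Stats(KVMemoryStream *m) { return (MemStats *)m->pMemoryStreamPrivateData; }

static void *pascal MemMalloc(KVMemoryStream *m, size_t cb)
{
    void *p = malloc(cb);
    if (p != NULL)
        Stats(m)->live++;
    return p;
}
static void pascal MemFree(KVMemoryStream *m, void *p)
{
    if (p != NULL) {
        Stats(m)->live--;
        free(p);
    }
}
static void *pascal MemRealloc(KVMemoryStream *m, void *p, size_t cb)
{
    if (p == NULL)                       /* a NULL pointer acts like malloc */
        return MemMalloc(m, cb);
    return realloc(p, cb);               /* live count unchanged either way */
}
static void *pascal MemCalloc(KVMemoryStream *m, size_t n, size_t cb)
{
    void *p = calloc(n, cb);
    if (p != NULL)
        Stats(m)->live++;
    return p;
}

void InitMemoryStream(KVMemoryStream *m, MemStats *stats)
{
    stats->live = 0;
    m->pMemoryStreamPrivateData = stats;
    m->fpMalloc = MemMalloc;
    m->fpFree = MemFree;
    m->fpRealloc = MemRealloc;
    m->fpCalloc = MemCalloc;
}
```

The counter makes it easy to confirm after shutdown that every block handed out was returned.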
KVOutputStream
This structure defines an output stream for the XML conversion.
typedef struct tag_OutputStream
{
void *pOutputStreamPrivateData;
BOOL (pascal *fpCreate)(struct tag_OutputStream *,TCHAR *);
UINT (pascal *fpWrite) (struct tag_OutputStream *, BYTE *, UINT);
BOOL (pascal *fpSeek) (struct tag_OutputStream *, long, int);
long (pascal *fpTell) (struct tag_OutputStream *);
BOOL (pascal *fpClose) (struct tag_OutputStream *);
}
KVOutputStream;
Member Descriptions
All member functions are equivalent to their counterparts in the ANSI standard library.
KVSTR
This structure is used to identify string types (string text and byte count) for the first three members of KVStyle.
typedef struct tag_KVSTR
{
char *pcString;
int cbString;
}
KVSTR;
Member Descriptions
pcString Text string.
cbString Length of pcString, excluding the terminating NULL(s). This allows UNICODE or double bytes to be employed.
KVStreamInfo
This structure defines a document’s character set and format. The structure is initialized by calling the function fpGetStreamInfo().
typedef struct tag_KVStreamInfo
{
KVCharSet charset;
ADDOCINFO adInfo;
} KVStreamInfo;
Member Descriptions
charset Character set of the source document, if that information is ascertainable. This member is an integer corresponding to the KVCharSet enumerated type in kvtypes.h.
adInfo File class, major format, and version of the source document. Pointer to the ADDOCINFO structure. The structure of ADDOCINFO is defined in adinfo.h.
adInfo.eClass represents the source document’s class as defined by the enumerated type ENdocClass.
adInfo.eFormat represents the source document’s format as defined by the enumerated type ENdocFmt.
adInfo.lVersion represents the version number of the file format. The number is multiplied by 1,000, so, for example, 1.02 is represented by 1020.
adInfo.ulAttributes represents other attributes of the document as defined by the enumerated type ENdocAttributes.
Discussion
As format detection is enhanced in future releases, new format IDs may be added to the ENdocFmt enumerated type. When using this type, your code should ensure binary compatibility with future releases. For example, if you use an array to access format information based on a format ID, your code should check that the format ID is less than Max_Fmt before accessing the data. This ensures new format codes are detected when you add KeyView binary files from new releases to your existing installation.
KVStructHead
This structure contains the current KeyView version number and is the first member of other structures. It enables Autonomy to modify the structures in future releases while maintaining backward compatibility. Before initializing a structure that contains the KVStructHead structure, use the macro KVStructInit to initialize KVStructHead. The structure and macro are defined in kvtypes.h.
typedef struct _KVStructHead
{
WORD version;
WORD size;
DWORD reserved;
void *internal;
} KVStructHeadRec, *KVStructHead;
Member Descriptions
version The current KeyView version number. This is a symbolic constant (KeyviewVersion) defined in kvxtract.h. This constant will be updated for each KeyView release.
size The size of the KVStructHeadRec.
reserved Reserved for internal use.
internal Reserved for internal use.
Example
KVStructInit(&openArg);
KVStyle
This structure defines the style mapping support for KVSTR-defined styles. The first three members of KVStyle are KVSTR structures (see “KVSTR”). Each KVSTR structure contains the text string and byte count for StyleName, MarkUpStart, and MarkUpEnd. The structure is initialized by calling the function fpSetStyleMapping().
XML Export supports both paragraph styles and character styles. It works on the assumption that each style has a unique name. Only one paragraph style may be active at one time; therefore, the opening of a new paragraph style automatically closes the previous paragraph style. By contrast, several character styles may be active at once. When XML Export receives an EndCharStyle token from the format parser, the most recent character style is terminated.
typedef struct tag_KVStyles
{
KVSTR StyleName;
KVSTR MarkUpStart;
KVSTR MarkUpEnd;
DWORD dwFlags;
} KVStyle;
Member Descriptions
StyleName The name of the word processing style (for example, “Heading 1”) to which style mapping applies. A pointer to the KVSTR structure. Style names are case sensitive.
MarkUpStart The markup added to the beginning of a paragraph or character style. A pointer to the KVSTR structure.
MarkUpEnd The markup added to the end of a paragraph or character style. A pointer to the KVSTR structure.
dwFlags Instructions on how to process the content associated with a paragraph or character style. The flag can be one of the types defined in kvtypes.h. The value associated with each flag is a hexadecimal number. You can set an option by either entering the converted decimal value or entering the flag’s text (for example, KVSTYLE_PRE). The value of Flags in the template files is passed to this member of KVStyle.
Discussion
This structure applies to word processing documents only.
By default, XML Export maps the heading style “Heading 1” to <h1></h1> , and so on, for heading levels 1 through 6. If you use style mappings, the default mapping is overridden. Therefore, you must supply markup for all heading levels.
When the user-defined markup in KVStyle conflicts with other markup generated by XML Export, the user-defined markup takes precedence.
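Because supplying any style mapping overrides the default heading mapping, an application that maps styles typically provides entries for all six heading levels. The sketch below builds such an array. KVSTR and KVStyle follow the definitions in this chapter, DWORD is a stand-in typedef, BuildHeadingStyles is a hypothetical helper, and dwFlags is set to 0 purely as a placeholder because the flag values in kvtypes.h are not reproduced here.

```c
#include <stdio.h>

/* Stand-in for the SDK type. */
typedef unsigned long DWORD;

typedef struct tag_KVSTR {
    char *pcString;
    int cbString;
} KVSTR;

typedef struct tag_KVStyles {
    KVSTR StyleName;
    KVSTR MarkUpStart;
    KVSTR MarkUpEnd;
    DWORD dwFlags;
} KVStyle;

#define HEADING_LEVELS 6

/* Static storage so the strings outlive the call. */
static char g_names[HEADING_LEVELS][16];
static char g_starts[HEADING_LEVELS][8];
static char g_ends[HEADING_LEVELS][8];

/* Fills styles[0..5] with "Heading 1".."Heading 6" mapped to <h1>..<h6>. */
void BuildHeadingStyles(KVStyle styles[HEADING_LEVELS])
{
    int i;
    for (i = 0; i < HEADING_LEVELS; i++) {
        styles[i].StyleName.pcString = g_names[i];
        /* sprintf returns the length without the terminator, which is
           what cbString expects. */
        styles[i].StyleName.cbString = sprintf(g_names[i], "Heading %d", i + 1);

        styles[i].MarkUpStart.pcString = g_starts[i];
        styles[i].MarkUpStart.cbString = sprintf(g_starts[i], "<h%d>", i + 1);

        styles[i].MarkUpEnd.pcString = g_ends[i];
        styles[i].MarkUpEnd.cbString = sprintf(g_ends[i], "</h%d>", i + 1);

        styles[i].dwFlags = 0;           /* placeholder; see kvtypes.h for flags */
    }
}
```

The resulting array and its element count would then be passed as the pStyles and iStyles arguments of fpSetStyleMapping().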
KVSumInfoElemEx
This structure defines the individual metadata elements.
typedef struct tag_KVSumInfoElemEx
{
int isValid;
KVSumInfoType type;
void *data;
char *pcType;
} KVSumInfoElemEx;
Member Descriptions
isValid Specifies whether the data value is present in the document. The setting 1 specifies the value is valid and exists.
type Data type of the metadata element. The types are defined in the structure KVSumInfoType in kvtypes.h.
data The content of the metadata field. If the type member is KV_Int4 or KV_Bool, then this member contains the actual value. Otherwise, this member is a pointer to the actual value. KV_DateTime and KV_IEEE8 point to an 8-byte value. KV_String and KV_Unicode point to the beginning of the string containing the text.
pcType Pointer to the name of the metadata field.
KVSummaryInfoEx
This structure provides a count of the number of metadata elements in a document, and a pointer to the first element of the array of elements. The structure is initialized by calling the function fpGetSummaryInfo(). See “fpGetSummaryInfo()”.
typedef struct tag_KVSummaryInfoEx
{
int nElem;
KVSumInfoElemEx *pElem;
} KVSummaryInfoEx;
Member Descriptions
nElem Number of metadata elements contained in the array. nElem may be zero. This indicates that the document, such as an ASCII text document, did not contain metadata.
pElem Points to the first element of the array of document metadata elements defined by the structure KVSumInfoElemEx.
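Walking the array and interpreting data according to type might be sketched as follows. The structures follow the definitions in this chapter. The numeric values in the KVSumInfoType stand-in are placeholders rather than the values in kvtypes.h, FormatElement and CountValid are hypothetical helpers, and the KV_IEEE8 branch assumes that double is 8 bytes on the target platform.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in: the names come from this guide, the values are placeholders. */
typedef enum { KV_String = 1, KV_Int4, KV_DateTime, KV_Bool, KV_Unicode, KV_IEEE8 } KVSumInfoType;

typedef struct tag_KVSumInfoElemEx {
    int isValid;
    KVSumInfoType type;
    void *data;
    char *pcType;
} KVSumInfoElemEx;

typedef struct tag_KVSummaryInfoEx {
    int nElem;
    KVSumInfoElemEx *pElem;
} KVSummaryInfoEx;

/* Writes "name=value" into buf. Returns 0 if the element is not valid,
   otherwise the snprintf result. */
int FormatElement(const KVSumInfoElemEx *e, char *buf, size_t cb)
{
    const char *name = e->pcType ? e->pcType : "(unnamed)";
    double d;

    if (e->isValid != 1)
        return 0;
    switch (e->type) {
    case KV_Int4:
    case KV_Bool:
        /* data holds the value itself, not a pointer to it */
        return snprintf(buf, cb, "%s=%d", name, (int)(intptr_t)e->data);
    case KV_String:
        return snprintf(buf, cb, "%s=%s", name,
                        e->data ? (const char *)e->data : "");
    case KV_IEEE8:
        if (e->data == NULL)
            return snprintf(buf, cb, "%s=", name);
        memcpy(&d, e->data, sizeof d);   /* assumes an 8-byte double */
        return snprintf(buf, cb, "%s=%g", name, d);
    default:
        /* KV_DateTime and KV_Unicode need conversion not shown here */
        return snprintf(buf, cb, "%s=(not shown)", name);
    }
}

/* Counts the elements whose value is present in the document. */
int CountValid(const KVSummaryInfoEx *info)
{
    int i, n = 0;
    for (i = 0; i < info->nElem; i++)
        if (info->pElem[i].isValid == 1)
            n++;
    return n;
}
```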
KVXConfigInfo
This structure defines the document type of a source XML file, and the element extraction settings for that type. The settings can be applied based on the file format ID, or the file’s root element. This structure is in kvtypes.h and is initialized by calling the function KVXMLConfig(). See “Convert XML Files”.
typedef struct TAG_KVXConfigInfo
{
ENdocFmt eKVFormat;
char* pszRoot;
char* pszInMeta;
char* pszExMeta;
char* pszInContent;
char* pszExContent;
char* pszInAttribute;
} KVXConfigInfo;
Member Descriptions
eKVFormat The format ID as detected by the KeyView detection module. This determines the file type to which these extraction settings apply. The format ID is defined by the enumerated type ENdocFmt. See “File Format Detection” for more information on format ID values. If you are adding configuration settings for a custom XML document type, this is not defined.
pszRoot The file’s root element. When the format ID is not defined, the root element is used to determine the file type to which these settings apply. To further qualify the element, specify its namespace. See “Specify an Element’s Namespace and Attribute”.
pszInMeta The elements extracted from the file as metadata. All other elements are extracted as text. Multiple entries must be separated by commas. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
pszExMeta The child elements in the included metadata elements that are not extracted from the file as metadata. For example, the default extraction settings for the Visio XML format extract the DocumentProperties element as metadata. This element includes child elements such as Title, Subject, Author, Description, and so on. However, the child element PreviewPicture is defined in pszExMeta because it is binary data and should not be extracted. You cannot exclude any metadata elements from the output for StarOffice files. All metadata is extracted regardless of this setting. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
pszInContent The elements extracted from the file as content text. An asterisk (*) extracts all elements including child elements. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
pszExContent The child elements in the included content elements that are not extracted from the file as content text. To further qualify the element, specify its namespace and/or attributes. See “Specify an Element’s Namespace and Attribute”.
pszInAttribute The attribute values extracted from the file. If attributes are not defined, attribute values are not extracted. The namespace (if used), element name, and attribute name must be defined in the following format:
namespace:elementname@attributename
For example:
Autonomy:division@name
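A small helper can assemble a pszInAttribute entry in the format shown above. BuildAttributeSpec is a hypothetical function, not part of the SDK, and it assumes that when no namespace is used the entry is simply elementname@attributename.

```c
#include <stdio.h>
#include <stddef.h>

/* Builds "namespace:elementname@attributename" in buf.
   Assumption: with no namespace, the "namespace:" prefix is omitted.
   Returns 1 on success, 0 if an argument is missing or buf is too small. */
int BuildAttributeSpec(char *buf, size_t cb,
                       const char *ns, const char *elem, const char *attr)
{
    int n;

    if (buf == NULL || cb == 0 || elem == NULL || attr == NULL)
        return 0;
    if (ns != NULL && ns[0] != '\0')
        n = snprintf(buf, cb, "%s:%s@%s", ns, elem, attr);
    else
        n = snprintf(buf, cb, "%s@%s", elem, attr);
    return n > 0 && (size_t)n < cb;
}
```

The resulting string could then be assigned to the pszInAttribute member of a KVXConfigInfo instance.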
KVXMLCallbacks
This structure provides all callbacks that can result from a call to fpConvertStream() or KVXMLConvertFile(). See “fpConvertStream()”. Any and all of the function pointers can be NULL.
typedef BOOL (pascal *KVXMLCB_CONTINUE)(
void *pcallingContext,
int nPercentDone);
typedef BOOL (pascal *KVXMLCB_GETANCHOR)(
void *pCallingContext,
KVXMLAnchorType eAnchorType,
char *pszAnchor,
int cbAnchorMax,
BYTE *pcHTML,
UINT cbHTML);
typedef BOOL (pascal *KVXMLCB_GETAUXOUTPUT)(
void *pCallingContext,
KVXMLAnchorType eAnchorType,
char *pszAnchor,
KVOutputStream *pNewOutput);
typedef BOOL (pascal *KVXMLCB_USERCB) (
void *pCallingContext,
char *psUserCBid,
KVOutputStream *pOutput,
void *pReserved);
typedef struct tag_KVXMLCallbacks
{
KVXMLCB_CONTINUE fpContinue;
KVXMLCB_GETANCHOR fpGetAnchor;
KVXMLCB_GETAUXOUTPUT fpGetAuxOutput;
KVXMLCB_USERCB fpUserCB;
} KVXMLCallbacks;
Member Descriptions
The members of this structure are function pointers to the functions described in “XML Export API Callback Functions”.
If fpGetAuxOutput() is NULL, the pszDefaultOutputDirectory member of the instance of KVXMLOptions is used as the base storage location for auxiliary output files. If pszDefaultOutputDirectory is also NULL, auxiliary files are placed in the current working directory.
KVXMLHeadingInfo
This structure defines how XML Export creates heading information based on the source document’s content and attributes. Source text is converted to a heading and included in the table of contents if it meets all the criteria defined by this structure, and the headingCreateType member of KVXMLTOCOptions is set to allow automatic heading generation.
XML Export evaluates the text against each member in the order in which the members appear below.
See “KVXMLTOCOptions” for more information on automatic generation of headings.
typedef struct tag_KVXMLHeadingInfo
{
int minParaLen;
int maxParaLen;
int fontSizeMin;
int fontSizeMax;
BOOL bMustBeBold;
BOOL bMustBeItalic;
BOOL bMustBeUnderlined;
BOOL bNonZeroIndent;
BOOL bNoTabs;
BOOL bNoMultiSpaces;
int nSpaceBefore;
int nSpaceAfter;
} KVXMLHeadingInfo;
Member Descriptions
minParaLen The minimum number of characters that a paragraph in the source document can contain for the text to meet the criteria for heading conversion. Applies to word processing documents only. The default is 3 for heading levels 1 to 3.
maxParaLen The maximum number of characters that a paragraph in the source document can contain for the text to meet the criteria for heading conversion. Applies to word processing documents only. The default is 80 for heading levels 1 to 3.
fontSizeMin The minimum font size of text in the source document for the text to meet the criteria for heading conversion. The default is 14 for heading level 1, and 12 for heading levels 2 and 3.
fontSizeMax The maximum font size of text in the source document for the text to meet the criteria for heading conversion. The default is 20 for heading level 1, and 14 for heading levels 2 and 3.
bMustBeBold If this is set to TRUE, the text in the source document must be bold to meet the criteria for heading conversion. The default is TRUE for heading levels 1 and 2, and FALSE for heading level 3.
bMustBeItalic If this is set to TRUE, the text in the source document must be italic to meet the criteria for heading conversion. The default is FALSE.
bMustBeUnderlined If this is set to TRUE, the text in the source document must be underlined to meet the criteria for heading conversion. The default is FALSE.
bNonZeroIndent If this is set to TRUE, the text in the source document must be indented to meet the criteria for heading conversion. If set to FALSE, the text must be aligned left. The default is FALSE.
bNoTabs If this is set to TRUE, the text in the source document must not contain tabs to meet the criteria for heading conversion. The default is FALSE.
bNoMultiSpaces If this is set to TRUE, the text in the source document must not contain two or more contiguous white spaces to meet the criteria for heading conversion. The default is FALSE.
nSpaceBefore The amount of space in twips (one twentieth of a point) that must come before a paragraph in the source document for the text to meet the criteria for heading conversion. If –1 is used, the amount of space before the paragraph is not considered in the heading generation. The default is 0.
nSpaceAfter The amount of space in twips (one twentieth of a point) that must follow a paragraph in the source document for the text to meet the criteria for heading conversion. If –1 is used, the amount of space after the paragraph is not considered in the heading generation. The default is 0.
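The documented defaults for heading levels 1 to 3 can be collected in one helper, which is a convenient starting point before adjusting individual criteria. The structure follows the definition in this chapter. BOOL, TRUE, and FALSE are stand-ins for the SDK definitions, and DefaultHeadingInfo is a hypothetical function name.

```c
#include <stddef.h>

/* Stand-ins for the SDK headers so the sketch compiles on its own. */
typedef int BOOL;
#define TRUE 1
#define FALSE 0

typedef struct tag_KVXMLHeadingInfo {
    int minParaLen;
    int maxParaLen;
    int fontSizeMin;
    int fontSizeMax;
    BOOL bMustBeBold;
    BOOL bMustBeItalic;
    BOOL bMustBeUnderlined;
    BOOL bNonZeroIndent;
    BOOL bNoTabs;
    BOOL bNoMultiSpaces;
    int nSpaceBefore;
    int nSpaceAfter;
} KVXMLHeadingInfo;

/* Fills *pInfo with the defaults listed in the member descriptions for
   heading level 1, 2, or 3. Returns FALSE for any other level. */
BOOL DefaultHeadingInfo(int level, KVXMLHeadingInfo *pInfo)
{
    if (pInfo == NULL || level < 1 || level > 3)
        return FALSE;

    pInfo->minParaLen = 3;
    pInfo->maxParaLen = 80;
    pInfo->fontSizeMin = (level == 1) ? 14 : 12;
    pInfo->fontSizeMax = (level == 1) ? 20 : 14;
    pInfo->bMustBeBold = (level <= 2) ? TRUE : FALSE;
    pInfo->bMustBeItalic = FALSE;
    pInfo->bMustBeUnderlined = FALSE;
    pInfo->bNonZeroIndent = FALSE;
    pInfo->bNoTabs = FALSE;
    pInfo->bNoMultiSpaces = FALSE;
    pInfo->nSpaceBefore = 0;
    pInfo->nSpaceAfter = 0;
    return TRUE;
}
```

An application could, for example, start from the level 2 defaults and then set nSpaceBefore to -1 so that spacing before the paragraph is ignored.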
KVXMLInterface
The members of this structure are pointers to the API functions described in “XML Export API Functions”.
typedef void* (pascal *KVXML_INIT) (
KVMemoryStream *pMemAllocator,
char *pszKeyViewDir,
char *pszDataFile,
KVErrorCode *pError,
DWORD dWord);
typedef void (pascal *KVXML_SHUTDOWN)(void*);
typedef BOOL (pascal *KVXML_CONVERT_STREAM) (
void *pContext,
void *pCallingContext,
KVInputStream *pInput,
KVOutputStream *pOutput,
KVXMLTemplate *pTemplates,
KVXMLOptions *pOptions,
KVXMLTOCOptions *pTOCCreateOptions,
KVXMLCallbacks *pCallbacks,
BOOL bIndex,
KVErrorCode *pError);
typedef char** (pascal *KVXML_GET_FILE_LIST)(
void *pContext,
int *pnSize );
typedef BOOL (pascal *KVXML_GET_STREAM_INFO)(
void *pContext,
KVInputStream *pInput,
KVStreamInfo *pStreamInfo );
typedef BOOL (pascal *KVXML_GET_ANCHOR) (
void *pCallingContext,
KVXMLAnchorType eAnchorType,
char *pszAnchor,
int cbAnchorMax,
BYTE *pcHTML,
UINT cbHTML);
typedef BOOL (pascal *KVXML_INPUTSTREAM_CREATE) (
void *pContext,
char *pszFileName,
KVInputStream *pInput);
typedef BOOL (pascal *KVXML_INPUTSTREAM_FREE) (
void *pContext,
KVInputStream *pInput);
typedef BOOL (pascal *KVXML_OUTPUTSTREAM_CREATE) (
void *pContext,
char *pszFileName,
KVOutputStream *pOutput );
typedef BOOL (pascal *KVXML_OUTPUTSTREAM_FREE)(
void *pContext,
KVOutputStream *pOutput );
typedef KVLanguageID (pascal *KVXML_LANGUAGE_ID)(void *pContext);
typedef BOOL (pascal *KVXML_GET_SUMMARY_INFO)(
void *pContext,
KVInputStream *pInput,
KVSummaryInfoEx *pSummary,
BOOL bFree );
typedef BOOL (pascal *KVXML_SET_STYLE_MAPPING) (
void *pContext,
KVStyle *pStyles,
int iStyles,
BOOL bCopy);
typedef BOOL (pascal *KVXML_VALIDATE_TEMPLATE)(
void *pContext,
KVOutputStream *pOutput,
KVXMLTemplate *pTemplate,
KVXMLOptions *pOptions,
KVXMLTOCOptions *pTOCOptions,
KVXMLCallbacks *pCallbacks,
KVMemoryStream *pMemStream);
typedef struct tag_KVXMLInterface
{
KVXML_INIT fpInit;
KVXML_SHUTDOWN fpShutDown;
KVXML_CONVERT_STREAM fpConvertStream;
KVXML_GET_FILE_LIST fpGetConvertFileList;
KVXML_GET_STREAM_INFO fpGetStreamInfo;
KVXML_GET_ANCHOR fpGetAnchor;
KVXML_INPUTSTREAM_CREATE fpFileToInputStreamCreate;
KVXML_INPUTSTREAM_FREE fpFileToInputStreamFree;
KVXML_OUTPUTSTREAM_CREATE fpFileToOutputStreamCreate;
KVXML_OUTPUTSTREAM_FREE fpFileToOutputStreamFree;
KVXML_GET_SUMMARY_INFO fpGetSummaryInfo;
KVXML_SET_STYLE_MAPPING fpSetStyleMapping;
KVXML_VALIDATE_TEMPLATE fpValidateTemplate;
} KVXMLInterface;
Member Descriptions
The members of this structure are function pointers to the functions described in “XML Export API Functions”.
KVXML_VALIDATE_TEMPLATE is currently not implemented.
KVXMLOptions
This structure defines the options that control the XML markup written in response to the general style and attributes (font, color, and so on) of the document. The structure is initialized by calling the function fpConvertStream() or KVXMLConvertFile().
typedef struct tag_KVXMLOptions
{
BOOL bUseVerityDTD;
char *pszVerityDTDPath;
KVXMLStyleSheetType eStyleSheetType;
BOOL bUseExistingStyleSheet;
char *pszStyleSheet;
BOOL bIndexOnly;
KVCharSet eOutputCharSet;
BOOL bForceOutputCharSet;
KVCharSet eSrcCharSet;
BOOL bForceSrcCharSet;
KVLanguageID eOutputLanguageID;
BOOL bUseDocumentColors;
BOOL bUseDocumentFontInfo;
BOOL bNbspEmptyCells;
ENSATableBorder eSATableBorder;
int nTableBorderWidth;
char *pszBaseURL;
char *pszMainURL;
char *pszDefaultOutputDirectory;
char *pszPicPath;
char *pszPicURL;
char *pszJavaURL;
BOOL bRemoveFileNameSpaces;
BOOL bRasterizeFiles;
KVXMLGraphicType eOutputRasterGraphicType;
KVXMLGraphicType eOutputVectorGraphicType;
int cxVectorToRasterXRes;
int cyVectorToRasterYRes;
int nCompressionQuality;
BOOL bGenerateURLs;
long lcbMaxMemUsage;
BYTE cReplaceChar;
BYTE cRedact;
KVXMLEmptyParaType eEmptyParaType;
KVXMLHardPageBreakType eHardPageBreakType;
BOOL bSupportColumnHeadings;
BOOL bSupportRowHeadings;
BOOL bSupportCellSpan;
BOOL bSupportRowSpan;
BOOL bSupportColumnWidth;
BOOL bRemoveEmptyColumns;
BOOL bRemoveEmptyRows;
BOOL bEnableEmptyRows;
int nRowsBeforeSplit;
} KVXMLOptions;
Member Descriptions
bUseVerityDTD Set to TRUE to generate XML based on the Verity DTD. If FALSE, the XML is based on the source document’s paragraph structure. The default is TRUE.
pszVerityDTDPath If you move the Verity DTD from the default tempout directory to another output directory, set the string value of pszVerityDTDPath to the new location. This path is added to the document type declaration in the XML file. The default is no path. That is, the DTD is assumed to be in the same directory as the generated XML files.
eStyleSheetType One of the enumerated options for processing style sheet information. The options are defined in KVXMLStyleSheetType in kvxml.h.
STYLESHEET_DISABLED: Disables style sheet formatting.
XML_CSS: Enables Cascading Style Sheet (CSS) formatting, and outputs the generated formatting data in an external CSS file referenced in the XML output as a tag.
XML_XSL: Enables Extensible Stylesheet Language (XSL) formatting, and uses an external XSL file referenced in a <?xml-stylesheet...?> processing instruction.
The default is STYLESHEET_DISABLED.
bUseExistingStyleSheet Set to TRUE to apply an existing XSL style sheet or a CSS to an XML document. The style sheet filename is inserted into the type declaration at the beginning of the XML file. The location of the external style sheet file is set by pszStyleSheet. If pszStyleSheet is not specified and the style sheet type is XSL, then a default XSL style sheet, appropriate for the source document type, is used. The default XSL style sheets are:
wp.xls (for word processing documents)
ss.xls (for spreadsheets)
pg.xls (for presentations)
If pszStyleSheet is not specified and the style sheet type is CSS, then a CSS file is created. Existing style sheets are not validated. The default is FALSE.
pszStyleSheet The path and filename of an external style sheet. The default is no path.
bIndexOnly Set this to TRUE to generate output with minimal markup (ID and style paragraph attributes) and without images. Since the generated output is minimized to textual content, it is suitable for an indexing engine. If bIndexOnly is set to FALSE, embedded images in a document are regenerated as separate files and stored in the output directory. The template file named xml_index.ini and the xmlindex sample program demonstrate the effect of setting bIndexOnly. To generate output with verbose markup and without images, set the nType argument of the function KVXMLConfig() to KVCFG_SUPPRESSIMAGES. See “KVXMLConfig()”. Applies to word processing documents and spreadsheets only. The default is FALSE.
eOutputCharSet The character set to use for textual output. To ensure the character set defined here is used, you must set bForceOutputCharSet to TRUE. The available character sets are enumerated in KVCharSet in kvtypes.h. See “Convert Character Sets”. The section “Supported Formats” lists the file formats for which character set information can be determined. The default is KVCS_UNKNOWN.
bForceOutputCharSet Set to TRUE to use the output character set specified in eOutputCharSet, regardless of the internal document information or the source character set specified by eSrcCharSet. See “Convert Character Sets”. Forcing a character set to KVCS_UNKNOWN is always ignored. The default is FALSE.
eSrcCharSet Specifies the character set of the document. To ensure the character set defined here is used, you must set bForceSrcCharSet to TRUE. The available character sets are enumerated in KVCharSet in kvtypes.h. See “Convert Character Sets”. The section “Supported Formats” lists the file formats for which character set information can be determined. The default is KVCS_UNKNOWN.
bForceSrcCharSet Set to TRUE to use the source character set specified in eSrcCharSet, regardless of the internal document information. See “Convert Character Sets”. Forcing a character set to KVCS_UNKNOWN is always ignored. The default is FALSE.
eOutputLanguageID The language for the textual output of language-specific data such as time and date. eOutputLanguageID must be in the system locale. If eOutputLanguageID is invalid or not supplied, the system default is used. Language IDs are defined in KVLanguageID in kvtypes.h. The default is Language_UNKNOWN.
bUseDocumentColors Set to TRUE to retain the color attributes information contained in the source document. If set to FALSE, no color attributes appear in the <font> tags of the output. The default is FALSE.
bUseDocumentFontInfo Set to TRUE to retain the font information contained in the source document. If set to FALSE, no font information appears in the <font> tags in the output. The default is FALSE.
bNbspEmptyCells Set to TRUE to include a non-breaking space (<td>&nbsp;</td>) in the markup for empty table cells in the source document. If this is set to FALSE, <td></td> is generated for empty table cells. Applies to word processing documents and spreadsheets only. The default is TRUE.
eSATableBorder
Specifies whether table borders are based on the setting in the source document, are always on, or are always off. The options are enumerated in ENSATableBorder in kvtypes.h. See “ENSATableBorder”.
Applies to word processing documents only.
The default is SA_BaseOnDocument.
nTableBorderWidth
Sets the width of the table border in pixels.
Applies to word processing documents only.
The default is 1.
pszBaseURL
The base URL that replaces the $BASE token in the XML output.
The default is NULL.
pszMainURL
The main URL that replaces the $MAIN token in the XML output.
The default is NULL.
pszDefaultOutputDirectory
The default output directory for auxiliary files created during the conversion.
The default is NULL, and the files are placed in the directory in which your application is running.
pszPicPath
The output directory for graphic files created during the conversion. If specified, this member can also be used by the callback functions KVXMLGetAnchor and KVXMLGetAuxOutput.
Applies to word processing documents only.
The default is NULL, and the files are placed in the directory in which your application is running.
pszPicURL
The URL of the graphic files created from embedded graphics in the source document. To specify a complete image source, this element must be combined with pszAnchor of the fpGetAnchor callback function. See “GetAnchor()”.
For example, setting pszPicURL to ../cgi-bin/ and setting pszAnchor to pic.jpg results in the following markup:
<a xmlns:xlink= xlink href="../cgi-bin/pic.jpg">
Applies to word processing documents only.
The default is NULL.
pszJavaURL
The URL where the Java rasterizer (kvvector.jar) is located.
The Java rasterizer is not currently enabled.
The default is NULL.
bRemoveFileNameSpaces
Set to TRUE to remove spaces from generated output filenames.
The default is FALSE.
bRasterizeFiles
Set to TRUE to rasterize slides from presentations into single images. Set to FALSE to only extract text from presentation files. When this member is set to FALSE, graphics do not appear in the output.
Since XML Export only extracts textual components from presentations, this member must be set to FALSE.
The default is FALSE.
eOutputRasterGraphicType
The output format of rasterized embedded graphics. There are six options enumerated in KVXMLGraphicType in kvxml.h. See “KVXMLGraphicType”.
The default is KVGFX_JPEG.
eOutputVectorGraphicType
The output format of vector graphics. The options are enumerated in KVXMLGraphicType in kvxml.h. See “KVXMLGraphicType”. For more information on converting vector graphics on UNIX or Linux, see the section on graphics on UNIX and Linux.
The default is KVGFX_JPEG.
cxVectorToRasterXRes
Controls the X resolution (width in pixels) at which presentations and graphics are converted. This is set in conjunction with cyVectorToRasterYRes. To set this member, see “Setting the Resolution of Presentations and Graphics”.
The default is 0, which means the original resolution is retained.
cyVectorToRasterYRes
Controls the Y resolution (height in pixels) at which presentations and graphics are converted. This is set in conjunction with cxVectorToRasterXRes. To set this member, see “Setting the Resolution of Presentations and Graphics”.
The default is 0, which means the original resolution is retained.
nCompressionQuality
Controls the output quality of graphics that support compression quality (for example, JPEG). A value of 0 means default quality (85 compression); 1 is the lowest quality (highest compression and therefore the smallest file size); 100 is the highest quality (no compression and therefore the largest file size).
Applies to word processing documents only.
The default is 0.
bGenerateURLs
Set to TRUE to add anchor tags (<a xmlns:xlink= xlink href=> </a>) to text starting with “www”, “http:” or “file:”.
Applies to word processing documents only.
The default is FALSE.
lcbMaxMemUsage
The maximum memory, in bytes, allocated dynamically for token buffers during file processing. If this maximum is reached, Export performs a swap-to-disk operation internally, and then reuses the memory blocks.
Export maintains an internal minimum memory size.
Applies to word processing or text documents only.
The default is LONG_MAX.
cReplaceChar
The character used when a character in the source document’s character set cannot be mapped to the output character set.
The default replacement character is a question mark (?).
cRedact
The character that replaces tagged text that has been designated, through style mapping, to be omitted from the output. This functionality is useful when you need to hide confidential or sensitive information.
The specified character is used for all text that has been mapped to a style processed with the KVSTYLE_REDACT flag (defined in kvtypes.h).
Applies to word processing documents only.
The default replacement character is “X”.
eEmptyParaType
Determines if paragraphs without content generate markup or ID attributes in the output file. There are three options enumerated in KVXMLEmptyParaType in kvxml.h. See “KVXMLEmptyParaType”.
Applies to word processing documents only.
The default is KVEPT_SUPPRESS.
eHardPageBreakType
Determines if hard page breaks generate markup or ID attributes in the output file. There are four options enumerated in KVXMLHardPageBreakType in kvxml.h. See “KVXMLHardPageBreakType”.
Applies to word processing documents only.
The default is KVHPBT_SUPPRESS.
bSupportColumnHeadings
Set to TRUE to include column headings from the source spreadsheet in the output.
Applies to spreadsheets only.
The default is FALSE.
bSupportRowHeadings
Set to TRUE to include row headings from the source spreadsheet in the output.
Applies to spreadsheets only.
The default is FALSE.
bSupportCellSpan
Set to TRUE to include colspan="n" markup in the output.
Applies to spreadsheets only.
The default is FALSE.
bSupportRowSpan
Set to TRUE to include row span data from the source spreadsheet in the output.
Applies to spreadsheets only.
The default is FALSE. Currently not supported.
bSupportColumnWidth
Set to TRUE to include column width data from the source spreadsheet in the output.
Applies to spreadsheets only.
The default is FALSE.
bRemoveEmptyColumns
Set to TRUE to remove spreadsheet columns that do not contain data and to disable cell merging.
Applies to spreadsheets only.
The default is FALSE.
bRemoveEmptyRows
Set to TRUE to remove spreadsheet rows that do not contain data or color, and to disable cell merging.
Applies to spreadsheets only.
The default is FALSE.
bEnableEmptyRows
Set to TRUE to display empty rows in a spreadsheet format. If set to FALSE, empty rows are not displayed. This only applies to 20 or more consecutive empty rows.
Applies to spreadsheets only.
The default is FALSE.
nRowsBeforeSplit
The approximate number of spreadsheet rows to be processed before splitting a table. This helps to prevent large spreadsheet tables from occurring in a single document, which can cause speed and processing problems for the browser.
Applies to spreadsheets only.
The default is 0.
Discussion
A pointer to this structure is passed as an argument to fpConvertStream() and KVXMLConvertFile(). If the pointer to the structure is not NULL, the values of the members specified in the structure are used. If the pointer to the structure is NULL, the default values are used.
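The interaction between a character set member, its force flag, and a NULL options pointer can be sketched as follows. This is only a model of the documented rules, not the SDK's implementation: DemoCs, DemoOptions, and demo_effective_output_charset() are hypothetical stand-ins, and the unforced case is simplified to "keep the detected character set". Real code would include kvtypes.h and kvxml.h and use KVCharSet and KVXMLOptions.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real types are declared in kvtypes.h and kvxml.h. */
typedef int BOOL;
#define TRUE  1
#define FALSE 0

typedef enum { DEMO_CS_UNKNOWN, DEMO_CS_SJIS, DEMO_CS_GB } DemoCs;

typedef struct {
    DemoCs eOutputCharSet;      /* documented default: unknown */
    BOOL   bForceOutputCharSet; /* documented default: FALSE   */
} DemoOptions;

/* Models the documented rule: the requested output character set is
   guaranteed to win only when the force flag is TRUE and the requested
   set is not "unknown". A NULL options pointer means "use the defaults",
   so nothing is forced. */
static DemoCs demo_effective_output_charset(const DemoOptions *opts,
                                            DemoCs detected)
{
    if (opts == NULL)
        return detected;
    if (opts->bForceOutputCharSet && opts->eOutputCharSet != DEMO_CS_UNKNOWN)
        return opts->eOutputCharSet;
    return detected;
}
```

With eOutputCharSet set to DEMO_CS_SJIS and the flag TRUE, the function returns DEMO_CS_SJIS whatever was detected; forcing DEMO_CS_UNKNOWN has no effect, which mirrors the note that forcing KVCS_UNKNOWN is always ignored.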
Setting the Resolution of Presentations and Graphics
The members cxVectorToRasterXRes and cyVectorToRasterYRes are set in conjunction to specify the resolution (in pixels) at which presentations and graphics are converted.
You can specify the resolution in one of two ways:
as a proportion of the original resolution
as a specified number of pixels
Setting the Resolution Proportionally
To set the resolution proportionally, set one of the members (cxVectorToRasterXRes or cyVectorToRasterYRes) to a percentage of the original resolution, written as a negative number, and the other to zero. For example, the following setting converts the graphic at 50 percent of the original resolution:
cxVectorToRasterXRes=-50
cyVectorToRasterYRes=0
The following setting converts the graphic at 200 percent of the original resolution:
cxVectorToRasterXRes=0
cyVectorToRasterYRes=-200
The member that is set to zero is automatically adjusted to maintain the aspect ratio. If both cxVectorToRasterXRes and cyVectorToRasterYRes are set to a percentage, cyVectorToRasterYRes defaults to zero during the conversion.
Setting the Resolution in Pixels
To set the resolution in pixels, set one of the members (cxVectorToRasterXRes or cyVectorToRasterYRes) to the number of pixels, and the other to zero. For example:
cxVectorToRasterXRes=0
cyVectorToRasterYRes=1500
The member that is set to zero is automatically adjusted to maintain the aspect ratio. The maximum resolution is 4,000 pixels.
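The two ways of setting the resolution can be wrapped in small helpers, sketched below. DemoResolution and both functions are hypothetical, and the member type is assumed to be int; in real code the two members belong to KVXMLOptions. The sketch follows the examples above: a percentage is stored as a negative number, a pixel count as a positive number, and the other member is left at zero so that the aspect ratio is kept.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two KVXMLOptions members discussed above;
   the member types are assumed to be int for this sketch. */
typedef struct {
    int cxVectorToRasterXRes;
    int cyVectorToRasterYRes;
} DemoResolution;

#define DEMO_MAX_PIXELS 4000 /* documented maximum resolution */

/* Request a width as a percentage of the original (for example 50 or 200).
   The percentage is stored as a negative number and the height is zeroed.
   Returns 0 on success, -1 on invalid input. */
static int demo_set_width_percent(DemoResolution *r, int percent)
{
    if (r == NULL || percent <= 0)
        return -1;
    r->cxVectorToRasterXRes = -percent;
    r->cyVectorToRasterYRes = 0;
    return 0;
}

/* Request a fixed height in pixels and zero the width.
   Rejects values above the documented maximum.
   Returns 0 on success, -1 on invalid input. */
static int demo_set_height_pixels(DemoResolution *r, int pixels)
{
    if (r == NULL || pixels <= 0 || pixels > DEMO_MAX_PIXELS)
        return -1;
    r->cxVectorToRasterXRes = 0;
    r->cyVectorToRasterYRes = pixels;
    return 0;
}
```

Keeping one member at zero in every helper avoids the case in which both members carry a percentage and the Y value is silently reset during conversion.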
KVXMLTemplate
This structure defines the overall framework of the XML output. Members in this structure define the XML markup written at specific points in the output stream.
The pointers contain XML markup that may include embedded KeyView-defined tokens. The XML markup contained in these strings should be well-formed. For the generated document to be valid, the markup must conform to the Verity DTD.
The structure is initialized by calling the function fpConvertStream() or KVXMLConvertFile().
typedef struct tag_KVXMLTemplate
{
char *pszMainTop;
char *pszMainBottom;
char *pszFirstH1Start;
char *pszFirstH1End;
char *pszMiddleH1Start;
char *pszMiddleH1End;
char *pszLastH1Start;
char *pszLastH1End;
char *pszH[2..6]XML;
char *pszTOCH[1..6]Start;
char *pszTOC_H[1..6];
char *pszTOCH[1..6]End;
char *pszXFile;
char *pszXStartBlock;
char *pszXEndBlock;
char *pszStartBlock;
char *pszEndBlock;
BOOL bPutBlocksInSeparateFiles;
BOOL bHardPageMakesNewBlock;
long lcbBlockSize;
char *pszChunkTemplate;
char *pszUserSummary;
char *pszTOCH[1..6]LeafNode;
}
KVXMLTemplate;
Member Descriptions
pszMainTop
The markup and tokens inserted at the beginning of the main XML file.
Most of the sample template files feature <MetaData> tags with tokens that store the input document’s metadata. This member does not include the processing instructions or document type declarations that appear at the beginning of an XML document. The document type declaration <?xml version= is automatically generated by XML Export. If you are using style sheets or the Verity DTD, the processing instructions <?xml stylesheet= ...> are also automatically generated by XML Export.
The default is NULL.
pszMainBottom
The markup and tokens inserted at the end of the main XML file.
The default is NULL.
pszFirstH1Start
The markup and tokens inserted at the beginning of the first created H1 XML block (that is, the block associated with the first H1 table of contents entry).
The default is NULL.
pszFirstH1End
The markup and tokens inserted at the end of the first created H1 XML block (that is, the block associated with the first H1 table of contents entry).
The default is NULL.
pszMiddleH1Start
The markup and tokens inserted at the beginning of those H1 XML blocks that are neither the first nor the last H1 blocks created (that is, blocks associated with all but the first and last H1 table of contents entries).
The default is NULL.
pszMiddleH1End
The markup and tokens inserted at the end of those H1 XML blocks that are neither the first nor the last H1 blocks created (that is, blocks associated with all but the first and last H1 table of contents entries).
The default is NULL.
pszLastH1Start
The markup and tokens inserted at the beginning of the last created H1 XML block (that is, the block associated with the last H1 table of contents entry).
The default is NULL.
pszLastH1End
The markup and tokens inserted at the end of the last created H1 XML block (that is, the block associated with the last H1 table of contents entry).
The default is NULL.
pszH[2..6]XML
The markup and tokens inserted in an XML block for heading levels 2 through 6.
The default is NULL.
pszTOCH[1..6]Start
The markup and tokens inserted at the beginning of a table of contents block for heading levels 1 through 6 entries. For example:
<ol list-style-type="upper-roman">
The default is NULL.
pszTOC_H[1..6]
The markup and tokens required to process the table of contents entries for heading levels 1 through 6. For example:
<a xmlns:xlink="http://www.w3.org/TR/xlink" xlink href="#$ANCHOR"> $TOCTE</a>
If the table of contents heading contains special characters, such as an ampersand (&) or parentheses, you must use the $TOCPE token in the pszTOC_H[1..6] markup. This token retains character entities and prevents validity errors.
The default is NULL.
pszTOCH[1..6]End
The markup and tokens inserted at the end of a table of contents block for heading levels 1 through 6 entries. For example:
</ol>
The default is NULL.
pszXFile
The markup and tokens generated and placed in an extra XML file. This file holds content from the source document. To process this file, you would use the $XANCHOR token. See “Export Tokens” for more information on Export tokens.
The default is NULL.
pszXStartBlock
The markup and tokens inserted at the beginning of each XML block generated by the $XANCHOR token. If either this member or pszXEndBlock is defined, both pszStartBlock and pszEndBlock are ignored. See “Export Tokens” for more information on Export tokens.
The default is NULL.
pszXEndBlock
The markup and tokens to output at the end of each XML block generated by the $XANCHOR token. If either this member or pszXStartBlock is defined, both pszStartBlock and pszEndBlock are ignored. See “Export Tokens” for more information on Export tokens.
The default is NULL.
pszStartBlock
The markup and tokens inserted at the beginning of each block created as a result of lcbBlockSize or bHardPageMakesNewBlock.
The default is NULL.
pszEndBlock
The markup and tokens inserted at the end of each block created as a result of lcbBlockSize or bHardPageMakesNewBlock.
The default is NULL.
bPutBlocksInSeparateFiles
Set to TRUE to create a separate XML file for each heading level 1 block. Each new block uses the markup defined in pszStartBlock and pszEndBlock. If set to FALSE, then each heading level 1 block is placed sequentially in the same file, after the initial markup is written.
The default is FALSE.
bHardPageMakesNewBlock
Set to TRUE to have hard page breaks in the source document generate new XML files during the conversion process. The member pszChunkTemplate provides the appropriate table of contents entry for the new block.
Applies to word processing documents and spreadsheets only.
The default is FALSE.
lcbBlockSize
The maximum size (in bytes) of heading level 1 XML output files. This number is used as a guideline and may be exceeded to break content at a logical location (for example, a row boundary).
By default, the size is undefined and unlimited.
pszChunkTemplate
If an H1 XML block is subdivided into separate files as a result of the size limitations specified in lcbBlockSize, this member provides a template for creating a table of contents entry for the new file. The block number can be made a part of this template by inserting the token $SPLITBLOCKNUMBER. For example:
Page $SPLITBLOCKNUMBER
The default is NULL.
pszUserSummary
The markup and tokens generated when the tokens $USERSUMMARY or $SUMMARY are used. For example:
<MetaData name="$NAME" content="$CONTENT"/>
The default is NULL.
pszTOCH[1..6]LeafNode
The markup that replaces pszTOC_H[1..6] entries for leaf nodes in the table of contents. A leaf node is a node that has no children.
The default is NULL.
Discussion
A pointer to this structure is passed as an argument to fpConvertStream() and KVXMLConvertFile(). If the pointer to the structure is not NULL, the values of the members specified in the structure are used. If the pointer to the structure is NULL, the default values are used.
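Template members are plain strings in which Export replaces tokens during conversion. To make the effect of a token concrete, the sketch below expands $SPLITBLOCKNUMBER in a pszChunkTemplate-style string. Export performs this substitution internally; demo_expand_chunk_template() is a hypothetical helper written only to illustrate what the resulting table of contents entry looks like, and it replaces only the first occurrence of the token.

```c
#include <stdio.h>
#include <string.h>

/* Illustration only: expands the first $SPLITBLOCKNUMBER token in tmpl
   with the decimal block number and writes the result to out.
   Returns 0 on success, -1 on bad arguments or if out is too small. */
static int demo_expand_chunk_template(const char *tmpl, int block,
                                      char *out, size_t outsz)
{
    static const char token[] = "$SPLITBLOCKNUMBER";
    const char *hit;
    int n;

    if (tmpl == NULL || out == NULL || outsz == 0)
        return -1;

    hit = strstr(tmpl, token);
    if (hit == NULL) {
        /* No token present: copy the template unchanged. */
        n = snprintf(out, outsz, "%s", tmpl);
    } else {
        /* Text before the token, the number, then the text after it. */
        n = snprintf(out, outsz, "%.*s%d%s",
                     (int)(hit - tmpl), tmpl, block, hit + strlen(token));
    }
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Called with the template "Page $SPLITBLOCKNUMBER" and block number 3, the helper produces the string "Page 3".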
KVXMLTOCOptions
This structure defines whether a heading is included in the table of contents. Source text is converted to a heading in the XML output if it meets all the criteria defined by the members of KVXMLHeadingInfo, and the headingCreateType member of KVXMLTOCOptions is set to allow automatic heading generation.
The structure is initialized by calling the function fpConvertStream() or KVXMLConvertFile().
See “KVXMLHeadingInfo” for more information on the criteria used to determine whether a heading is included in the table of contents.
typedef struct tag_KVXMLTOCOptions
{
BOOL bAllowHeadingsInTables;
KVHeadingCreateOptions headingCreateType;
KVXMLHeadingInfo *pH1;
KVXMLHeadingInfo *pH2;
KVXMLHeadingInfo *pH3;
KVXMLHeadingInfo *pH4;
KVXMLHeadingInfo *pH5;
KVXMLHeadingInfo *pH6;
}
KVXMLTOCOptions;
Member Descriptions
bAllowHeadingsInTables
Determines if the text in tables is considered for automatic heading generation. If set to TRUE, the text in tables is included in the determination of headings and table of contents entries.
Applies to word processing documents and spreadsheets only.
The default is FALSE.
headingCreateType
Determines how XML Export subdivides the source document into table of contents entries. This can be set to one of the two options enumerated in KVHeadingCreateOptions in kvxml.h. See “KVHeadingCreateOptions”.
The determination of table of contents entries is based on whether the source document contains heading styles or whether text attributes conform to the criteria defined in the structure KVXMLHeadingInfo. See “KVXMLHeadingInfo”.
Heading styles are predefined style tags, such as “Heading 1” and “Heading 2” tags in a Microsoft Word document. Text attributes are bold, underlined, italic, and so on.
Applies to word processing documents only.
The default is KVCS_DocHeadingsOnly.
KVXMLHeadingInfo
Pointer to the structure KVXMLHeadingInfo. See “KVXMLHeadingInfo”.
When the table of contents entries are not based on the source document’s heading styles, the table of contents entries are determined by whether text attributes (such as bold, underlined, and italic text) in the source document meet all the criteria defined in KVXMLHeadingInfo.
Discussion
A pointer to this structure is passed as an argument to fpConvertStream() and KVXMLConvertFile(). If the pointer to the structure is not NULL, the values of the members specified in the structure are used. If the pointer to the structure is NULL, the default values are used.
CHAPTER 11
Enumerated Types
This section provides information on some of the enumerated types used by the XML Export API.
Introduction
The enumerated types are in adinfo.h, kvtypes.h, kvxml.h, and kvxtract.h. These header files are in the include directory. The first entry in an enumerated type structure should be set to zero (0). Each subsequent entry is increased by 1. For example, the first five entries of KVCharSet in kvtypes.h are:
KVCS_UNKNOWN
KVCS_SJIS
KVCS_GB
KVCS_BIG5
KVCS_KSC
They would be set in the following way:
Enumerated Type    Setting
KVCS_UNKNOWN       0
KVCS_SJIS          1
KVCS_GB            2
KVCS_BIG5          3
KVCS_KSC           4
Many enumerated types may also be set by entering the appropriate symbolic constant, or TRUE/FALSE.
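The numbering rule can be checked with a small replica of the first five KVCharSet entries. The Demo_ names and demo_charset_name() below are hypothetical and exist only for this sketch; real code should include kvtypes.h and use the symbolic constants it defines.

```c
#include <stddef.h>

/* Local replica of the first five KVCharSet entries, for illustration only.
   With no explicit initializers, C numbers the entries 0, 1, 2, ... */
typedef enum tag_DemoCharSetReplica {
    Demo_KVCS_UNKNOWN,
    Demo_KVCS_SJIS,
    Demo_KVCS_GB,
    Demo_KVCS_BIG5,
    Demo_KVCS_KSC
} DemoCharSetReplica;

/* Maps a numeric setting back to its symbolic name, or returns NULL when
   the number is outside the five entries replicated here. */
static const char *demo_charset_name(int setting)
{
    static const char *const names[] = {
        "KVCS_UNKNOWN", "KVCS_SJIS", "KVCS_GB", "KVCS_BIG5", "KVCS_KSC"
    };
    if (setting < Demo_KVCS_UNKNOWN || setting > Demo_KVCS_KSC)
        return NULL;
    return names[setting];
}
```

Using the symbolic constant rather than the bare number keeps code readable and protects it if the numeric values are ever looked up incorrectly.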
Programming Guidelines
As KeyView is enhanced in future releases, some enumerated types may be expanded. For example, new format IDs may be added to the ENdocFmt enumerated type, or new error codes may be added to the KVErrorCodeEx enumerated type. When using these expandable types, your code should ensure binary compatibility with future releases.
For example, if you use an array to access error messages based on an error code, your code should check that the error code is less than KVError_Last before accessing the data. This ensures that new error codes are detected when you add KeyView binary files from new releases to your existing installation.
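The guard described above might look like the following sketch. DemoErrorCodeEx is a deliberately shortened, hypothetical enumeration (its names and messages are not from the SDK); the point is the range check against the Last sentinel before the array is indexed.

```c
#include <stddef.h>

/* Hypothetical, shortened expandable enumeration. In real code the type
   would be KVErrorCodeEx and the sentinel KVError_Last from kvtypes.h. */
typedef enum {
    DemoError_First = 22,
    DemoError_Second,
    DemoError_Third,
    DemoError_Last   /* sentinel known to this build of the application */
} DemoErrorCodeEx;

/* One message per code known when the application was compiled. */
static const char *const demo_messages[] = { "first", "second", "third" };

/* Returns the message for a code, or a fallback when the code is outside
   the range this build knows about (for example, a code added by a newer
   release of the binaries). */
static const char *demo_error_message(int code)
{
    if (code < DemoError_First || code >= DemoError_Last)
        return "Unrecognized error code";
    return demo_messages[code - DemoError_First];
}
```

Because the check uses the sentinel compiled into the application, a newer library that returns a code beyond it produces the fallback text instead of an out-of-bounds read.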
The following enumerated types are expandable:
KVErrorCodeEx
KVMetadataType
KVCharSet
KVLanguageID
KVSubfileType
ENdocFmt
ENSATableBorder
This enumerated type defines the type of border to display around tables. It is defined in kvtypes.h.
Definition
typedef enum tag_ENSATableBorder
{
SA_BaseOnDocument,
SA_NoBorder,
SA_Border
}
ENSATableBorder;
Enumerators
SA_BaseOnDocument Border type is based on the document.
SA_NoBorder Table borders are always off.
SA_Border Table borders are always on.
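The three settings can be read as a simple decision, sketched below. The Demo_ enumeration is a local replica and demo_border_enabled() is a hypothetical helper; it models the documented meaning of each value and is not how Export itself is implemented.

```c
/* Local replica of ENSATableBorder for illustration; real code would
   include kvtypes.h instead. */
typedef enum tag_DemoTableBorder {
    Demo_SA_BaseOnDocument,
    Demo_SA_NoBorder,
    Demo_SA_Border
} DemoTableBorder;

/* Returns nonzero when a table border should be drawn.
   document_has_border is the setting found in the source document and
   matters only for Demo_SA_BaseOnDocument. */
static int demo_border_enabled(DemoTableBorder setting, int document_has_border)
{
    switch (setting) {
    case Demo_SA_NoBorder:
        return 0;                        /* always off */
    case Demo_SA_Border:
        return 1;                        /* always on */
    case Demo_SA_BaseOnDocument:
    default:
        return document_has_border != 0; /* follow the document */
    }
}
```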
KVCredKeyType
This enumerated type defines the type of credential used to open a protected file. See “KVCredentialComponent”. It is defined in kvxtract.h.
Definition
typedef enum tag_KVCredKeyType
{
KVCredKeyType_UserName,
KVCredKeyType_UserIdFile,
KVCredKeyType_Password,
}
KVCredKeyType;
Enumerators
KVCredKeyType_UserName The credential in KVCredentialComponent is a user name.
KVCredKeyType_UserIdFile The credential in KVCredentialComponent is a path to a file containing user IDs.
KVCredKeyType_Password The credential in KVCredentialComponent is a password.
KVErrorCode
This enumerated type defines the type of error generated if Export fails. It is defined in kvtypes.h.
Definition
typedef enum tag_KVErrorCode
{
KVERR_Success, /* 0 Success*/
KVERR_DLLNotFound, /* 1 DLL or shared library not found*/
KVERR_OutOfCore, /* 2 memory allocation failure*/
KVERR_processCancelled, /* 3 fpContinue() returns FALSE*/
KVERR_badInputStream, /* 4 Invalid/corrupt input stream*/
KVERR_badOutputType, /* 5 Invalid output type requested*/
KVERR_General, /* 6 General error.... */
KVERR_FormatNotSupported, /* 7 Format not supported*/
KVERR_PasswordProtected, /* 8 File is Password Protected*/
KVERR_ADSNotFound, /* 9 Adobe Document Server not found*/
KVERR_AutoDetFail, /* 10 Autodetect error*/
KVERR_AutoDetNoFormat, /* 11 Unable to detect file format*/
KVERR_ReaderInitError, /* 12 Error initializing the reader*/
KVERR_NoReader, /* 13 No reader available for this format*/
KVERR_CreateOutputFileFailed, /* 14 Unable to create output file*/
KVERR_CreateTempFileFailed, /* 15 Unable to create temp file*/
KVERR_ErrorWritingToOutputFile, /* 16 Error writing to output file*/
KVERR_CreateProcessFailed, /* 17 Error creating a child process*/
KVERR_WaitForChildFailed, /* 18 Wait for child process failed*/
KVERR_ChildTimeOut, /* 19 Child process hung / timed out*/
KVERR_ArchiveFileNotFound, /* 20 Attempt to extract nonexistent file*/
KVERR_ArchiveFatalError /* 21 Fatal error processing archive - should abort*/
}
KVErrorCode;
Enumerators
KVERR_Success Function completed successfully.
KVERR_DLLNotFound A DLL or shared library was not found.
KVERR_OutOfCore Memory allocation failure.
KVERR_processCancelled Callback function fpContinue() returns FALSE.
KVERR_badInputStream Invalid or corrupt input stream.
KVERR_badOutputType Invalid output is requested.
KVERR_General General error.
KVERR_FormatNotSupported File format is not supported.
KVERR_PasswordProtected File is encrypted or password-protected. KeyView only supports secure PST files.
KVERR_ADSNotFound Adobe Document Server not found. This error is obsolete.
KVERR_AutoDetFail Autodetect error.
KVERR_AutoDetNoFormat Unable to detect file format.
KVERR_ReaderInitError Error initializing the reader.
KVERR_NoReader No reader available for this format.
KVERR_CreateOutputFileFailed Unable to create output file. If the overwrite flag in KVExtractSubFileArg is FALSE, and a sub file has the same name as a file in the target path, this error is generated. See “KVExtractSubFileArg”.
KVERR_CreateTempFileFailed Unable to create temporary file.
KVERR_ErrorWritingToOutputFile Error writing to output file.
KVERR_CreateProcessFailed Error creating a child process.
KVERR_WaitForChildFailed Wait for child process failed.
KVERR_ChildTimeOut Child process hung/timed out.
KVERR_ArchiveFileNotFound Attempt to extract nonexistent file.
KVERR_ArchiveFatalError Fatal error processing an archive file.
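An application that reports these codes to users might keep a lookup table such as the one sketched below. The table text is taken from the descriptions above and the positions follow the numbering in the definition (0 through 21); demo_kverr_text and demo_kverr_describe() are hypothetical names, and the SDK is not assumed to provide such a function.

```c
#include <stddef.h>

/* Messages in the order of the KVErrorCode definition (codes 0 to 21). */
static const char *const demo_kverr_text[] = {
    "Function completed successfully.",         /*  0 KVERR_Success */
    "A DLL or shared library was not found.",   /*  1 KVERR_DLLNotFound */
    "Memory allocation failure.",               /*  2 KVERR_OutOfCore */
    "Callback fpContinue() returned FALSE.",    /*  3 KVERR_processCancelled */
    "Invalid or corrupt input stream.",         /*  4 KVERR_badInputStream */
    "Invalid output is requested.",             /*  5 KVERR_badOutputType */
    "General error.",                           /*  6 KVERR_General */
    "File format is not supported.",            /*  7 KVERR_FormatNotSupported */
    "File is encrypted or password-protected.", /*  8 KVERR_PasswordProtected */
    "Adobe Document Server not found.",         /*  9 KVERR_ADSNotFound */
    "Autodetect error.",                        /* 10 KVERR_AutoDetFail */
    "Unable to detect file format.",            /* 11 KVERR_AutoDetNoFormat */
    "Error initializing the reader.",           /* 12 KVERR_ReaderInitError */
    "No reader available for this format.",     /* 13 KVERR_NoReader */
    "Unable to create output file.",            /* 14 KVERR_CreateOutputFileFailed */
    "Unable to create temporary file.",         /* 15 KVERR_CreateTempFileFailed */
    "Error writing to output file.",            /* 16 KVERR_ErrorWritingToOutputFile */
    "Error creating a child process.",          /* 17 KVERR_CreateProcessFailed */
    "Wait for child process failed.",           /* 18 KVERR_WaitForChildFailed */
    "Child process hung/timed out.",            /* 19 KVERR_ChildTimeOut */
    "Attempt to extract nonexistent file.",     /* 20 KVERR_ArchiveFileNotFound */
    "Fatal error processing an archive file."   /* 21 KVERR_ArchiveFatalError */
};

#define DEMO_KVERR_COUNT (sizeof demo_kverr_text / sizeof demo_kverr_text[0])

/* Returns the description for a basic error code, or a fallback for
   values outside 0..21 (for example, extended error codes). */
static const char *demo_kverr_describe(int code)
{
    if (code < 0 || (size_t)code >= DEMO_KVERR_COUNT)
        return "Unknown or extended error code.";
    return demo_kverr_text[code];
}
```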
KVErrorCodeEx
This enumerated type defines extended error codes. It is defined in kvtypes.h.
Definition
typedef enum tag_KVErrorCodeEx
{
KVError_OpenStreamFailure = KVERR_ArchiveFatalError + 1, /* 22 KVOpen stream failure */
KVError_InterfaceFunctionNotFound, /* 23 Interface function not found */
KVError_InputFileNotFound, /* 24 Cannot find input file*/
KVError_OpenOutputFileFailed, /* 25 Cannot open output file*/
KVError_MemoryLeak, /* 26 Memory leak*/
KVError_MemoryOverwrite, /* 27 Memory overwrite*/
KVError_GPF, /* 28 Exception during oop filtering*/
KVError_OopCore, /* 29 Core dump in child process*/
KVError_KVoopLogFailed, /* 30 Creation of oop error log failed*/
KVError_OverNestedFileLimit, /* 31 File exceeds nested file limit*/
KVError_PSTAccessFailed, /* 32 Access failed on PST files*/
KVError_PasswordRequired, /* 33 Password required to access file*/
KVError_InvalidArgs, /* 34 Input argument/structure is invalid*/
KVError_ReaderUsageDenied, /* 35 Reader requires a valid license*/
KVError_OopBadConfig, /* 36 Config buffer data was incomplete*/
KVError_OopBrokenPipe, /* 37 Read/write to/from pipe failed*/
KVError_OopPipeOEF, /* 38 Pipe was closed prior to read/write*/
KVError_IPCTimeOut, /* 39 Pipe/socket timed out on poll/select*/
KVError_InvalidOopDriverSignature, /* 40 ... OOP server but context driver does not exist on the server*/
KVError_InvalidOopServiceSignature, /* 41 ... OOP service that does not exist*/
KVError_ZeroFile, /* 42 Input file is empty or zero bytes */
KVError_CompressionNotSupported, /* 43 File or subfile is compressed with unsupported method */
KVError_Last
}
KVErrorCodeEx;
Enumerators
KVError_OpenStreamFailure = KVERR_ArchiveFatalError + 1
Failed to open a stream during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.
KVError_InterfaceFunctionNotFound
An interface function was not found during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.
KVError_InputFileNotFound
Could not find the input file during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.
KVError_OpenOutputFileFailed
Could not open the output file during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.
KVError_MemoryLeak
Memory leak occurred during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.
KVError_MemoryOverwrite
Memory overwrite occurred during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.
KVError_GPF
Exception occurred during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.
KVError_OopCore
Memory dump was generated in a child process during out-of-process filtering. This is an extended error for the code KVERR_General. This is used by KeyView Filter.
KVError_KVoopLogFailed
Creation of out-of-process error log failed. This is an extended error for the code KVERR_General. This is used by KeyView Filter.
KVError_OverNestedFileLimit
The container file has more than the allowable number of child documents. One or more child documents were not converted. Currently, this is not used.
KVError_PSTAccessFailed
The PST file could not be converted. This error may be returned when a call to fpOpenFile() returns NULL for one of the following reasons:
Microsoft Outlook client is not installed
Microsoft Outlook client is installed, but is not the default email client
Microsoft Outlook client is installed, but is not configured correctly
PST file is corrupt
PST file is read-only (PST files must allow read and write access)
MAPI call fails
KVError_PasswordRequired
To open the file, credentials must be provided. This error may be returned when a call to fpOpenFile() returns NULL.
KVError_InvalidArgs
The input argument or structure is invalid. This is generated by the File Extraction APIs.
KVError_ReaderUsageDenied
The current license key does not enable the document reader required to convert the file. This error may be returned when a call to fpOpenFile() returns NULL.
Some document readers are considered advanced features and are licensed separately from the KeyView SDK (for example, the PST and MBX readers). Contact your Autonomy sales representative to get an updated license key.
KVError_OopBadConfig
Information in the kvxconfig.ini file is incomplete and cannot be used to the XML file. This is used by KeyView Filter.
KVError_OopBrokenPipe
Data was not transferred between the parent and child processes during out-of-process filtering because either the parent or child failed. This is used by KeyView Filter.
KVError_OopPipeOEF
Data was not transferred between the parent and child processes during out-of-process filtering because the parent process was shut down. This is used by KeyView Filter.
KVError_IPCTimeOut
Either the parent or child process is waiting for a reply or request during out-of-process filtering. This is used by KeyView Filter.
KVError_InvalidOopDriverSignature
A client sent a request to an out-of-process server, but the context driver does not exist on the server. This is used by KeyView Filter.
KVError_InvalidOopServiceSignature
A client sent a request to a File Extraction service that does not exist. If this error is generated on the call to fpClose(), it can be ignored. This is used by KeyView Filter.
KVError_ZeroFile
The input file is empty or zero bytes.
KVError_CompressionNotSupported
The file or subfile is compressed with an unsupported compression method.
Discussion
As error reporting is enhanced in future releases, new error codes may be added to this enumerated type. When using this type, your code should ensure binary compatibility with future releases. See “Programming Guidelines”.
If an extended error code is called for a format to which the error does not apply, the code KVError_Last is returned.
KVXMLStyleSheetType
This enumerated type defines the options for processing style sheet information. It is defined in kvxml.h.
Definition
typedef enum tag_KVXMLStyleSheetType
{
STYLESHEET_DISABLED = 0,
XML_CSS,
XML_XSL,
}
KVXMLStyleSheetType;
Enumerators
STYLESHEET_DISABLED Disables Cascading Style Sheet (CSS) formatting.
XML_CSS Enables cascading style sheet (CSS) formatting and generates an external file or uses an existing external file which is referenced in a <?xml-stylesheet...?> processing instruction.
XML_XSL Enables Extensible Stylesheet Language (XSL) formatting and uses an external XSL file which is referenced in a
<?xml-stylesheet...?> processing instruction.
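The following is a minimal sketch of how the three enumerators relate to the <?xml-stylesheet...?> processing instruction. It is not SDK code: format_stylesheet_pi is a hypothetical helper, the enum is copied from the definition above so that the sketch compiles without kvxml.h, and the type values "text/css" and "text/xsl" are the conventional ones for this processing instruction rather than values confirmed for Export. The exact text that Export writes may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copied from the definition above; omit this when kvxml.h is included. */
typedef enum tag_KVXMLStyleSheetType
{
    STYLESHEET_DISABLED = 0,
    XML_CSS,
    XML_XSL
}
KVXMLStyleSheetType;

/*
 * Hypothetical helper (not part of the SDK).
 * Formats an xml-stylesheet processing instruction that references href.
 * Returns the number of characters written, 0 when style sheets are
 * disabled, or -1 if href contains a double quote or buf is too small.
 */
int format_stylesheet_pi(KVXMLStyleSheetType type, const char *href,
                         char *buf, size_t size)
{
    const char *mime;
    int n;

    assert(href != NULL && buf != NULL && size > 0);
    buf[0] = '\0';

    switch (type) {
    case XML_CSS:
        mime = "text/css";      /* assumed type value for CSS */
        break;
    case XML_XSL:
        mime = "text/xsl";      /* assumed type value for XSL */
        break;
    case STYLESHEET_DISABLED:
    default:
        return 0;               /* no processing instruction at all */
    }

    /* A double quote would terminate the href pseudo-attribute early. */
    if (strchr(href, '"') != NULL)
        return -1;

    n = snprintf(buf, size, "<?xml-stylesheet type=\"%s\" href=\"%s\"?>",
                 mime, href);
    if (n < 0 || (size_t)n >= size) {
        buf[0] = '\0';
        return -1;
    }
    return n;
}
```

With XML_CSS and the file name style.css, the helper produces <?xml-stylesheet type="text/css" href="style.css"?>. With STYLESHEET_DISABLED it produces an empty string.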
KVXMLAnchorType
This enumerated type defines the anchor types for the output stream. It is defined in kvxml.h.
Definition typedef enum tag_KVXMLAnchorType
{
VectorPictureAnchor = 0,
RasterPictureAnchor,
H1Anchor,
H2Anchor,
H3Anchor,
H4Anchor,
H5Anchor,
H6Anchor,
XAnchor,
AnimatedGIFAnchor,
CSSAnchor,
XSLAnchor,
GeneralAnchor,
DBAnchor,
JPEGAnchor
}
KVXMLAnchorType;
Enumerators
VectorPictureAnchor Anchor for embedded vector graphics.
RasterPictureAnchor Anchor for embedded raster graphics.
H1Anchor Anchor for heading level H1 blocks.
H2Anchor Anchor for heading level H2 blocks.
H3Anchor Anchor for heading level H3 blocks.
H4Anchor Anchor for heading level H4 blocks.
H5Anchor Anchor for heading level H5 blocks.
H6Anchor Anchor for heading level H6 blocks.
XAnchor Anchor for an external file.
AnimatedGIFAnchor Anchor for embedded animated GIF graphics.
CSSAnchor Anchor for external CSS file.
XSLAnchor Anchor for external XSL file.
GeneralAnchor Reserved for future use.
DBAnchor Used internally.
JPEGAnchor Anchor for embedded JPEG graphic.
KVXMLGraphicType
This enumerated type defines graphic formats to which embedded graphics and presentations are converted. It is defined in kvxml.h.
Definition typedef enum tag_KVXMLGraphicType
{
KVGFX_GIF,
KVGFX_JPEG,
KVGFX_PNG,
KVGFX_CGM,
KVGFX_WMF,
KVGFX_JAVA
}
KVXMLGraphicType;
Enumerators
KVGFX_GIF Specifies GIF (Graphics Interchange Format) as the graphic type.
KVGFX_JPEG Specifies JPEG (Joint Photographic Experts Group) as the graphic type.
KVGFX_PNG Specifies PNG (Portable Network Graphics) as the graphic type.
KVGFX_CGM Specifies CGM (Computer Graphics Metafile) as the graphic type.
KVGFX_WMF Specifies WMF (Windows Metafile) as the graphic type.
KVGFX_JAVA Deprecated.
Also see
“Display Vector Graphics on UNIX and Linux” on page .
KVHeadingCreateOptions
This enumerated type defines whether Export generates blocks and block chunks
(see “Definition of Terms” on page ) based only on the heading styles defined in
a source document (if they are available), or based on both the source document’s heading styles and headings that are created automatically by Export.
Headings that are created automatically by Export are based on the text attributes of the source document as defined by KVXMLHeadingInfo (see
). It is defined in kvxml.h.
Definition typedef enum tag_KVHeadingCreateOptions
{
KVHC_DocHeadingsOnly,
KVHC_CreateHeadingsAlways
}
KVHeadingCreateOptions;
Enumerators
KVHC_DocHeadingsOnly This instructs Export to rely exclusively on heading styles defined in the source document. However, if the source document does not contain heading styles, Export generates blocks on its own using the criteria defined by the structure KVHeadingInfo.
KVHC_CreateHeadingsAlways This instructs Export to use the heading styles in the source document when available, and to also automatically create table of contents entries based on the criteria defined by the structure KVHeadingInfo.
KVXMLEmptyParaType
This enumerated type defines the options for paragraphs that do not contain content. It is defined in kvxml.h.
Definition typedef enum tag_KVXMLEmptyParaType
{
KVEPT_SUPPRESS,   /* No markup generated */
KVEPT_EMPTY,      /* Use <p/> */
KVEPT_VERBOSE     /* Use <p id="..."> </p> */
}
KVXMLEmptyParaType;
Enumerators
KVEPT_SUPPRESS Paragraphs without content are ignored. They do not contribute white space and do not affect the ID number of subsequent paragraphs. This is the default value.
KVEPT_EMPTY Paragraphs without content are represented by an “empty” paragraph element <p/>. These contribute minimal white space, but do not affect the ID number of subsequent paragraphs.
KVEPT_VERBOSE Paragraphs without content contain a fully-defined start tag <p id="..."> with all non-default attributes, a character entity, and end tag </p>. These contribute additional white space and affect the ID number of subsequent paragraphs.
KVXMLHardPageBreakType
This enumerated type defines the options for hard page breaks. It is defined in kvxml.h.
Definition typedef enum tag_KVXMLHardPageBreakType
{
KVHPBT_SUPPRESS,   /* No markup generated */
KVHPBT_EMPTY,      /* Use <Page/> */
KVHPBT_EMPTYID,    /* Use <Page id="n"/> */
KVHPBT_ID          /* Use <Page id="n"> ... </Page> */
}
KVXMLHardPageBreakType;
Enumerators
KVHPBT_SUPPRESS No markup is generated for hard page breaks. This is the default value.
KVHPBT_EMPTY An empty page element, <Page/>, without ID attributes is generated for hard page breaks.
KVHPBT_EMPTYID An empty page element, <Page id="n"/>, with ID attributes is generated for hard page breaks. The ID is incremented for each subsequent hard page break.
KVHPBT_ID A “non-empty” “Page” element is generated for hard page breaks. The page tags enclose the contents immediately after the <WP> tag. When subsequent hard page breaks are encountered, the previous “Page” element is closed with a </Page> tag, and a <Page id="..."> opening tag is added. The final “Page” element is closed immediately before the closing </WP> tag.
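To make the four options easier to compare, the sketch below returns the markup that the descriptions above associate with a single hard page break. It is illustrative only: page_break_markup is a hypothetical helper, not an SDK function, the enum is copied from the definition above so that the sketch compiles without kvxml.h, and the id value is supplied by the caller because this guide does not state how Export numbers the first page.

```c
#include <assert.h>
#include <stdio.h>

/* Copied from the definition above; omit this when kvxml.h is included. */
typedef enum tag_KVXMLHardPageBreakType
{
    KVHPBT_SUPPRESS,
    KVHPBT_EMPTY,
    KVHPBT_EMPTYID,
    KVHPBT_ID
}
KVXMLHardPageBreakType;

/*
 * Hypothetical helper (not part of the SDK).
 * Writes the markup for one hard page break into buf.
 * id is the value used for the id attribute where one is written.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int page_break_markup(KVXMLHardPageBreakType type, int id,
                      char *buf, size_t size)
{
    int n;

    assert(buf != NULL && size > 0);

    switch (type) {
    case KVHPBT_EMPTY:
        n = snprintf(buf, size, "<Page/>");
        break;
    case KVHPBT_EMPTYID:
        n = snprintf(buf, size, "<Page id=\"%d\"/>", id);
        break;
    case KVHPBT_ID:
        /* Close the previous Page element, then open the next one. */
        n = snprintf(buf, size, "</Page><Page id=\"%d\">", id);
        break;
    case KVHPBT_SUPPRESS:
    default:
        buf[0] = '\0';          /* no markup for the page break */
        n = 0;
        break;
    }

    if (n < 0 || (size_t)n >= size) {
        buf[0] = '\0';
        return -1;
    }
    return n;
}
```

For KVHPBT_ID the helper covers only the markup written at a page break. The opening tag after <WP> and the closing tag before </WP> are outside its scope.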
KVMetadataType
This enumerated type defines the data type of metadata that can be extracted from a sub file in a mail message or mail store. If a metadata field has a corresponding KeyView type in KVMetadataType, the metadata is converted to the KVMetadataElem structure, and the structure member isDataValid is 1.
See
“KVMetadataElem” on page . It is defined in kvtypes.h.
Definition typedef enum
{
KVMetadata_Unknown = 0,
KVMetadata_Bool = 1,
KVMetadata_Binary = 2,
KVMetadata_Int4 = 3,
KVMetadata_UInt4 = 4,
KVMetadata_Int8 = 5,
KVMetadata_UInt8 = 6,
KVMetadata_String = 7,
KVMetadata_Unicode = 8,
KVMetadata_DateTime = 9,
KVMetadata_Float = 10,
KVMetadata_Double = 11,
KVMetadata_Last
}
KVMetadataType;
Enumerators
KVMetadata_Unknown The value in the property is of an unknown type.
KVMetadata_Bool The value in the property is a boolean. The corresponding MAPI type is PT_BOOLEAN.
KVMetadata_Binary The value in the property is a byte array. The corresponding MAPI type is PT_BINARY.
KVMetadata_Int4 The value in the property is a signed 4-byte integer. The corresponding MAPI types are PT_I2, PT_SHORT, PT_I4, and PT_LONG.
KVMetadata_UInt4 The value in the property is an unsigned 4-byte integer. This type is not currently supported.
KVMetadata_Int8 The value in the property is a signed 8-byte integer. This type is not currently supported.
KVMetadata_UInt8 The value in the property is an unsigned 8-byte integer. This type is not currently supported.
KVMetadata_String The value in the property is a string. The corresponding MAPI type is PT_STRING8.
KVMetadata_Unicode The value in the property is a Unicode string. The corresponding MAPI type is PT_UNICODE.
KVMetadata_DateTime The value in the property is a date and time. The corresponding MAPI type is PT_SYSTIME.
KVMetadata_Float The value in the property is a 4-byte float. The corresponding MAPI type is PT_FLOAT.
KVMetadata_Double The value in the property is an 8-byte double. The corresponding MAPI type is PT_DOUBLE.
Discussion
New types may be added to this enumerated type. When using this type, your code should ensure binary compatibility with future releases.
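One way to stay compatible when new types are added is to treat any value that your code does not recognize as an unknown type instead of assuming the list is complete. The sketch below shows that approach. It is an assumption about reasonable client code, not SDK behavior: metadata_type_name is a hypothetical helper, and the enum is copied from the definition above so that the sketch compiles without kvtypes.h.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the definition above; omit this when kvtypes.h is included. */
typedef enum
{
    KVMetadata_Unknown  = 0,
    KVMetadata_Bool     = 1,
    KVMetadata_Binary   = 2,
    KVMetadata_Int4     = 3,
    KVMetadata_UInt4    = 4,
    KVMetadata_Int8     = 5,
    KVMetadata_UInt8    = 6,
    KVMetadata_String   = 7,
    KVMetadata_Unicode  = 8,
    KVMetadata_DateTime = 9,
    KVMetadata_Float    = 10,
    KVMetadata_Double   = 11,
    KVMetadata_Last
}
KVMetadataType;

/*
 * Hypothetical helper (not part of the SDK).
 * Returns a short name for a known type, or NULL for any value this
 * code was not built to handle, such as a type added in a later release.
 */
const char *metadata_type_name(KVMetadataType type)
{
    static const char *const names[] = {
        "Unknown", "Bool", "Binary", "Int4", "UInt4", "Int8",
        "UInt8", "String", "Unicode", "DateTime", "Float", "Double"
    };
    const size_t count = sizeof names / sizeof names[0];

    /* Fires after rebuilding against a header with more types,
       as a reminder to extend the table. */
    assert(count == (size_t)KVMetadata_Last);

    /* Values from a newer library fall outside the table at run time. */
    if ((int)type < 0 || (size_t)type >= count)
        return NULL;

    return names[type];
}
```

A caller that receives NULL can skip the field or handle it as KVMetadata_Unknown rather than failing.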
KVMetaNameType
This enumerated type defines the type of metadata fields extracted from a sub file in a mail message or mail store. See
“KVMetaName” on page . It is defined in
kvxtract.h
.
Definition typedef enum
{
KVMetaNameType_Integer = 0,
KVMetaNameType_String
}
KVMetaNameType;
Enumerators
KVMetaNameType_Integer The metadata field is an integer.
KVMetaNameType_String The metadata field is a string.
KVSumInfoType
This enumerated type defines the data type of the metadata field extracted from a document. See
“Extract Metadata” on page . It is defined in kvtypes.h.
Definition typedef enum tag_KVSumInfoType
{
KV_String = 0x1,
KV_Int4 = 0x2,
KV_DateTime = 0x3,
KV_ClipBoard = 0x4,
KV_Bool = 0x5,
KV_Unicode = 0x6,
KV_IEEE8 = 0x7,
KV_Other = 0x8
}
KVSumInfoType;
Enumerators
KV_String The value in the metadata field is a string.
KV_Int4 The value in the metadata field is an integer.
KV_DateTime The value in the metadata field is a date and time. This type is a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (Windows FILETIME EPOCH). You may need to convert this value into another format.
KV_ClipBoard Currently not supported.
KV_Bool The value in the metadata field is a boolean.
KV_Unicode The value in the metadata field is a Unicode string.
KV_IEEE8 The value in the metadata field is an IEEE 8-byte integer.
KV_Other The value in the metadata field is user-defined.
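The KV_DateTime conversion mentioned above can be sketched as follows. The function names are hypothetical and are not part of the SDK. The sketch assumes only that the field value is available as an unsigned 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC), as described for KV_DateTime. It uses the fixed offset of 11644473600 seconds between that date and the UNIX epoch of January 1, 1970.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define FILETIME_TICKS_PER_SECOND 10000000ULL    /* 100 ns intervals per second */
#define SECONDS_1601_TO_1970      11644473600LL  /* 1601-01-01 to 1970-01-01 */

/* Hypothetical helper: converts a FILETIME-style value to seconds since
   the UNIX epoch. Fractions of a second are discarded. */
int64_t filetime_to_unix_seconds(uint64_t filetime)
{
    return (int64_t)(filetime / FILETIME_TICKS_PER_SECOND)
           - SECONDS_1601_TO_1970;
}

/*
 * Hypothetical helper: converts a FILETIME-style value to broken-down
 * UTC time. Returns 0 on success, or -1 if the C library cannot
 * represent the value. gmtime() uses a shared static buffer, so this
 * sketch is not thread-safe.
 */
int filetime_to_utc(uint64_t filetime, struct tm *out)
{
    time_t seconds;
    struct tm *utc;

    assert(out != NULL);

    seconds = (time_t)filetime_to_unix_seconds(filetime);
    utc = gmtime(&seconds);
    if (utc == NULL)
        return -1;

    *out = *utc;
    return 0;
}
```

For example, the value 116444736000000000 corresponds to 0 seconds since the UNIX epoch, that is, January 1, 1970.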
KVSumType
This enumerated type defines the metadata fields that can be extracted from a document.
Types 0 to 34 and type 42 are office summary fields.
Types 35 to 40 are computer-aided design (CAD) metadata fields.
Type 41, KV_OrigAppVersion, is shared by office software and CAD.
Types 43 or greater are reserved for any non-standard metadata field defined in a document. See
“Extract Metadata” on page . It is defined in kvtypes.h.
Definition typedef enum tag_KVSumType
{
KV_CodePage = 0,
KV_Title = 1,
KV_Subject = 2,
KV_Author = 3,
KV_Keywords = 4,
KV_Comments = 5,
KV_Template = 6,
KV_LastAuthor = 7,
KV_RevNumber = 8,
KV_EditTime = 9,
KV_LastPrinted = 10,
KV_Create_DTM = 11,
KV_LastSave_DTM = 12,
KV_PageCount = 13,
KV_WordCount = 14,
KV_CharCount = 15,
KV_ThumbNail = 16,
KV_AppName = 17,
KV_Security = 18,
KV_Category = 19,
KV_PresentationTarget = 20,
KV_Bytes = 21,
KV_Lines = 22,
KV_Paragraphs = 23,
KV_Slides = 24,
KV_Notes = 25,
KV_HiddenSlides = 26,
KV_MMClips = 27,
KV_ScaleCrop = 28,
KV_HeadingPairs = 29,
KV_TitlesofParts = 30,
KV_Manager = 31,
KV_Company = 32,
KV_LinksUpToDate = 33,
KV_HyperlinkBase = 34,
KV_Layouts = 35,
KV_Objects = 36,
KV_FileVersion = 37,
KV_LastFileVersion = 38,
KV_OrigFileVersion = 39,
KV_OrigFileType = 40,
KV_OrigAppVersion = 41,
KV_ContentStatus = 42,
KV_UserDefined = 43
}
KVSumType;
Enumerators
KV_CodePage Code page of the document.
KV_Title Contents of the “Title” property field taken from the source document.
KV_Subject Contents of the “Subject” property field taken from the source document.
KV_Author Contents of the “Author” property field taken from the source document.
KV_Keywords Contents of the “Keywords” property field taken from the source document.
KV_Comments Contents of the “Comments” property field taken from the source document.
KV_Template Contents of the “Template” property field taken from the source document.
KV_LastSavedby Contents of the “Last saved by” property field taken from the source document.
KV_RevNumber Contents of the “Revision number” property field taken from the source document.
KV_EditTime Contents of the “Total editing time” property field taken from the source document.
KV_LastPrinted Contents of the “Printed” property field taken from the source document.
KV_Create_DTM Contents of the “Created” property field taken from the source document.
KV_LastSave_DTM Contents of the “Modified” property field taken from the source document.
KV_PageCount Contents of the “Pages” property field taken from the source document. The field provides the number of pages in the document.
KV_WordCount Contents of the “Words” property field taken from the source document. The field provides the number of words in the document.
KV_CharCount Contents of the “Characters” property field taken from the source document. The field provides the number of characters in the document.
KV_ThumbNail Thumbnail image of a document.
KV_AppName Contents of the “Type” property field taken from the source document. This field identifies the application used to read the document.
KV_Security Contents of the “Attributes” property field taken from the source document.
KV_Category Contents of the “Category” property field taken from the source document.
KV_PresentationTarget Target format for presentations (35mm, printer, video, and so forth).
KV_Bytes Contents of the “Size” property field taken from the source document. The field provides the size of the file in bytes.
KV_Lines Contents of the “Lines” property field taken from the source document. The field provides the number of lines in the document.
KV_Paragraphs Contents of the “Paragraphs” property field taken from the source document. The field provides the number of paragraphs in the document.
KV_Slides Contents of the “Slides” property field taken from a presentation document. The field provides the number of slides in the document.
KV_Notes Contents of the “Notes” property field taken from a presentation document. The field provides the number of notes in the document.
KV_HiddenSlides Contents of the “Hidden slides” property field taken from a presentation document. The field provides the number of hidden slides in the document.
KV_MMClips Contents of the “Multimedia clips” property field taken from a presentation document. The field provides the number of multimedia clips in the document.
KV_ScaleCrop Boolean that specifies whether thumbnails are cropped or scaled.
KV_HeadingPairs Internally used property indicating the grouping of different document parts and the number of items in each group.
KV_TitlesofParts Contents of the “Document Contents” property field taken from the source document. The field contains a list of the parts of the file, such as the names of macro sheets in Microsoft Excel or the headings in Word.
KV_Manager Contents of the “Manager” property field taken from the source document.
KV_Company Contents of the “Company” property field taken from the source document.
KV_LinksUpToDate Boolean that specifies whether links in the document are resolved and current.
KV_HyperlinkBase The base address used for all relative links in the file.
KV_Layouts The number of layouts in the AutoCAD drawing.
KV_Objects The approximate number of objects in the AutoCAD drawing.
KV_FileVersion The AutoCAD version (for example, R13, R14) of the drawing.
KV_LastFileVersion The AutoCAD version (for example, R13, R14) that the AutoCAD drawing was last saved as.
KV_OrigFileVersion The AutoCAD version (for example, R13, R14) of the original source file.
KV_OrigFileType The AutoCAD file type (for example, DWG, DXF or DWB) of the original source file.
KV_OrigAppVersion The AutoCAD version (for example, R13, R14) of the application that created the original source file.
KV_ContentStatus The status of the content, for example Draft, Reviewed, or Final.
KV_UserDefined Contents of the first entry in the array of non-standard metadata. This could be user-defined metadata, or metadata unique to a file type.
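The numeric ranges described at the start of this section can be checked in code. The sketch below is a minimal illustration, not SDK code: SumFieldCategory and classify_sum_type are hypothetical names, and the boundaries are taken from the KVSumType definition above. It takes a plain int so that values of 43 or greater, which identify non-standard fields, can be passed without a cast.

```c
#include <assert.h>

/* Hypothetical grouping of KVSumType values; not defined by the SDK. */
typedef enum
{
    SUM_OFFICE,        /* types 0 to 34 and type 42 */
    SUM_CAD,           /* types 35 to 40 */
    SUM_SHARED,        /* type 41, used by office software and CAD */
    SUM_NONSTANDARD    /* types 43 or greater */
}
SumFieldCategory;

/* Hypothetical helper: maps a KVSumType value to its group. */
SumFieldCategory classify_sum_type(int type)
{
    assert(type >= 0);              /* KVSumType values start at 0 */

    if (type == 41)                 /* KV_OrigAppVersion */
        return SUM_SHARED;
    if (type >= 43)                 /* KV_UserDefined and above */
        return SUM_NONSTANDARD;
    if (type >= 35 && type <= 40)   /* KV_Layouts to KV_OrigFileType */
        return SUM_CAD;

    return SUM_OFFICE;              /* 0 to 34, and KV_ContentStatus (42) */
}
```

For example, classify_sum_type(42) returns SUM_OFFICE because KV_ContentStatus is an office summary field even though its value follows the CAD range.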
LPDF_DIRECTION
This enumerated type defines the direction of paragraphs extracted from a PDF file when logical order is enabled. See “Convert PDF Files to a Logical Reading Order” on page . It is defined in kvtypes.h.
Definition typedef enum{
LPDF_RAW = 0,
LPDF_LTR,
LPDF_RTL,
LPDF_AUTO
} LPDF_DIRECTION ;
Enumerators
LPDF_RAW Unstructured paragraph flow. This is the default behavior.
LPDF_LTR Logical reading order and left-to-right paragraph direction.
LPDF_RTL Logical reading order and right-to-left paragraph direction.
LPDF_AUTO Logical reading order. The PDF reader determines the paragraph direction for each PDF page, and then sets the direction accordingly. This is the default when logical order is enabled.
Appendixes
This section lists supported formats, supported character sets and redistributed files, and provides information on format detection. It contains the following appendixes:
Files Required for Redistribution
Extract and Format Lotus Notes Sub Files
APPENDIX A
Supported Formats
This section lists information about the file formats that can be detected and processed (either filtered, converted, or displayed) by the KeyView suite of products. The KeyView suite includes KeyView Filter SDK, KeyView Export SDK, and KeyView Viewing SDK.
Appendix A Supported Formats
Supported Formats
The tables in this section provide the following information:
The file formats supported by the Filter API, Export API, Viewing API, and File
Extraction API. The supported versions and the format’s extension are also listed.
The formats listed in this section can also be detected by the KeyView format detection module ( kwad ). The section
or displayed.
The file formats for which KeyView can detect and extract the character set and metadata information (properties such as title, author, and subject).
Even though a file format may be able to provide character set information, some documents may not contain character set information. Therefore, the document reader would not be able to determine the character set of the document. In this case, either the operating system code page or the character set specified in the API is used.
The document reader used to filter each format.
Symbol Description
Y Format is supported. Metadata can be extracted for this format. Character set can be determined for this format.
N Format is not supported. Metadata cannot be extracted for this format. Character set cannot be determined for this format.
P Partial metadata is extracted from this format. Some non-standard fields are not extracted.
T Only text is extracted from this format. Formatting information is not extracted.
M Only metadata (title, subject, author, and so on) is extracted from this format. Text and formatting information are not extracted.
Archive Formats
Format
7-Zip
AD1
BinHex
Bzip2
Expert Witness
Compression
Format (EnCase)
GZIP n/a
6
7
Version
4.57
n/a n/a
2
ISO
Java Archive
Legato
EMailXtender
Archive
MacBinary
Mac Disk Copy Disk
Image
Microsoft Backup
File
Microsoft Cabinet format
Microsoft Compiled
HTML Help n/a n/a n/a n/a n/a n/a
1.3
3
Reader z7zsr ad1sr kvhqxsr
Extension Filter Export View Extract Metadata Charset
7Z N N Y Y N n/a
AD1
HQX
N
N
N
N
Y
Y
Y
Y
N
N n/a n/a bzip2sr encasesr
BZ2 encase2sr Lx01
N
E01, L01 N
N
N
N
N
Y
Y
Y
Y
Y
Y
N
N
N n/a n/a n/a kvgzsr kvgz isosr unzip emxsr
GZ
GZ
ISO
JAR
EMX
N
N
N
N
N
N
N
N
N
N
Y
Y
N
Y
Y
Y
Y
Y
N
Y
N
N
N
N
N n/a n/a n/a n/a n/a
N
N
N
N
N
N
N
N
N
Header
/Footer
N
N macbinsr dmgsr bkfsr cabsr chmsr
BIN
DMG
BKF
CAB
CHM
N
N
N
N
N
N
N
N
N
N
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
N
N
N
N
N n/a n/a n/a n/a n/a
N
N
N
N
N
Format
Microsoft
Compressed Folder
Version n/a
Reader lzhsr
PKZIP
RAR archive
Tape Archive
UNIX Compress through 9.0
unzip
2.0 through
3.5
rarsr n/a n/a tarsr kvzeesr
UUEncoding all versions
Windows Scrap File n/a kvzee uudsr olesr
WinZip through 10 unzip
Binary Format
TAR
Z
Z
UUE
SHS
ZIP
Extension Filter Export View Extract Metadata Charset
LZH
LHA
N N N Y N n/a
ZIP
RAR
N
N
N
N
Y
N
Y
Y
N
N n/a n/a
N
N
Header
/Footer
N
N
N
N
N
N
N
N
N
N
N
N
N
Y
Y
Y
N
N
Y
N
Y
Y
Y
Y
Y
N
N
N
N
N
N n/a n/a n/a n/a n/a n/a
N
N
N
N
N
N
Format        Version  Reader  Extension  Filter  Export  View  Extract  Metadata  Charset  Header/Footer
Executable    n/a      exesr   EXE        N       N       Y     N        N         n/a      N
Link Library  n/a      exesr   DLL        N       N       Y     N        N         n/a      N
Computer-Aided Design Formats
Format
AutoCAD
Drawing
AutoCAD
Drawing
Exchange
CATIA formats
Version
R13, R14,
R15/2000,
2004, 2007,
2010, 2013
R13, R14,
R15/2000,
2004, 2007,
2010, 2013
5
Microsoft Visio 4, 5, 2000,
2002, 2003,
2007, 2010
5
2013
Reader kpODArdr kpDWGrdr
1 kpODArdr kpDXFrdr kpCATrdr vsdsr kpVSDrdr
ActiveX
components
Extension Filter Export View
DWG
DXF
Y
Y
CAT
4
VSD
VSD, VSS
VST
N
VSDM
VSSM
VSTM
VSDX
VSSX
VSTX
N
Y
Y
Y
2
Y
N
Y
Y
N
3
Y
Y
N
Y
Y
Y
7
Extract Metadata Charset
N
N
N
Y
N
N
6
Y
Y
Y
Y
Y
Y
Y
Y
N
Y
Y
N
Header/
Footer
N
N
N
N
N
N
1. On Windows platforms, kpODArdr is used for all versions up to 2007 and graphic rendering is supported; for later versions, only text extraction is supported through the kpDWGrdr or kpDXFrdr reader.
2. On non-Windows platforms, graphic rendering is supported through the kpDWGrdr reader for versions R13, R14, R15, and R18 (2004); for other versions, only text extraction is supported.
3. On non-Windows platforms, graphic rendering is supported through the kpDXFrdr reader for versions R13, R14, R15, and R18 (2004); for other versions, only text extraction is supported.
4. All CAT file extensions, for example CATDrawing, CATProduct, CATPart, and so on.
5. Viewing and Export use the graphic reader, kpVSDrdr, for Microsoft Visio 2003, 2007, and 2010, and vsdsr for all earlier versions; image fidelity in Viewing and Export is therefore only supported for versions 2003 and above. Filter uses vsdsr for all versions.
6. Extraction of embedded OLE objects is supported for Filter on Windows platforms only.
7. Visio 2013 is supported in Viewing only, with the support of ActiveX components from the Microsoft Visio 2013 Viewer. Image fidelity is supported but other features, such as highlighting, are not.
Database Formats
Format             Version                                     Reader  Extension   Filter  Export  View  Extract  Metadata  Charset  Header/Footer
dBase Database     III+, IV                                    dbfsr   DBF         Y       Y       Y     N        N         N        N
Microsoft Access   95, 97, 2000, 2002, 2003, 2007, 2010, 2013  mdbsr   MDB, ACCDB  Y       T       T     N        N         Y (1)    N
Microsoft Project  2000, 2002, 2003, 2007, 2010, 2013          mppsr   MPP         Y       Y       Y     Y        Y         Y        N
1. Charset is not supported for Microsoft Access 95 or 97.
Desktop Publishing
Format               Version     Reader   Extension  Filter  Export  View  Extract  Metadata  Charset  Header/Footer
Microsoft Publisher  98 to 2013  mspubsr  PUB        Y       T       T     Y        Y         Y        N
Display Formats
Format Version Reader
Adobe PDF 1.1 to 1.7
pdfsr kppdfrdr kppdf2rdr
2
Extension Filter Export View Extract Metadata
Y
N
N
Y
Y
Y
N
Y
Y
Y
1
N
N
Y
N
N
Charset
Y
N
N
Header/
Footer
N
N
N
1. Includes support for extraction of subfiles from PDF Portfolio documents.
2. kppdf2rdr is an alternate graphic-based reader that produces high-fidelity output but does not support other features such as highlighting or text searching.
Graphic Formats
Format
Computer Graphics
Metafile
CorelDRAW
2
Version Reader n/a kpcgmrdr
1
Extension
CGM
Filter Export View Extract Metadata Charset
Y Y Y N N N
Header
/Footer
N through
9.0
10, 11,
12, X3 n/a n/a kpcdrrdr kpdcxrdr dcmsr
CDR
DCX
DCM
N
N
M
Y
Y
N
Y
Y
N
N
N
N
N
N
Y
N
N
N
N
N
N
DCX Fax System
Digital Imaging &
Communications in
Medicine (DICOM)
Encapsulated
PostScript (raster)
Enhanced Metafile
TIFF header n/a kpepsrdr kpemfrdr
EPS
EMF
N
Y
Y
Y
Y
Y
N
N
N
Y
N
N
N
N
Format
GIF
JBIG2
JPEG
JPEG 2000
Lotus AMIDraw
Graphics
Lotus Pic
Macintosh Raster
MacPaint
Microsoft Office
Drawing
Omni Graffle
PC PaintBrush
Portable Network
Graphics
SGI RGB Image
Sun Raster Image n/a
2 n/a n/a n/a
3 n/a
Version Reader
87, 89 kpgifrdr gifsr n/a n/a
Extension
GIF kpJBIG2rdr JBIG2 kpjpgrdr JPEG n/a n/a jpgsr kpjp2000rdr JP2, JPF,
J2K, jp2000sr
JPWL,
JPX, PGX kpsdwrdr SDW N
M
N
N
N
M
Filter Export View Extract Metadata Charset
N
M
Y
M
Y
N
N
N
N
Y
N
N
Y
Y
M
Y
M
Y
Y
N
Y
N
N
N
N
N
N
N
N
Y
N
Y
N
N
N
N
N
N
N
N
N
N
Header
/Footer
N
N
Y Y N N N N n/a n/a kppicrdr kppctrdr kpmacrdr kpmsordr kpGFLrdr kppcxrdr kppngrdr pngsr kpsgirdr kpsunrdr
PIC
PIC
PCT
PNTG
MSO
RGB
RS
Y
N
N
N
GRAFFLE Y
PCX N
PNG
PNG
N
M
N
N
Y
Y
Y
Y
N
Y
Y
M
Y
Y
Y
Y
Y
Y
N
Y
Y
N
Y
Y
N
N
N
N
N
N
N
N
N
N
N
N
N
N
Y
N
N
Y
N
N
N
N
N
N
Y
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
Format
Tagged Image File
Truevision Targa
Windows Animated
Cursor
Windows Bitmap
Version Reader through
6.0
3 tifsr kptifrdr
2 n/a kptrardr kpanirdr n/a
Windows Icon Cursor
Windows Metafile n/a
3
WordPerfect Graphics 1 1
WordPerfect Graphics 2 2, 7
1. Files with non-partitioned data are supported.
2. CDR/CDR with TIFF header.
kpbmprdr bmpsr kpicordr kpwmfrdr kpwpgrdr kpwg2rdr
Extension
TIFF
TIFF
TGA
ANI
BMP
BMP
ICO
WMF
WPG
WPG
Filter Export View Extract Metadata Charset
M
N
N
N
N
M
N
Y
N
N
M
Y
Y
Y
Y
M
Y
Y
Y
Y
N
Y
Y
Y
Y
N
Y
Y
Y
Y
N
N
N
N
N
N
N
N
N
N sional, CCITT Group 4 T6, LZW, JPEG (only Gray, RGB and CMYK color space are supported), and PackBits.
Y
N
N
N
N
Y
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
Header
/Footer
N
N
N
N
N
N
N
N
Mail Formats
Format
Documentum
EMCMF
Domino XML
Language
1
Version n/a n/a
GroupWise FileSurf n/a
Legato Extender n/a
Lotus Notes database
Mailbox
2
4, 5, 6.0, 6.5,
7.0, 8.0
Thunderbird
1.0, Eudora 6.2
2004 Microsoft
Entourage
Database
Microsoft Outlook 97, 2000, 2002,
2003, 2007,
2010, 2013
5.0, 6.0
Microsoft Outlook
DBX
Microsoft Outlook
Express
Windows 6
MacIntosh 5
1.0, 2.0
Reader msgsr dxlsr gwfssr onmsr nsfsr mbxsr
3 entsr msgsr
dbxsr
mbxsr
icssr Microsoft Outlook iCalendar
Microsoft Outlook for Macintosh
2011 olmsr
Extension Filter Export View Extract Metadata Charset
EMCMF N N Y Y Y Y
DXL
GWFS
ONM
NSF
MBX various
N
N
N
N
N
N
MSG,
OFT
DBX
Y
N
EML
EML
ICS, VCS N
Y
N
OLM N
N
N
N
N
N
N
T
N
T
N
N
N
Y
Y
Y
Y
T
Y
T
Y
T
T
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
N
N
N
N
N
Y
Y
Y
Y
Y
Y
Y
Y
4
N
N
Header
/Footer
N
N
N
N
N
N
N
N
N
N
N
Format
Microsoft Outlook
Offline Storage File
Microsoft Outlook
Personal Folder
Version
97, 2000, 2002,
2003, 2007,
2010, 2013
97, 2000, 2002,
2003, 2007,
2010, 2013
97, 2000, 2002,
2003, 2007,
2010, 2013
2.1, 3.0, 4.0
Reader pffsr
pstnsr
Extension Filter Export View Extract Metadata Charset
OST
PST
PST
N
N
N
N
N
N
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
N
Y
Header
/Footer
N
N
N
Microsoft Outlook vCard Contact
Text Mail (MIME) vcfsr VCF Y Y T N Y N N n/a
mbxsr
tnefsr various various various
Y
Y
N
T
T
N
T
T
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
N
N
N Transport Neutral
Encapsulation
Format n/a
1. Only supports non-encrypted embedded files.
2. KeyView supports MBX files created by Eudora Email, and Mozilla Thunderbird. MBX files created by other common mail applications are typically filtered, converted, and displayed.
3. This reader supports both clear signed and encrypted S/MIME. KeyView supports S/MIME for PST, EML, MBX, and MSG files.
4. Returns “Unicode” character set for version 2003 and up, and “Unknown” character set for previous versions.
5. Uses Microsoft Messaging Application Programming Interface (MAPI).
Multimedia Formats
Viewing SDK plays some multimedia files using the Windows Media Control Interface (MCI). MCI is a set of Windows APIs that communicate with multimedia devices.
Format
Advanced Systems
Format
Audio Interchange
File Format
Microsoft Wave
Sound
MIDI
MPEG-1 Audio layer 3
MPEG-1 Video
MPEG-2 Audio
MPEG-4 Audio
NeXT/Sun Audio
QuickTime Movie
Windows Video
Version
1.2
n/a n/a n/a n/a
2, 3, 4
Reader asfsr
MCI aiffsr
MCI riffsr n/a MCI
ID3 v1 and v2 MCI
2, 3 n/a mp3sr
MCI
Extension Filter Export View Extract Metadata Charset
ASF
WMA
WMV
N N N N Y N
AIFF
AIFF
WAV
WAV
MID
MP3
MP3
MPG
MCI mpeg4sr MP4
3GP
MCI
MCI
MPEGA
AU
QT
MOV
N
M
N
M
N
N
M
N
N
M
N
N
N
N
N
N
N
N
M
N
N
N
N
N
Y
N
Y
N
Y
Y
Y
Y
Y
N
Y
Y
N
N
N
N
N
N
N
N
N
N
N
N
N
Y
N
Y
N
N
Y
N
N
Y
N
N
N
N
N
N
N
N
N
N
N
N
N
N
2.1
MCI AVI N N Y N N N
N
N
Header/
Footer
N
N
N
N
N
N
N
N
N
N
N
N
Presentation Formats
NOTE Depending on the default multimedia player installed on your computer, the View API may not be able to play some supported multimedia formats. To play multimedia files, the View API uses the Windows Media Control Interface (MCI) to communicate with the multimedia player installed on your computer. If the player does not play a multimedia file that is supported by the Viewing SDK, the
View API will not be able to play the file.
If you cannot play a supported multimedia file using the View API, install a different multimedia player or compressor/decompressor (codec) component.
Format
Applix Presents
Version
Apple iWork Keynote 2, 3, ‘08,
‘09
4.0, 4.2,
4.3, 4.4
Corel Presentations 6, 7, 8, 9,
10, 11, 12,
X3 n/a Extensible Forms
Description
Language
Lotus Freelance
Graphics
Lotus Freelance
Graphics 2
Macromedia Flash
Microsoft OneNote
96, 97, 98,
R9, 9.8
2 through 8.0
2007,
2010, 2013
Reader Extension kpIWPGrdr GZ kpagrdr kpshwrdr kpXFDLrdr kpprzrdr kpprerdr swfsr kpONErdr
AG
SHW
XFD
XFDL
PRZ
PRE
SWF
ONE
ONETOC2
Filter Export View Extract Metadata Charset
Y Y Y N Y Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
N
N
N
N
N
N
Y
N
N
Y
N
N
N
N
N
N
Y
N
N
Y
Y
1
N
N
N
N
N
N
N
Header
/Footer
N
Format
Microsoft
PowerPoint
Macintosh
Version
98
2001, v.X,
2004
Reader kpp40rdr kpp97rdr
Extension
PPT
PPT
PPS
POT
PPT Microsoft
PowerPoint PC
Microsoft
PowerPoint
Windows
Microsoft
PowerPoint
Windows
Microsoft
PowerPoint
Windows XML
OASIS Open
Document Format
OpenOffice Impress
StarOffice Impress
4
95
97, 2000,
2002, 2003
2007,
2010, 2013
1, 2
3
1, 1.1
6, 7 kpp40rdr kpp95rdr kpp97rdr kpppxrdr kpodfrdr sosr sosr
PPT
SXD
SXI
ODG
ODP
SXI
SXP
ODP
SXI
SXP
ODP
PPT
PPS
POT
PPTX
PPTM
POTX
POTM
PPSX
PPSM
PPAM
1. The character set cannot be determined for versions 5.x and lower.
Y
Y
Y
Y
Y
Y
Y
Filter Export View Extract Metadata Charset
Y
Y
Y
Y
Y
Y
N
N
N
P
N
Y
Header
/Footer
N
N
Y
Y
Y
Y
Y
T
T
Y
Y
Y
Y
Y
T
T
N
N
Y
Y
Y
N
N
4
P
P
P
Y
Y
Y
Y
N
Y
Y
Y
Y
Y
Y
N
N
Y
Y
N
N
N
2
2. Slide footers are supported for Microsoft PowerPoint 97 and 2003.
3. Generated by OpenOffice Impress 2.0, StarOffice 8 Impress, and IBM Lotus Symphony Presentation 3.0.
4. Supported using the embedded objects reader olesr.
Spreadsheet Formats
Format
Apple iWork Numbers
Applix Spreadsheets
Comma Separated
Values
Corel Quattro Pro
Version
‘08, ‘09 n/a
Reader iwsssr
4.2, 4.3, 4.4
assr csvsr
Extension Filter Export View Extract Metadata Charset
GZ
AS
CSV
5, 6, 7, 8
Data Interchange
Format
Lotus 1-2-3
X4 n/a
Lotus 1-2-3
Lotus 1-2-3 Charts
96, 97, R9,
9.8
2, 3, 4, 5
2, 3, 4, 5
Microsoft Excel Charts 2, 3, 4, 5, 6,
7
Microsoft Excel
Macintosh
98, 2001, v.X, 2004 qpssr qpwsr difsr l123sr xlssr
WB2
WB3
QPW
DIF
123 wkssr WK4 kpchtrdr 123 kpchtrdr XLS
XLS
Y
Y
Y
Y
Y
Y
Y
Y
N
N
Y
Y
Y
Y
Y
N
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
N
N
N
N
N
N
N
N
N
N
Y
1
Y
N
N
P
P
N
P
N
N
N
Y
Y
Y
N
Y
Y
N
Y
Y
N
N
Y
N
N
N
Header/
Footer
N
N
N
N
N
N
N
N
Format
Microsoft Excel
Windows
Microsoft Excel
Windows XML
Version
2.2 through
2003
2007, 2010,
2013
Reader xlssr xlsxsr
Extension Filter Export View Extract Metadata Charset
XLS
XLW
XLT
XLA
Y Y Y Y
2
Y Y
Y Y Y Y Y Y XLSX
XLTX
XLSM
XLTM
XLAM
XLSB Y Y Y N N N Microsoft Excel Binary
Format
Microsoft Works
Spreadsheet
OASIS Open Document
Format
OpenOffice Calc
StarOffice Calc
2007, 2010,
2013
2, 3, 4
1, 2
3
1, 1.1
6, 7 xlsbsr mwssr odfsssr sosr sosr
S30
S40
ODS
SXC
STC
SXC
ODS
OTS
SXC
ODS
Y
Y
Y
Y
Y
Y
T
T
Y
Y
T
T
1. Supported using the embedded objects reader olesr.
2. Supported for versions 97 and higher using the embedded objects reader olesr.
3. Generated by OpenOffice Calc 2.0, StarOffice 8 Calc, and IBM Lotus Symphony Spreadsheet 3.0.
N
Y
N
N
N
Y
Y
Y
Y
Y
Y
Y
Header/
Footer
Y
Y
N
N
N
N
N
Text and Markup Formats
Format
ANSI
ASCII
HTML
Microsoft Excel Windows
XML
Microsoft Word Windows
XML
Microsoft Visio XML
Version n/a n/a
3, 4
2003
2003
2003
MIME HTML
Rich Text Format
Unicode HTML
Unicode Text
XHTML
XML (generic)
Reader afsr afsr htmsr xmlsr xmlsr XML n/a
1 through
1.7
n/a
3, 4
1.0
1.0
xmlsr mhtsr rtfsr
VDX
VTX
MHT
RTF unihtmsr HTM unisr TXT htmsr xmlsr
HTM
XML
Extension Filter Export View Extract Metadata Charset
TXT Y Y Y N N N
TXT
HTM
XML
Y
Y
Y
Y
Y
T
Y
Y
T
N
N
N
N
P
Y
N
Y
Y
N
N
Header/
Footer
N
N
Y
Y
Y
Y
Y
Y
Y
Y
T
T
Y
Y
Y
Y
Y
T
T
T
Y
Y
Y
Y
Y
T
N
N
N
N
N
N
N
N
Y
Y
Y
P
Y
N
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
N
N
N
Y
N
N
N
N
Word Processing Formats
Format
Adobe FrameMaker
Interchange Format
Apple iChat Log
Apple iWork Pages
Applix Words
Corel WordPerfect
Linux
Corel WordPerfect
Macintosh
Corel WordPerfect
Windows
Corel WordPerfect
Windows
DisplayWrite
Folio Flat File
Founder Chinese
E-paper Basic
Fujitsu Oasys
Haansoft Hangul
Health level7
Version
5, 5.5, 6, 7
Reader mifsr
Extension
MIF
Filter Export View Extract Metadata Charset
Y Y Y N N Y
Header/
Footer
N
1, AV 2
AV 2.1, AV 3
‘08, ‘09
3.11, 4, 4.1,
4.2, 4.3, 4.4
6.0, 8.1
ichatsr iwwpsr awsr wp6sr
ICHAT
GZ
AW
WPS
1.02, 2, 2.1,
2.2, 3, 3.1
5, 5.1
wpmsr wosr
WPM
WO
6, 7, 8, 9, 10,
11, 12, X3
4
3.1
3.2.1
wp6sr dw4sr foliosr cebsr
1
7
97
2002, 2005,
2007, 2010
2.0
oa2sr hwpsr
OA2
HWP hwposr HWP hl7sr HL7
WPD
IP
FFF
CEB
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
N
Y
N
T
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
N
Y
N
T
Y
N
N
N
N
N
N
N
N
N
N
N
N
Y
N
N
Y
N
P
P
P
N
N
Y
N
P
N
Y
Y
N
Y
Y
Y
Y
Y
Y
Y
Y
N
N
Y
Y
Y
N
N
Y
Y
Y
N
N
N
N
N
N
Y
N
N
Format
IBM DCA/RFT
(Revisable Form Text)
JustSystems Ichitaro
Lotus AMI Pro
Lotus AMI Professional
Write Plus
Lotus Word Pro
Lotus SmartMaster
Microsoft Word
Macintosh
Version
SC23-0758-
1
8 through
2013
2, 3
2.1
Reader dcasr jtdsr lasr lasr
96, 97, R9
96, 97
4, 5, 6, 98
2001, v.X,
2004 lwpsr lwpsr mbsr mw8sr
4, 5, 5.5, 6 mwsr
1.0 and 2.0
misr
Microsoft Word PC
Microsoft Word
Windows
Microsoft Word
Windows
Microsoft Word
Windows
Microsoft Word
Windows XML
6, 7, 8, 95
97, 2000,
2002, 2003
2007, 2010,
2013 mw6sr
Microsoft Works
Microsoft Works
1, 2, 3, 4
6, 2000
Extension
DC
JTD
SAM
AMI
LWP
MWP
DOC
DOC
DOT
DOC
DOC
DOC mw8sr mwxsr
DOC
DOT
DOCM
DOCX
DOTX
DOTM mswsr WPS msw6sr WPS
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Filter Export View Extract Metadata Charset
Y Y Y N N Y
Header/
Footer
N
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
N
N
N
N
N
N
Y
N
N
N
Y
Y
N
N
2
P
P
N
P
N
Y
Y
N
N
Y
Y
Y
N
N
N
Y
N
N
N
N
Y
N
N
Y
Y
Y
N
N
Y
Y
Y
Y
N
Y
N
Y
Y
Y
Y
Y
Y
Y
Format
Microsoft Windows
Write
OASIS Open
Document Format
Omni Outliner
OpenOffice Writer
Version
1, 2, 3
1, 2
3 v3, OPML,
OOutline
1, 1.1
Reader mwsr
Extension
WRI odfwpsr
ODT
SXW
STW oo3sr OO3
OPML
OOUTLINE sosr epubsr
SXW
ODT
EPUB
Filter Export View Extract Metadata Charset
Y
Y
Y
Y
Y
Y
Y
T
Y
Y
Y
T
N
Y
N
N
N
Y
N
Y
Y
Y
Y
Y
Header/
Footer
N
Y
N
N
Open Publication
Structure eBook
StarOffice Writer
2.0, 3.0
Y Y Y N Y Y N
Skype Log
WordPad
6, 7
3 through
2003 n/a sosr skypesr rtfsr
SXW
ODT
DBB
RTF
Y
Y
Y
T
Y
Y
T
Y
Y
N
N
N
Y
N
P
Y
N
Y
N
N
N
XML Paper
Specification
XyWrite
Yahoo! Instant
Messenger
4.12
n/a xpssr xywsr yimsr
4
XPS
XY4
DAT
Y
Y
Y
T
Y
Y
T
Y
Y
N
N
N
N
N
N
N
N
N
N
N
N
1. This reader is only supported on Windows 32-bit platforms.
2. Supported using the embedded objects reader olesr.
3. Generated by OpenOffice Writer 2.0, StarOffice 8 Writer, and IBM Lotus Symphony Documents 3.0.
4. To successfully use this reader, you must set the KV_YAHOO_ID environment variable to the Yahoo user ID. You can optionally set the
KV_OTHER_YAHOO_ID environment variable to the other Yahoo user ID. If you do not set it, “Other” is used by default. If you enter incorrect values for the environment variables, erroneous data is generated.
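The environment variables described in note 4 can also be set from C before the Yahoo! Instant Messenger reader is used. The following is a minimal sketch and not part of the KeyView API: the helper names set_env_var and configure_yahoo_reader_env are hypothetical, and the sketch assumes that the reader takes the values from the process environment, so they are set before any file is processed.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L   /* exposes setenv() under strict C modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sets one environment variable in this process.
 * Uses _putenv_s on Windows and setenv on POSIX systems. */
static int set_env_var(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
#ifdef _WIN32
    return _putenv_s(name, value);
#else
    return setenv(name, value, 1);   /* 1 = overwrite an existing value */
#endif
}

/* Hypothetical helper: applies note 4.
 * yahoo_id is required; other_id may be NULL, in which case
 * KV_OTHER_YAHOO_ID is left unset and "Other" is used by default.
 * Returns 0 on success, -1 on failure. */
int configure_yahoo_reader_env(const char *yahoo_id, const char *other_id)
{
    if (yahoo_id == NULL || strlen(yahoo_id) == 0)
        return -1;
    if (set_env_var("KV_YAHOO_ID", yahoo_id) != 0)
        return -1;
    if (other_id != NULL && strlen(other_id) > 0)
        return set_env_var("KV_OTHER_YAHOO_ID", other_id) == 0 ? 0 : -1;
    return 0;
}
```

Because incorrect values produce erroneous data rather than an error, an application would normally take both IDs from a trusted source instead of guessing them.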
Supported Formats (Detected)
The file formats listed in this section can be detected by the KeyView format detection module (kwad), but cannot be filtered, converted, or displayed. The detection module determines a file’s format and reports the information to the developer’s application.
The formats listed in “Supported Formats” can be detected as well as filtered, exported, and viewed.
Ability Office (SS, DB, GR, WP, COM)
ACT
Adobe FrameMaker Markup Language
Aldus Freehand (Macintosh)
Aldus PageMaker (Macintosh)
Amiga MOD sound
Apple Double
Apple Single
Appleworks
Applix Asterix
ARC/PAK Archive
ASCII-armored PGP encoded
ASCII-armored PGP signed
AutoDesk Animator Pro FLIC Animation
AutoShade Rendering
CADAM Drawing
CCITT Group 3 1-Dimensional (G31D)
Compactor/Compact Pro Archive
Corel Draw CMX
CPT Communication
Curses Screen Image (UNIX/VAX/SUN)
DCX Fax
AC3 audio
Adobe FrameMaker
AES Multiplus Comm
Aldus PageMaker (DOS)
Amiga IFF-8SVX sound
Apple Binary Property List
Apple Photoshop Document
Apple XML Property List
Applix Alis
Applix Graphics
ARJ Archive
ASCII-armored PGP Public Keyring
AutoDesk Animator FLIC Animation
AutoDesk WHIP
BlackBerry Activation File
CADAM Drawing Overlay
COMET TOP Word
Convergent Tech DEF Comm.
cpio Archive (UNIX/VAX/SUN)
Creative Voice (VOC) sound
Data Point VISTAWORD
DEC WPS PLUS
DECdx
Device Independent file (DVI)
Desktop Color Separation (DCS)
Digital Imaging and Communications in Medicine (DICOM)
DG CEOwrite
DIF Spreadsheet
Disk Doubler Compression
ENABLE
eFax
Executable UNIX/VAX/SUN
Framework
Freehand 11
GEM Bit Image
Google SketchUp
DG Common Data Stream (CDS)
Digital Document Interchange Format (DDIF)
EBCDIC Text
ENABLE Spreadsheet (SSF)
Envoy (EVY)
FileMaker (Macintosh)
Framework II
FTP Session Data
Ghost Disk Image
Graphics Environment Manager (GEM VDI)
Harvard Graphics
Honey Bull DSA101
HP Graphics Language (Plotter)
IBM 1403 Line Printer
Hewlett-Packard
HP Graphics Language (HP-GL)
HP PCL and PJL Languages
IBM DCA-FFT
IBM DCF Script
Informix SmartWare II
Informix SmartWare II Communication File
Informix SmartWare II Database
Informix SmartWare Spreadsheet
Java Class file
Interleaf
JPEG File Interchange Format (JFIF)
KW ODA G31D (G31)
KW ODA Internal G32D (G32)
Lasergraphics Language
Lotus Notes Bitmap
Lotus Screen Cam
Macromedia Director
MacWrite II
KW ODA G4 (G4)
KW ODA Internal Raw Bitmap (RBM)
Link Library UNIX/VAX/SUN
Lotus Notes CDF
Lyrix
MacWrite
MASS-11
MATLAB MAT Format
Microsoft Access 2007
Micrografx Designer
Microsoft Access 2007 Template
Microsoft Compiled HTML Help
Microsoft Common Object File Format (COFF)
Microsoft Device Independent Bitmap
Microsoft Excel 2007 Macro-Enabled Spreadsheet Template
Microsoft Document Imaging (MDI)
Microsoft Excel 2007 Spreadsheet Template
Microsoft Exchange Server Database File
Microsoft Object File Library
Microsoft Office Drawing
Microsoft Office Groove
Microsoft Outlook Restricted Permission Message File
Microsoft Windows Cursor (CUR) Graphics
Microsoft Windows Group File
Microsoft Windows Icon (ICO)
Microsoft Windows Help File
Microsoft Windows NT Event Log
Microsoft Windows OLE 2 Encapsulation
Microsoft Windows Vista Event Log
Microsoft Word (UNIX)
Microsoft Works (Macintosh)
Microsoft Works Communication (Macintosh)
Microsoft Works Communication (Windows)
Microsoft Works Database (Macintosh)
Microsoft Works Database (Windows)
Microstation
MORE Database Outliner (Macintosh)
MS DOS Batch File format
MultiMate 4.0
Navy DIF
NBI Net Archive Format
Microsoft Works Database (PC)
Microsoft Works Spreadsheet (Macintosh)
Milestone Document
MPEG-PS container with CDXA stream
MS DOS Device Driver
Multiplan Spreadsheet
NBI Async Archive Format
Netscape Bookmark file
Nero Encrypted File
NIOS TOP
NURSTOR Drawing
ODA/ODIF
Office Writer
OLIDIF
NeWS font file (SUN)
Nota Bene
Object Module UNIX/VAX/SUN
ODA/ODIF (FOD 26)
OLE DIB object
Open PGP (new format packets)
OS/2 PM Metafile Graphics
Paradox (PC) Database
PC Library Module
PC True Type Font
PeachCalc Spreadsheet
PEX Binary Archive (SUN)
PGP Encrypted Data
PGP Secret Keyring
PGP Signed and Encrypted Data
Philips Script
Portable Bitmap Utilities (PBM)
Portable Pixmap Utilities (PPM)
PostScript Type 1 Font File
Program Information File
Q & A for Windows
Quadratron Q-One (V2.0)
QuickDraw 3D Metafile (3DMF)
RealLegal E-Transcript
Reflex Database
RIFF MIDI
SAMNA Word IV
SEG-Y Seismic Data format
SGML
SMTP document
Stuff It Archive (Macintosh)
Supercalc Spreadsheet
Symphony Spreadsheet
PaperPort image file
PC COM executable (detected in file mode only)
PC Object Module
PCD Image
Persuasion Presentation
PGP Compressed Data
PGP Public Keyring
PGP Signature Certificate
PGP Signed Data
Plan Perfect
Portable Greymap Utilities (PGM)
PostScript File
PRIMEWORD
Q & A for DOS
Quadratron Q-One (V1.93J)
Quark Express (Macintosh)
Real Audio
RealMedia Streaming Media
RIFF Device Independent Bitmap
RIFF Multimedia Movie
Samsung Electronics JungUm Global format
Serialized Object Format (SOF) Encapsulation
Simple Vector Format (SVF)
SolidWorks
SUN vfont definition
SYLK Spreadsheet
Targon Word (V 2.0)
Ultracalc Spreadsheet
Uniplex Ucalc Spreadsheet
Usenet format
VRML 2.0
Wang Office GDL Header Encapsulation
Wang WITA
Web ARChive (WARC)
Windows Journal
Windows Palette
Word Connection
WordMARC word processor
Uniplex (V6.01)
UNIX SHAR Encapsulation
VRML
Volkswriter
WANG PC
WANG WPS Comm.
Windows C++ Object Storage
Windows Micrografx Draw (DRW)
Windows scrap file (SHS)
WordERA (V 1.0)
WordPerfect General File
WordStar 6.0
Writing Assistant word processor
X Image
Xerox 860 Comm.
Xerox Writer word processor
WriteNow
X Bitmap (XBM)
X Pixmap (XPM)
Xerox DocuWorks
Yahoo! Messenger chat log
APPENDIX B
Files Required for Redistribution
This section lists the Export files that may be redistributed in your applications under the licensing agreement. These files are in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system. This section contains the following topics:
Core Files
Support Files
Document Readers and Writers
Document Type Definition Files
NOTE On Windows systems, the libraries are .dll files. On UNIX systems, the libraries are .so, .a, or .sl files.
Core Files
The following core files may be redistributed with your application:
File              Description
formats_e.ini     Initialization file.
htmlexport.*      Required by the Java API.
xmlcnv.*          XML converter for the document token stream.
kpifcnvt.*        Graphic conversion routines.
kpifutil.*        Graphic utility routines.
kvxtract.*        File Extraction interface.
kvxml.*           XML Export C API.
kvexport.*        Export C API. Interface to the HTML and XML Export C APIs.
kvolefio.*        Embedded OLE object writer.
kvutil.*          Internal KeyView utility functions.
kvxpgsa.*         Interface between presentations or graphic readers and the Export API.
kvxsssa.*         Interface between spreadsheet readers and the Export API.
kvxwpsa.*         Interface between word processing readers and the Export API.
kwad.*            File auto-recognition module.
regsvr32.exe      A Microsoft Windows program used to register in-process COM objects.
txtcnv.*          Converter for document token stream.
xmlexport.*       Required by the Java API.
Support Files
The following support files may be redistributed with your application:
File              Description
bentofio.*        Required by l123sr.* and kpprzrdr.*.
cbmap.map         Character mappings for Adobe Portable Document Format (PDF).
chartbls.ux       Character mapping tables.
chmdll.*          Required by chmsr.
kp3dwrld.*        Required for 3D charts.
kpchtrdr.*        Required for all spreadsheets (chart support).
kpjavwrt.*        Java utility routines.
kpjpeg.*          JPEG file interchange format shared routines.
kppng.*           Portable Network Graphics (PNG) utilities.
kvxconfig.ini     Contains element extraction settings for source XML files.
kvgraph.*         Required for all spreadsheets (chart support).
kvpie.*           Required for all spreadsheets (chart support).
kvradar.*         Required for all spreadsheets (chart support).
kv.lic            Contains license information for KeyView products. This file is opened and validated when a KeyView API is used.
kvraster.class    Java program used to convert vector graphics on UNIX and Linux.
kvVector.class    Java applet used to convert vector graphics on UNIX and Linux.
kvvector.jar      Java applet used to convert vector graphics on UNIX and Linux. This must reside in the output directory.
mscomctl.ocx      Microsoft Common Control (for example, labels, dialog boxes). Required for Visual Basic programs and COM objects.
msvbvm60.*        Microsoft Visual Basic Runtime library V6.0.
MSVCP60.*         Microsoft Visual C++ Runtime Library V6.0.
msvcrt.*          Microsoft Visual C Runtime library.
oleaut32.*        Microsoft OLE Automation Controls.
olepro32.*        Microsoft OLE property support library.
servant.exe       Executable required for out-of-process conversions.
wpmap.*           Extended character mapping for WordPerfect and Corel Presentation.
xmlsh.*           Contains a library of content handlers for each XML file type. Required by the Expat XML parser.
Document Readers and Writers
The following readers and writers may be redistributed with your application:
File              Description
ad1sr.*           AD1 Evidence file reader
afsr.*            ASCII reader
assr.*            Applix spreadsheet reader
awsr.*            Applix Words reader
bkfsr.*           Microsoft Backup File reader
bzip2sr.*         Bzip2 reader
cabsr.*           Microsoft Cabinet format reader
cebsr.*           Founder Chinese E-paper Basic reader
chmsr.*           Microsoft Compiled HTML Help reader
csvsr.*           Comma Separated Values reader
dbfsr.*           dBase Database reader
dbxsr.*           Microsoft Outlook Express DBX reader
dcasr.*           Document Content Architecture/Revisable Form Text (DCA/RFT) reader
difsr.*           Data Interchange Format reader
dmgsr.*           Mac Disk Copy Disk Image File reader
dw4sr.*           DisplayWrite 4 reader
dxlsr.*           Domino XML Language reader
emlsr.*           Microsoft Outlook Express (EML) reader. This is used to convert EML files when the MBX reader is not licensed.
emxsr.*           Legato EMailXtender archive (EMX) reader
encasesr.*        Expert Witness Compression Format (EnCase) v6 reader
encase2sr.*       Expert Witness Compression Format (EnCase) v7 reader
entsr.*           Microsoft Entourage Database Format reader
epubsr.*          Open Publication Structure eBook reader
foliosr.*         Folio Flat File reader
gwfssr.*          GroupWise FileSurf reader
hl7sr.*           Health level7 reader (metadata only)
htmsr.*           HTML and XHTML reader
hwposr.*          Hangul 2002, 2005, 2007 reader
ichatsr.*         Apple iChat Log reader
icssr.*           Microsoft Outlook iCalendar reader
isosr.*           ISO-9660 CD Disc Image Format reader
iwsssr.*          Apple iWork Numbers reader
iwwpsr.*          Apple iWork Pages reader
jp2000sr.*        JPEG 2000 metadata reader
jtdsr.*           JustSystems Ichitaro reader
kpagrdr.*         Applix Presents reader
kpanirdr.*        Animated cursor reader
kpbmprdr.*        Windows Bitmap reader
kpbmpwrt.*        Windows Bitmap writer
kpcdrrdr.*        Corel Draw
kpcgmrdr.*        Computer Graphics Metafile reader
kpcgmwrt.*        Computer Graphics Metafile writer
kpdcxrdr.*        DCX (fax) reader
kpDWGrdr.*        AutoCAD Drawing format reader
kpDXFrdr.*        AutoCAD Drawing Exchange format reader
kpemfrdr.*        Enhanced Metafile reader
kpepsrdr.*        Encapsulated PostScript (EPS) reader
kpgifrdr.*        Graphic Interchange Format (GIF) reader
kpicordr.*        Windows Icon reader
kpiwpgrdr.*       Apple iWork Keynote reader
kpjbig2rdr.*      JBIG2 reader
kpjp2000rdr.*     JPEG 2000 reader
kpjpgrdr.*        JPEG file interchange format reader
kpjpgwrt.*        JPEG file interchange format writer
kpnbmprdr.*       Lotus Notes Bitmap reader (for embedded images in DXL files)
kpmacrdr.*        MacPaint reader
kpmsordr.*        Microsoft Office Drawing Objects (Office 97, 2000, and XP) reader
kpodfrdr.*        Oasis Open Document Format presentation (ODP) reader
kpODArdr.*        AutoCAD reader (Windows only)
kpONErdr.*        Microsoft OneNote reader
kppdfrdr.*        Adobe Portable Document File (PDF) graphic-based reader
kppdf2rdr.*       High-fidelity Adobe Portable Document File (PDF) graphic-based reader
kpp40rdr.*        Microsoft PowerPoint PC 4.0 and PowerPoint Mac reader
kpp95rdr.*        Microsoft PowerPoint 95 reader
kpp97rdr.*        Microsoft PowerPoint 97 and higher reader
kppctrdr.*        Macintosh Quick Draw Picture (PICT) reader
kppcxrdr.*        PC Paintbrush (PCX) reader
kppicrdr.*        Pictor PC Paint format (PIC) reader
kppngrdr.*        Portable Network Graphics (PNG) reader
kppngwrt.*        Portable Network Graphics (PNG) writer
kpppxrdr.*        Microsoft PowerPoint XML reader 2007
kpprerdr.*        Lotus Freelance Graphics for Windows V2.0 reader
kpprzrdr.*        Lotus Freelance Graphics 96/97/98 reader
kpsdwrdr.*        Lotus Ami Pro Graphics reader
kpsgirdr.*        SGI RGB reader
kpshwrdr.*        Corel Presentations reader
kpsunrdr.*        Sun Raster reader
kptgardr.*        Truevision Targa reader
kptifrdr.*        Tagged Image File Format (TIFF) reader
kpvsdrdr.dll      Microsoft Visio reader
kpwg2rdr.*        WordPerfect Graphics 2 reader
kpwmfrdr.*        Windows Metafile reader
kpwmfwrt.*        Windows Metafile writer
kpwpgrdr.*        WordPerfect Graphics 1 reader
kpxfdlrdr.*       Extensible Forms Description Language reader
kvgzsr.*          GZIP reader
kvhqxsr.*         BinHex reader
kvzeesr.*         UNIX Compress reader
l123sr.*          Lotus 123 v96/97/98 reader
lasr.*            Lotus AMI Pro reader
ltbenn30.dll      Lotus Word Pro support (supported on Windows x86 platform only)
ltscsn10.dll      Lotus Word Pro support (supported on Windows x86 platform only)
lwpapin.dll       Lotus Word Pro support (supported on Windows x86 platform only)
lwppann.dll       Lotus Word Pro support (supported on Windows x86 platform only)
lwpsr.dll         Lotus Word Pro reader (supported on Windows x86 platform only)
macbinsr.*        MacBinary reader
mbsr.*            Microsoft Word Macintosh reader
mbxsr.*           Mailbox (MBX) and Microsoft Outlook Express (EML) reader (see note 1)
mdbsr.*           Microsoft Access reader
mifsr.*           Adobe Maker Interchange Format reader
misr.*            Microsoft Word 2 reader
mp3sr.*           MP3 reader for metadata extraction
mppsr.*           Microsoft Project reader
msgsr.*           Microsoft Outlook (MSG) reader
mspubsr.*         Microsoft Publisher reader
msw6sr.*          Microsoft Works 6 and 2000 reader
mswsr.*           Microsoft Works V1 and 2 reader
mw6sr.*           Microsoft Word 95 reader
mw8sr.*           Microsoft Word 97, 2000, and XP reader
mwsr.*            Microsoft Word for DOS and Microsoft Write reader
mwssr.*           Microsoft Works Spreadsheet reader
mwxsr.*           Microsoft Word 2007 XML reader
nsfsr.*
oa2sr.*           Fujitsu Oasys reader
odfsssr.*         Oasis Open Document Format spreadsheets (ODS) reader
odfwpsr.*         Oasis Open Document Format word processing (ODT) reader
olesr.*           Embedded OLE object reader
olmsr.*           Microsoft Outlook for Macintosh reader
oo3sr.*           Omni Outliner reader
pdfsr.*           Adobe Portable Document File (PDF) reader
pffsr.*           Microsoft Outlook Offline Storage File reader
pstsr.dll         Microsoft Outlook Personal Folders file MAPI-based reader (supported on Windows platform only)
pstnsr.*          Microsoft Outlook Personal Folders file native reader
qpssr.*           Quattro Pro spreadsheet reader
rarsr.*           RAR Archive reader
rtfsr.*           Microsoft Rich Text Format reader
skypesr.*         Skype log file reader
sosr.*            StarOffice/OpenOffice reader
swfsr.*           Macromedia Flash reader
tarsr.*           Tape archive reader
tnefsr.*          Transport Neutral Encapsulation Format reader
unihtmsr.*        Unicode HTML reader
unisr.*           Unicode reader
unzip.*           Zip file reader
uudsr.*           UUEncoding reader
vsdsr.*           Microsoft Visio reader
vcfsr.*           Microsoft Outlook vCard Contact reader
wkssr.*           Lotus 123 v2.0 through 5.0 reader
wosr.*            WordPerfect 5.x reader
wp6sr.*           WordPerfect 6.0 through 10.0 reader
wpmsr.*           WordPerfect for Macintosh reader
xlsbsr.*          Microsoft Office 2007 Excel Binary Format reader
xlssr.*           Microsoft Excel reader
xlsxsr.*          Microsoft Excel 2007 XML reader
xmlsr.*           Generic XML reader
xpssr.*           XML Paper Specification reader
xywsr.*           XYWrite reader
yimsr.*           Yahoo! Instant Messenger reader
z7zsr.*           7-Zip reader
1. This reader is an advanced feature and is sold and licensed separately from KeyView Export SDK.
Document Type Definition Files
The following files related to the Verity.dtd may be redistributed with your application:
File              Description
Verity.dtd        The document type definition file that defines the structure of an XML document. XML document validity is based on the Verity.dtd. The Verity.dtd is required and must be in the same directory as the output XML file.
HTMLlat1x.ent     The file defining Latin characters. This file is referenced in the Verity.dtd. This file is required and must be in the same directory as the Verity.dtd.
HTMLspecialx.ent  The file defining special characters. This file is referenced in the Verity.dtd. This file is required and must be in the same directory as the Verity.dtd.
HTMLsymbolx.ent   The file defining symbols. This file is referenced in the Verity.dtd. This file is required and must be in the same directory as the Verity.dtd.
wp.xsl            The default style sheet for word processing documents. This file is optional. If it is used, it must be in the same directory as the output XML file.
pg.xsl            The default style sheet for presentation graphics. This file is optional. If it is used, it must be in the same directory as the output XML file.
ss.xsl            The default style sheet for spreadsheets. This file is optional. If it is used, it must be in the same directory as the output XML file.
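Because the Verity.dtd and its three entity files must sit beside the output XML file, an application can check for them before it starts a conversion. The following is a minimal sketch, not part of the Export API: the function name count_missing_dtd_files and the fixed path buffer size are assumptions made for illustration, and the check only tests whether each file can be opened for reading.

```c
#include <assert.h>
#include <stdio.h>

/* The four files that this section marks as required. */
static const char *const REQUIRED_DTD_FILES[] = {
    "Verity.dtd", "HTMLlat1x.ent", "HTMLspecialx.ent", "HTMLsymbolx.ent"
};
#define REQUIRED_DTD_COUNT \
    (sizeof REQUIRED_DTD_FILES / sizeof REQUIRED_DTD_FILES[0])

/* Hypothetical helper: returns the number of required files that cannot
 * be opened in output_dir, or -1 if a path does not fit in the buffer.
 * A forward slash is used as the separator, which fopen() also accepts
 * on Windows. */
int count_missing_dtd_files(const char *output_dir)
{
    char path[1024];
    int missing = 0;
    size_t i;

    assert(output_dir != NULL);
    for (i = 0; i < REQUIRED_DTD_COUNT; i++) {
        FILE *fp;
        int n = snprintf(path, sizeof path, "%s/%s",
                         output_dir, REQUIRED_DTD_FILES[i]);
        if (n < 0 || (size_t)n >= sizeof path)
            return -1;
        fp = fopen(path, "rb");
        if (fp == NULL) {
            fprintf(stderr, "missing required file: %s\n", path);
            missing++;
        } else {
            fclose(fp);
        }
    }
    return missing;
}
```

A result of 0 means all four required files were found. The optional style sheets (wp.xsl, pg.xsl, ss.xsl) are deliberately not checked.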
APPENDIX C
Export Tokens
This section contains an alphabetized list of the Export tokens.
Tokens are special strings inserted into the KVXMLTemplate structure, XmlTemplateInfo class, and template files. They are placeholders for markup that appears in the XML output. For example, the $CHARSET token marks the place in the XML output where the name of the source document’s character set is inserted. It would be used in the tag <charset=$CHARSET>.
See the template files for examples of how to use tokens.
Token               Description
$ANCHOR             Inserts an anchor for a heading level (h2-h6) for the current block.
$BASE               Inserts the base URL for the XML file. Use in the tag <base href=xx>.
$CHARSET            Inserts the character set of the source document, if that information is ascertainable. The section “Supported Formats” lists the file formats for which character set information can be determined.
$CONTENT            Inserts the content of the metadata field specified by the $NAME token. This token is used in conjunction with the $SUMMARY, $USERSUMMARY, and $NAME tokens to insert source document metadata into the XML output. An example of this token’s use is: pszUserSummary=<MetaData name="$NAME" content="$CONTENT"> The section lists file formats that support metadata.
$ENDNOTE            Inserts endnotes from the current page of the document at this point in the output stream. Currently only implemented for Microsoft Word documents.
$FOOTER             Inserts the footer from the current page of the document at this point in the output stream.
$FOOTNOTE           Inserts footnotes from the current page of the document at this point in the output stream. Currently only implemented for Microsoft Word documents.
$FOOTNOTEALL        Inserts all footnotes from the current document at this point in the output stream. Currently only implemented for Microsoft Word documents.
$HEADER             Inserts the header from the current page of the document at this point in the output stream.
$MAINURL            Inserts the URL to the file containing the start of the generated XML, that is, the main output stream.
$NAME               Inserts the name of a metadata field. This token is used in conjunction with other tokens, such as $CONTENT, to insert source document metadata into the XML output. An example of this token’s use is: pszUserSummary=<MetaData name="$NAME" content="$CONTENT"> The section lists file formats that support metadata.
$NEXT               Inserts the anchor to the next block. If this is the last block, a link to the first block is inserted.
$PREV               Inserts the anchor to the previous block. If the current block is the first block, a link to the last block is inserted.
$STYLESHEET         Inserts the path to the style sheet.
$SUMMARY            Inserts the data from standard metadata fields using the markup provided in the pszUserSummary member of the structure KVXMLTemplate. Standard fields are enumerated from 0 to 33 in KVSumType in kvtypes.h. See also the $USERSUMMARY token. The section lists file formats that support metadata.
$SUMMARYNN          Inserts the data from a specified metadata field. NN is a number from 0 through 33 enumerated in the KVSumType structure in kvtypes.h. An example of this token’s use is: pszMainTop=$SUMMARY01 The section lists file formats that support metadata.
$SPLITBLOCKNUMBER   Inserts the page number for each block generated as a result of bHardPageMakesNewBlock or lcbBlockSize.
$TOC                Inserts the table of contents at this point in the current output stream. This token is typically embedded in pszMainTop.
$TOCB               Inserts the table of contents at this point for the current block.
$TOCBE              Inserts the beginning entry for the table of contents at this point in the current output stream.
$TOCE               Inserts a table of contents entry at this point in the current output stream.
$TOCTE              Inserts a text entry without XML markup at this point in the current output stream.
$TOCPE              Inserts a partial table of contents entry at this point in the current output stream. XML tags are removed; however, character entities are retained. This allows angle brackets to appear in the table of contents entries (for example, <text>). Without this token, <text> would be interpreted as a non-valid XML tag and would be ignored by the browser.
$TOPANCHOR          Inserts the anchor for the top heading level (h1) for the current block.
$USERCB             Triggers the callback function UserCB() and identifies the callback used in the function.
$USERSUMMARY        Inserts the data from every valid non-standard metadata field using the markup provided in the pszUserSummary member of the structure KVXMLTemplate. Non-standard metadata are any fields not listed from 0 to 33 in KVSumType, such as user-defined fields (for example, custom property fields in Word documents), or fields that are unique to a particular file type (for example, “Artist” or “Genre” fields in MP3 files). The section lists file formats that support metadata.
$XANCHOR            Inserts the anchor to an extra file into the XML output. The content of the extra file is defined by pszXFile, and the block generated by this token is defined by pszXStartBlock and pszXEndBlock.
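To make the placeholder behavior of tokens concrete, the sketch below performs plain text substitution on a template string such as the pszUserSummary example. It is an illustration only and does not show how Export expands tokens internally: the function name expand_token is hypothetical, and the naive matching does not distinguish tokens that share a prefix (for example, $TOC and $TOCB), which a real expander would have to handle.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies tmpl to out, replacing every occurrence of
 * token with value. Returns 0 on success, or -1 if the token is empty or
 * the result does not fit in out (out is then not guaranteed to be a
 * terminated string). */
int expand_token(const char *tmpl, const char *token, const char *value,
                 char *out, size_t out_size)
{
    size_t token_len, value_len, used = 0;

    assert(tmpl != NULL && token != NULL && value != NULL && out != NULL);
    token_len = strlen(token);
    value_len = strlen(value);
    if (token_len == 0 || out_size == 0)
        return -1;

    while (*tmpl != '\0') {
        if (strncmp(tmpl, token, token_len) == 0) {
            /* Token found: emit the value and skip past the token. */
            if (used + value_len >= out_size)
                return -1;
            memcpy(out + used, value, value_len);
            used += value_len;
            tmpl += token_len;
        } else {
            /* Ordinary character: copy it through unchanged. */
            if (used + 1 >= out_size)
                return -1;
            out[used++] = *tmpl++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

Calling expand_token twice on the template <MetaData name="$NAME" content="$CONTENT">, first for $NAME and then for $CONTENT, yields one finished metadata element per field, which mirrors how the $NAME and $CONTENT tokens are described as working together.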
APPENDIX D
Character Sets
This section provides information on the handling of character sets in the KeyView suite of products, which includes KeyView Filter SDK, KeyView Export SDK, and KeyView Viewing SDK. It contains the following topics:
Multi-Byte and Bi-Directional Support
Multi-Byte and Bi-Directional Support
The KeyView SDKs can process files containing multi-byte characters. A multi-byte character encoding can represent a single character with two or more consecutive bytes. KeyView can also process files that contain bi-directional text.
Bi-directional text contains both text that is read from left to right (for example, Latin-based text) and text that is read from right to left (for example, Hebrew and Arabic).
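The two ideas above can be illustrated with a short sketch. It uses UTF-8 as one example of a multi-byte encoding and a simple range test for Hebrew and Arabic code points. Both function names are hypothetical, the input is assumed to be valid UTF-8, and the range test is a deliberate simplification: it is not the full Unicode Bidirectional Algorithm and does not describe how KeyView processes text.

```c
#include <assert.h>
#include <stddef.h>

/* Counts characters (code points) in a NUL-terminated UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx and never start a
 * character, so only the other bytes are counted. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Returns 1 if the code point lies in the Unicode Hebrew block
 * (U+0590 to U+05FF) or Arabic block (U+0600 to U+06FF), else 0. */
int is_rtl_code_point(unsigned long cp)
{
    return (cp >= 0x0590UL && cp <= 0x05FFUL) ||
           (cp >= 0x0600UL && cp <= 0x06FFUL);
}
```

For a four-letter Hebrew word stored as eight UTF-8 bytes, utf8_char_count reports four characters, which shows why byte length and character count must not be confused when handling multi-byte text.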
The following table indicates which character encodings KeyView supports for each format.
Format
Archive
7-Zip (7Z)
AD1 Evidence file
Single-byte Multi-byte n/a n/a n/a n/a
Bi-directional n/a n/a
Format Single-byte
BinHex (HQX)
Bzip2 (BZ2)
EnCase – Expert Witness Compression
Format (E01)
GZIP (GZ)
ISO (ISO)
Java Archive (JAR)
Legato EMailXtender Archive (EMX)
MacBinary (BIN)
Mac Disk Copy Disk Image (DMG)
Microsoft Backup File (BKF)
Microsoft Cabinet format (CAB)
Microsoft Compiled HTML Help (CHM)
Microsoft Compressed Folder (LZH)
PKZip (ZIP)
Microsoft Outlook DBX (DBX)
Microsoft Outlook Offline Storage File (OST) Y
RAR Archive (RAR) n/a
Tape Archive (TAR)
UNIX Compress (Z) n/a n/a n/a n/a n/a
Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a
UUEncoding (UUE)
Windows Scrap File (SHS)
WinZip (ZIP)
Binary
Executable (EXE)
Link Library (DLL)
Computer-aided Design
AutoCAD Drawing (DWG) n/a n/a n/a n/a n/a
Y n/a n/a
Y
Y n/a n/a n/a n/a n/a n/a
Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a
Multi-byte n/a n/a n/a
n/a n/a
Y
Y n/a n/a n/a n/a n/a n/a
Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a
Bi-directional n/a n/a n/a
Format
AutoCAD Drawing Exchange (DXF)
CATIA formats (CAT)
Microsoft Visio (VSD)
Database dBase Database
Microsoft Access (MDB)
Microsoft Project (MPP)
Desktop Publishing
Microsoft Publisher
Display
Adobe Portable Document Format (PDF)
Graphics
Computer Graphics Metafile (CGM)
Corel DRAW (CDR)
DCX Fax System (DCX)
DICOM – Digital Imaging and
Communications in Medicine (DCM)
Encapsulated PostScript (EPS)
Enhanced Metafile (EMF)
Graphic Interchange Format (GIF)
JBIG2
JPEG
JPEG 2000
Lotus AMIDraw Graphics (SDW)
Lotus Pic (PIC)
Macintosh Raster (PICT/PCT)
MacPaint (PNTG)
Microsoft Office Drawing (MSO)
Y
Y
Y
N
Y n/a n/a n/a n/a
Y
Y n/a n/a n/a n/a n/a
Y n/a
Y n/a
Single-byte
Y
Y
Y
Multi-byte
Y
N
Y
N
Y
Y
Y
Y
1 n/a n/a n/a n/a
N
Y n/a n/a n/a n/a n/a
N n/a
N n/a
N
Y
N
N
N n/a n/a n/a n/a
N
N n/a n/a n/a n/a n/a
N n/a
N n/a
Bi-directional
Y
N
Y
Format
Omni Graffle (GRAFFLE)
PC PaintBrush (PCX)
Portable Network Graphics (PNG)
SGI RGB Image (RGB)
Sun Raster Image (RS)
Tagged Image File (TIFF)
Truevision Targa (TGA)
Windows Animated Cursor (ANI)
Windows Bitmap (BMP)
Windows Icon Cursor (ICO)
Windows Metafile (WMF)
WordPerfect Graphics 1 (WPG)
WordPerfect Graphics 2 (WPG)
Documentum EMCMF Format
Domino XML Language (DXL)
GroupWise FileSurf
Legato Extender (ONM)
Lotus Notes database (NSF)
Mailbox (MBX)
Microsoft Entourage Database
Microsoft Outlook (MSG)
Microsoft Outlook Express (EML)
Microsoft Outlook iCalendar
Microsoft Outlook for Macintosh
Microsoft Outlook Offline Storage File
Microsoft Outlook Personal File Folders
(PST)
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Single-byte n/a
Y n/a n/a
Y n/a n/a n/a
Y
Y n/a n/a
Y
Multi-byte n/a
N n/a n/a
N n/a n/a n/a
Y
N n/a n/a
N
Y
Y
Y
Y
N
Y
Y
Y
Y
Y
Y
Y
Y
Bi-directional n/a
N n/a n/a
N n/a n/a n/a
N
N n/a n/a
N
Y
Y
Y
Y
N
N
Y
N
Y
Y
Y
Y
Y
Format
Microsoft Outlook vCard Contact
Text Mail (MIME)
Transport Neutral Encapsulation Format
Multimedia
Advanced Systems Format (ASF)
Audio Interchange File Format (AIFF)
Microsoft Wave Sound (WAV)
MIDI (MID)
MPEG 1 Audio Layer 3 (MP3)
MPEG 1 Video (MPG)
MPEG 2 Audio (MPEGA)
MPEG 4 Audio (MP4)
NeXT/Sun Audio (AU)
QuickTime Movie (QT/MOV)
Windows Video (AVI)
Presentations
Apple iWork Keynote (GZ)
Applix Presents (AG)
Corel Presentations (SHW)
Extensible Forms Description Language
(XFD)
Lotus Freelance Graphics 2 (PRE)
Lotus Freelance Graphics (PRZ)
Macromedia Flash (SWF)
Single-byte
Y
Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a
Multi-byte
Y
Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a
Bi-directional
Y character set
1252 only character set
1252 only
Y
Y
N
N
Y character set
850 only
Y
N
Y
Japanese, Simplified
Chinese,
Traditional Chinese,
Thai only
Y N
N
N
N
N
N
N
Y
Y n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a
Format
Microsoft OneNote
Microsoft PowerPoint PC (PPT)
Microsoft PowerPoint Windows (PPT)
Single-byte
Y character set
1252 only
Y
Microsoft PowerPoint Macintosh (PPT)
Microsoft PowerPoint Windows XML 2007 and 2010 (PPTX)
OASIS Open Document (ODP)
OpenOffice Impress (ODP)
StarOffice Impress (ODP)
Spreadsheets
Apple iWork Numbers (GZ)
Applix Spreadsheets (AS)
Y
Y
Y
Y
Y
Y character set
1252 only
Comma Separated Values (CSV)
Corel Quattro Pro (QPW/WB3)
Data Interchange Format (DIF)
Lotus 1-2-3 (123)
Lotus 1-2-3 (WK4)
Lotus 123 Charts (123)
Microsoft Excel Charts (XLS)
Microsoft Excel Macintosh (XLS)
Microsoft Excel Windows (XLS)
Microsoft Excel Windows XML 2007 (XLSX) Y
Microsoft Office Excel Binary Format (XLSB) Y
Microsoft Works Spreadsheet (S30/S40)
Y
Y
Y
Y
Y
Y
Y
Y
Y character set
1252 only
Y
N
Y
Y
Y
N
N
Y
Y
Y
Y
Y
N
Y
Y
Y
N
Multi-byte
Y
Traditional Chinese only
N
Y
Japanese, Simple
Chinese,
Traditional Chinese,
Korean only
Bi-directional
N
N
Hebrew only
N
Y
N
N
N
Y
N
N
Y
2
N
N
N
N
N
N
N
N
N
Format
OASIS Open Document (ODS)
OpenOffice Calc (ODS)
StarOffice Calc (ODS)
Text and Markup
ANSI (TXT)
ASCII (TXT)
HTML (HTM)
Microsoft Excel Windows XML 2003
Microsoft Word for Windows XML 2003
Microsoft Visio XML 2003
Rich Text Format (RTF)
Unicode HTML
Unicode Text (TXT)
XHTML
XML
Word Processing
Adobe Maker Interchange Format (MIF)
Apple iChat Log (ICHAT)
Apple iWork Pages (GZ)
Applix Words (AW)
DisplayWrite (IP)
Folio Flat File (FFF)
Founder Chinese E-paper Basic (CEB)
Fujitsu Oasys (OA2)
Hangul (HWP)
Single-byte
Y
Y
Y
Multi-byte
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y character set
1252 only character set
1252 only character set
500, 1026 only
Y
Y character set
1252 only
Y
N
Y
Y
N
N
N
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
N
N
N
N
N
N
N
N
N
Y
Y
Y
Y
Bi-directional
N
N
N
Format
Health level7 (HL7)
IBM DCA/RTF (DC)
JustSystems Ichitaro (JTD)
Lotus AMI Pro (SAM)
Lotus AMI Professional Write Plus (AMI)
Lotus Word Pro (LWP)
Lotus SmartMaster (MWP)
Microsoft Word PC (DOC)
Microsoft Word Windows V1-2 (DOC) Y
Microsoft Word Windows V6, 7, 8, 95 (DOC) Y
Y Microsoft Word Windows V97 through 2003
(DOC)
Microsoft Word Windows XML 2007 and
2010 (DOCX)
Y
Y
Y character set
1252 only
Microsoft Word Macintosh (DOC)
Microsoft Works (WPS)
Microsoft Write (WRI)
OASIS Open Document (ODT)
Omni Outliner (OO3)
OpenOffice Writer (ODT)
Open Publication Structure eBook (EPUB) Y
StarOffice Writer (ODT) Y
Skype Log (DBB)
Y
Y
Y
Y
Y
Y
Y
WordPad (RTF)
Single-byte
Y
Y
Y character sets
500, 1026 only
Y
Multi-byte
Y
N
Y
Simple Chinese,
Traditional Chinese,
Japanese, Thai only
Y
Y
Simple Chinese,
Traditional Chinese,
Japanese, Thai only
N
Y
N
Y
Y
Y
Y
Y
Y
Y
N
Japanese only
Japanese only
Y
Y (null-terminated charsets)
Y
N
Y
Bi-directional
Y
N
N
N
N
N
Y
N
N
N
N
N
N
N
Y
Format Single-byte Multi-byte Bi-directional
WordPerfect Linux (WPS)
WordPerfect Macintosh (WPS)
WordPerfect Windows (WO)
XML Paper Specification (XPS)
XYWrite Windows (XY4)
Y
Y
Y
Y character set
1252 only
Y
N
N
N
Y
N
N
N
N
N
N
Yahoo! Instant Messenger (DAT) Y (null-terminated charsets)
N
1. Multi-byte PDFs are supported, provided the PDF document is created using either Character ID-keyed (CID) fonts, predefined CJK CMap files, or ToUnicode font encodings, and does not contain embedded fonts. See the Adobe website and the Adobe Acrobat documentation for more information. Any multi-byte characters that are not supported are displayed using the replacement character. By default, the replacement character is a question mark (?).
To determine the type of font encodings that are used in a PDF, open the PDF in Adobe Acrobat, and select File | Document Info | Fonts. If the Encoding column lists Custom or Embedded encodings, you may encounter problems converting the PDF.
2. Text direction in the output file may not be correct.
3. In Export SDK, a bi-directional right-to-left (RTL) tag is extracted from this format and included in the direction element (<dir=RTL>) of the output.
Coded Character Sets
The following table lists the coded character sets and indicates which of them can be specified as the target character set.
The coded character sets are enumerated in kvtypes.h and defined in the Export class.
Coded Character Set    Description                                               Can be set as target charset?
KVCS_UNKNOWN           Unknown character set                                     N
KVCS_SJIS              Japanese (uses multi-byte encoding), cp932                Y
KVCS_GB                Simplified Chinese (China, Singapore, Malaysia), cp936    Y
KVCS_BIG5              Traditional Chinese (Taiwan, Hong Kong, Macau), cp950     Y
KVCS_KSC               Korean, cp949                                             Y
Coded Character Set    Description                                               Can be set as target charset?
KVCS_1250              Windows Latin 2 (Central Europe)                          Y
KVCS_1251              Windows Cyrillic (Slavic)                                 Y
KVCS_1252              Windows Latin 1 (ANSI)                                    Y
KVCS_1253              Windows Greek                                             Y
KVCS_1254              Windows Latin 5 (Turkish)                                 Y
KVCS_1255              Windows Hebrew                                            Y
KVCS_1256              Windows Arabic                                            Y
KVCS_1257              Windows Baltic Rim                                        Y
KVCS_1258              Windows Vietnamese                                        Y
KVCS_8859_1            ISO 8859-1 Latin 1 (Western Europe, Latin America)        Y
KVCS_8859_2            ISO 8859-2 Latin 2 (Central Eastern Europe)               Y
KVCS_8859_3            ISO 8859-3 Latin 3 (S.E. Europe)                          Y
KVCS_8859_4            ISO 8859-4 Latin 4 (Scandinavia/Baltic)                   Y
KVCS_8859_5            ISO 8859-5 Latin/Cyrillic                                 Y
KVCS_8859_6            ISO 8859-6 Latin/Arabic                                   Y
KVCS_8859_7            ISO 8859-7 Latin/Greek                                    Y
KVCS_8859_8            ISO 8859-8 Latin/Hebrew                                   Y
KVCS_8859_9            ISO 8859-9 Latin/Turkish                                  Y
KVCS_8859_14           ISO 8859-14                                               Y
KVCS_8859_15           ISO 8859-15                                               Y
KVCS_437               DOS Latin US                                              Y
KVCS_737               DOS Greek                                                 Y
KVCS_775               DOS Baltic Rim                                            Y
KVCS_850               DOS Latin 1                                               Y
KVCS_851               DOS Greek                                                 Y
KVCS_852               DOS Latin 2                                               Y
KVCS_855               DOS Cyrillic                                              Y
Coded Character Set
KVCS_857
KVCS_860
KVCS_861
KVCS_862
KVCS_863
KVCS_864
KVCS_865
KVCS_866
KVCS_869
KVCS_874
KVCS_PDFMACDOC
KVCS_PDFWINDOC
KVCS_STDENC
KVCS_PDFDOC
KVCS_037
KVCS_1026
KVCS_500
KVCS_875
KVCS_LMBCS
KVCS_UNICODE
KVCS_UTF16
KVCS_UTF8
KVCS_UTF7
KVCS_2022_JP
KVCS_2022_CN
KVCS_2022_KR
KVCS_WP6X
Description
DOS Turkish
DOS Portuguese
DOS Icelandic
DOS Hebrew
DOS Canadian French
DOS Arabic
DOS Nordic
DOS Cyrillic Russian
DOS Greek 2
Thai
PDF MAC DOC
PDF WIN DOC
Adobe Standard Encoding
Adobe standard PDF character set
EBCDIC code page 037
EBCDIC code page 1026
EBCDIC code page 500
EBCDIC code page 875
Lotus multibyte character set Group 1 and Group 2
Unicode, UCS-2
16-bit Unicode transformation format
8-bit Unicode transformation format
7-bit Unicode transformation format Y
ISO 2022-JP, Japanese mail and news safe encoding (JIS-7) N
ISO 2022-CN, Chinese mail and news safe encoding
ISO 2022-KR, Korean mail and news safe encoding
Word Perfect 6.x and higher character mapping N
N
N
N
Y
N
N
Y
Y
Y
Y
N
N
N
N
Y
Y
Y
Y
Y
Y
Y
Y
Can be set as target charset?
Y
Y
Coded Character Set
KVCS_10000
KVCS_KSC5601
KVCS_GB2312
KVCS_GB12345
KVCS_CNS11643
KVCS_JIS0201
KVCS_JIS0212
KVCS_EUC_JP
KVCS_EUC_GB
KVCS_EUC_BIG5
KVCS_EUC_KSC
KVCS_424
KVCS_856
KVCS_1006
Description
Western European (Macintosh)
Unified Hangul
Simplified Chinese (China, Singapore, Hong Kong)
Traditional Chinese (China) - analogue of GB2312
Traditional Chinese - Taiwan. Supplement to Big5
Japanese - contains ASCII character set (JIS-Roman)
Japanese. Supplement to JIS0208.
Japanese Extended UNIX Code
Simplified Chinese Extended UNIX Code
Traditional Chinese Extended UNIX Code
Korean Extended UNIX Code
EBCDIC Hebrew
PC Hebrew (old)
IBM AIX Pakistan (Urdu)
KVCS_KOI8R
KVCS_PDF_JAPAN1
Cyrillic (Russian)
Adobe-Japan1-2 character collection
KVCS_PDF_KOREA1 Adobe-Korea1-0 character collection
KVCS_PDF_GB1 Adobe-GB1-3 character collection
KVCS_PDF_CNS1
KVCS_2022_JP_8
KVCS_720
KVCS_VISCII
KVCS_8859_10
KVCS_8859_13
KVCS_57002
KVCS_57003
KVCS_57004
Adobe-CNS1-2 character collection
ISO 2022-JP, Japanese mail and news safe encoding (JIS8)
Arabic DOS-720
Vietnamese VISCII
ISO 8859-10 (Latin 6 Nordic)
ISO 8859-13 (Latin 7 Baltic)
ISCII Devanagari (x-iscii-de)
ISCII Bengali (x-iscii-be)
ISCII Tamil (x-iscii-ta)
N
N
Y
N
N
N
Y
Y
Y
1
Y
Y
Y
Y
N
N
N
N
Y
N
Y
Y
Y
N
Y
Y
Can be set as target charset?
Y
Y
Coded Character Set
KVCS_57005
KVCS_57006
KVCS_57007
KVCS_57008
KVCS_57009
KVCS_57010
KVCS_57011
KVCS_GB18030b2
KVCS_GB18030
KVCS_8859_11
KVCS_8859_16
KVCS_ARABICMAC
KVCS_KOI8U
KVCS_HZGB2312
Description
ISCII Telugu (x-iscii-te)
ISCII Assamese (x-iscii-as)
ISCII Oriya (x-iscii-or)
ISCII Kannada (x-iscii-ka)
ISCII Malayalam (x-iscii-ma)
ISCII Gujarathi (x-iscii-gu)
ISCII Panjabi (x-iscii-pa)
Reserved for internal use
GB18030 (Chinese 4-byte character set)
ISO 8859-11 (Thai)
ISO 8859-16 (Latin-10 South-Eastern Europe)
Arabic Mac (x-mac-arabic)
Cyrillic (KOI8U Ukrainian)
The 7-bit representation of GB 2312 / RFC 1842
Y
Y
Y
Y n/a
Y
Y
Y
Y
Y
Can be set as target charset?
Y
Y
n/a
Y
1. Character set cannot be forced as output in Export SDK and Viewing SDK because the character set is not supported by the major browsers.
Appendix E
File Format Detection
This section describes how file formats are detected in the KeyView Export SDK.
It contains the following topics:
Category Values in formats_e.ini
Introduction
The KeyView format detection module (kwad) detects a file’s format, and reports the information to the API, which in turn reports the information to the developer’s application. If the detected format is supported by the KeyView SDK, the detection module also loads the appropriate structured access layer and document reader for further processing.
For a list of supported formats, see
Extract Format Information
You can extract format information from a document using the fpGetStreamInfo() function. If required, this format information can then be reported to the developer’s application. The fpGetStreamInfo() function extracts format information, such as file class, format and version, and populates the ADDOCINFO structure. This structure is defined in the header file adinfo.h.
For information on how to translate the extracted format information, see “Translate Format Information”.
Determine Format Support
Once the file format is extracted, the detection module uses the formats_e.ini file to determine whether the format is supported by KeyView, and which structured access layer and reader to load.
The formats_e.ini file is in the directory install\OS\bin, where install is the pathname of the Export installation directory and OS is the name of the operating system. It contains the following information:
Coded format information. To translate this information, see
Reader associated with each format. See “Determine a Document Reader”.
Configuration parameters for out-of-process conversions.
Locale settings for internal use.
Below are some entries from the formats_e.ini file:
123=mw
152=xyw
178=wp6
189=mw6
2=af
200=pdf
205=mb
210=htm
251=htm
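The entries above follow a simple key=value layout. As an illustration only, the following C sketch splits one such line into its coded format value and its reader abbreviation. The function name split_entry is hypothetical and is not part of the KeyView API. The sketch assumes the line has no surrounding whitespace, no trailing newline, and exactly the layout shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a KeyView function): splits "key=value"
 * into two caller-supplied buffers.
 * Returns 0 on success, or -1 if the line has no '=', has an empty
 * key or value, or does not fit in the buffers.
 */
int split_entry(const char *line,
                char *key, size_t key_size,
                char *value, size_t value_size)
{
    assert(line != NULL && key != NULL && value != NULL);

    const char *eq = strchr(line, '=');
    if (eq == NULL || eq == line) {
        return -1;                      /* no separator, or empty key */
    }

    size_t klen = (size_t)(eq - line);
    size_t vlen = strlen(eq + 1);
    if (vlen == 0 || klen >= key_size || vlen >= value_size) {
        return -1;                      /* empty value, or buffer too small */
    }

    memcpy(key, line, klen);
    key[klen] = '\0';
    memcpy(value, eq + 1, vlen + 1);    /* copies the terminating NUL too */
    return 0;
}
```

For the entry 200=pdf, this sketch yields the key 200 and the value pdf. Section headers such as [detection_flags] contain no '=' and are rejected.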
NOTE The formats_e.ini file applies to all formats except graphics. Detection of graphics formats is handled by an internal module named KeyView Picture Interchange Format (KPIF).
Refine Detection of Text Files
During text detection, KeyView analyzes the first 1 KB and last 1 KB of data in a document. If less than 10% of that data consists of non-ASCII characters, KeyView detects the document as a text file.
However, depending on the type of documents you are working with, the default settings may not provide the desired level of accuracy. Configuration flags allow you to change the amount of data to read at the end of a file, the percentage of non-ASCII characters permitted in a text file, and whether to use or ignore the file extension to determine the document format.
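The heuristic described above can be pictured with a short C sketch. This is not KeyView's implementation: the function names are hypothetical, the file is assumed to be fully in memory, a non-ASCII character is assumed to mean any byte above 0x7F, and an empty input is arbitrarily treated as text. The block and max_pct parameters stand in for the configurable block size and percentage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts bytes with the high bit set. */
static size_t count_non_ascii(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 0x7F) {
            n++;
        }
    }
    return n;
}

/*
 * Returns 1 if the sampled data looks like text, 0 otherwise.
 *   data, len : file contents held in memory (assumption for brevity)
 *   block     : bytes sampled at each end of the file (1024 by default)
 *   max_pct   : non-ASCII percentage threshold (10 by default)
 */
int looks_like_text(const unsigned char *data, size_t len,
                    size_t block, unsigned max_pct)
{
    assert(data != NULL || len == 0);

    size_t head = len < block ? len : block;
    /* Do not sample the same bytes twice when the file is short. */
    size_t tail = (len - head) < block ? (len - head) : block;
    size_t sampled = head + tail;

    if (sampled == 0) {
        return 1;   /* assumption: nothing sampled counts as text */
    }

    size_t non_ascii = count_non_ascii(data, head)
                     + count_non_ascii(data + (len - tail), tail);

    /* "less than max_pct percent", computed without floating point */
    return non_ascii * 100 < (size_t)max_pct * sampled;
}
```

With the default values (1024 and 10), a 100-byte buffer that contains 9 non-ASCII bytes is classified as text, and one that contains 10 is not, because the comparison is strictly "less than".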
Change the Amount of File Data to Read
During file detection, KeyView reads characters from the beginning and end of a file—by default, it reads the first and last 1024 bytes of data. Large text files may contain many irrelevant characters at the end of a file, so KeyView may not accurately detect the file format. You can set a configuration flag to increase the amount of data to read from the end of a file during detection.
To change the amount of data to read during detection
In the formats_e.ini file, set the following flag in the detection_flags section:
[detection_flags]
non_ascii_chars_end_block_size=kB
where kB is the number of kilobytes to read from the end of the file, from 0 to 10. The default value is 1.
NOTE The file size must be greater than the value specified in the flag. If the flag value is greater than the file size, KeyView does not use the flag.
Change the Percentage of Allowed Non-ASCII Characters
By default, if less than 10% of the analyzed data in a document consists of non-ASCII characters, it is detected as a text file. Depending on the type of files you are working with, changing the default percentage may increase detection accuracy.
To change the percentage of non-ASCII characters allowed in text files
In the formats_e.ini file, set the following flag in the detection_flags section:
[detection_flags]
non_ascii_chars_in_text=N
where N is the percentage of non-ASCII characters to allow in text files. Files that contain a lower percentage of non-ASCII characters than N are detected as text files. The default value is 10.
Use the File Extension for Detection
Sometimes KeyView detects certain file formats, such as CSV, as ASCII because of the content of the documents. In such cases, you can configure KeyView to use the file extension to determine the document format. Using the file extension can improve detection of formats such as CSV, but might not detect text files successfully if they have incorrect file extensions.
To use the file extension for ASCII files during detection
In the formats_e.ini file, set the following flag in the detection_flags section:
[detection_flags]
use_extension_for_ascii=1
The default is 0 (do not use the file extension).
Translate Format Information
Format information can include file attributes in the following categories:
Major Format
File Class
Minor Format
Major Version
Minor Version
Not all categories are required. Many formats only include major format and file class, or major format only.
The format information has the following structure:
MajorFormat.FileClass.MinorFormat.MajorVersion.MinorVersion
For example:
81.2.0.9.0
Each number in the format information represents a file attribute. The entry
81.2.0.9.0
represents a Lotus 1-2-3 Spreadsheet file version 9.0, where
81 = Lotus 1-2-3 Spreadsheet (major format)
2 = Spreadsheet (file class)
0 = not defined (minor format)
9 = 9 (major version)
0 = 0 (minor version)
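To make the dotted structure concrete, the following C sketch parses a format information string into its five categories. The type and function names (FormatInfo, parse_format_info) are hypothetical and are not part of the KeyView API. Because not all categories are required, the sketch accepts one to five fields and, as an assumption, sets any missing trailing category to 0.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the five categories described above. */
typedef struct {
    int major_format;
    int file_class;
    int minor_format;
    int major_version;
    int minor_version;
} FormatInfo;

/*
 * Parses "MajorFormat.FileClass.MinorFormat.MajorVersion.MinorVersion".
 * Returns the number of categories found (1 to 5), or 0 if the string
 * does not start with a number. Missing categories are set to 0.
 * Trailing characters after the last parsed number are ignored.
 */
int parse_format_info(const char *s, FormatInfo *out)
{
    assert(s != NULL && out != NULL);

    int v[5] = {0, 0, 0, 0, 0};
    int n = sscanf(s, "%d.%d.%d.%d.%d",
                   &v[0], &v[1], &v[2], &v[3], &v[4]);
    if (n < 1) {
        return 0;   /* covers both "no match" and EOF */
    }

    out->major_format  = v[0];
    out->file_class    = v[1];
    out->minor_format  = v[2];
    out->major_version = v[3];
    out->minor_version = v[4];
    return n;
}
```

For 81.2.0.9.0 the sketch reports five categories, with major format 81, file class 2, and major version 9. For an entry that lists only a major format, such as 200, it reports one category.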
The example above applies to the formats_e.ini file. When extracting format information using the fpGetStreamInfo() function, the same format information is represented as 294.2.0.9.
NOTE The format values returned by fpGetStreamInfo() differ from those in formats_e.ini because the former defines a unique ID for each major format, while the latter uses a major version, minor version and minor format to distinguish between formats.
Distinguish Between Formats
The ADDOCINFO structure provides a unique ID for each major format. For example, a call to fpGetStreamInfo() would return 351.1.0 for a Microsoft Word 2003 XML format. The major format 351 is unique to this format.
Unlike ADDOCINFO, the formats_e.ini file distinguishes between formats using the major version number. For example, in formats_e.ini, a Microsoft Word 2003 XML format is defined as 285.1.0.100.0. The major format 285 and file class 1 are the same values used for generic XML. The major version 100 distinguishes the format as Microsoft Word 2003 XML.
The major version is used in formats_e.ini to specify the following formats:
The Microsoft Office 2003 XML format has the same major format and file class as generic XML (285.1). It is distinguished from generic XML using the following major versions:
Word: 100
Excel: 101
Visio: 110
The XHTML format has the same major format and file class as HTML (210.1). It is distinguished from HTML using the major version 100.
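The major-version rules listed above can be expressed as a small lookup. The following C sketch is illustrative only: the enum and function names are hypothetical, and it assumes that any major version not listed above falls back to generic XML (for 285.1) or HTML (for 210.1).

```c
#include <assert.h>

/* Hypothetical result type; these names are not KeyView identifiers. */
typedef enum {
    FMT_UNREFINED = 0,      /* no major-version rule applies */
    FMT_GENERIC_XML,
    FMT_WORD_2003_XML,
    FMT_EXCEL_2003_XML,
    FMT_VISIO_2003_XML,
    FMT_HTML,
    FMT_XHTML
} RefinedFormat;

/*
 * Applies the formats_e.ini major-version rules described above.
 * Inputs are the numeric categories of one format entry.
 */
RefinedFormat refine_by_major_version(int major_format, int file_class,
                                      int major_version)
{
    assert(major_version >= 0);

    if (major_format == 285 && file_class == 1) {
        switch (major_version) {
        case 100: return FMT_WORD_2003_XML;
        case 101: return FMT_EXCEL_2003_XML;
        case 110: return FMT_VISIO_2003_XML;
        default:  return FMT_GENERIC_XML;   /* assumed fallback */
        }
    }
    if (major_format == 210 && file_class == 1) {
        /* assumed fallback: anything other than 100 is plain HTML */
        return major_version == 100 ? FMT_XHTML : FMT_HTML;
    }
    return FMT_UNREFINED;
}
```

Under these assumptions, the entry 285.1.0.100.0 resolves to Microsoft Word 2003 XML, while 285.1 with major version 0 stays generic XML.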
Determine a Document Reader
The format detection module uses the formats_e.ini file to determine whether a format is supported and which reader should be used to parse a format. The entries in the formats_e.ini file list each format’s coded value and an abbreviation for the format’s reader. For example:
81.2.0.9.0=l123
The reader abbreviation is a truncated version of the reader’s library name.
Adding “sr” to the end of an abbreviation creates the name of the reader. The example entry above specifies that a Lotus 1-2-3 Spreadsheet file version 9.0 is parsed by the Lotus 1-2-3 reader, l123sr.
“Files Required for Redistribution” lists the document readers provided with KeyView.
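The naming rule above (reader abbreviation plus "sr") can be sketched in C as follows. The function name reader_name_from_abbrev is hypothetical and is not a KeyView function. The sketch builds only the base reader name, and makes no assumption about platform-specific library prefixes or file extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: writes abbrev followed by "sr" into out.
 * Returns 0 on success, or -1 if the result (including the
 * terminating NUL) does not fit in out_size bytes.
 */
int reader_name_from_abbrev(const char *abbrev, char *out, size_t out_size)
{
    assert(abbrev != NULL && out != NULL);

    size_t len = strlen(abbrev);
    if (len + 3 > out_size) {
        return -1;                  /* need len + "sr" + NUL */
    }

    memcpy(out, abbrev, len);
    memcpy(out + len, "sr", 3);     /* copies 's', 'r', and NUL */
    return 0;
}
```

For the abbreviation l123 from the example entry, this produces l123sr.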
Category Values in formats_e.ini
This section lists the possible category values for format information in the formats_e.ini file. The corresponding values for the format information extracted from a call to fpGetStreamInfo() are listed in the header file adinfo.h.
5
6
3
4
7
Number
1
2
Format
AES Multiplus Comm Format
ASCII File word processor/MS DOS Batch File format
Applix Asterix
Microsoft Windows Bitmap image (BMP)
Convergent Tech DEF Comm. format
Corel Draw (CDR)
Keyword COM.FILE (KSIF)
File Class
Word processor
Word processor
Word processor
Raster image
Word processor
Vector graphic
Number
28
29
30
31
24
25
26
27
32
33
34
20
21
22
23
16
17
18
19
12
13
14
15
8
9
10
11
Format
Computer Graphics Metafile (CGM)
Word Connection
COMET TOP Word
DG CEOwrite
Honey Bull DSA101
IBM DCA-RFT
Dummy File (Internal)
DG Common Data Stream (CDS)
Dummy Print File (Internal)
Windows Micrografx Draw (DRW)
Data Point VISTAWORD
Encapsulated PostScript (EPS)
DOS/Windows Executable (EXE, DLL)
CCITT Group 3 1-Dimensional (G31D)
Graphics Interchange format (GIF)
IBM 1403 Line Printer
IBM DCF Script
IBM DCA-FFT
GEM Bit Image
IBM Display Write 4
Raster Graphics
Keywords PICL
File Class
Vector graphic
Word processor
Word processor
Word processor
Word processor
Word processor
Word processor
Vector graphic
Word processor
Raster image
Executable
Raster image
Raster image
Word processor
Word processor
Word processor
Raster image
Word processor
Raster image
Number
55
56
57
58
51
52
53
54
59
60
61
47
48
49
50
43
44
45
46
39
40
41
42
35
36
37
38
Format
Lotus AMI Pro
MORE Database Outliner (Mac)
MacPaint
Microsoft Word Mac
Informix SmartWare II Communication File
Microsoft Word for Windows
MultiMate 4.0
Multiplan Spreadsheet
Microsoft Rich Text Format (RTF)
Microsoft Word 5.0 (PC)
NBI Async Archive Format
Navy DIF
NBI Net Archive Format
NIOS TOP
FileMaker (Mac)
ODA/ODIF
Keyword OSM
Office Writer
PC Paint Brush Graphics (PCX)
CPT Communication Format
Lotus PIC
Macintosh Quick Draw Picture Format (PICT)
Philips Script
PostScript File
File Class
Word processor
Outline/planning
Raster image
Word processor
Communications
Word processor
Word processor
Spreadsheet
Word processor
Word processor
Word processor
Word processor
Word processor
Word processor
Database
Word processor
Word processor
Raster image
Word processor
Vector graphic
Raster image
Word processor
Vector graphic
Number
82
83
84
85
78
79
80
81
86
87
88
74
75
76
77
70
71
72
73
66
67
68
69
62
63
64
65
Format
Quadratron Q-One (V1.93J)
Quadratron Q-One (V2.0)
SAMNA Word IV
Lotus AMI Pro Draw (SDW)
SYLK Spreadsheet
Informix SmartWare II
Symphony Spreadsheet
Truevision Targa
Tagged Image File (TIFF)
Targon Word (V 2.0)
Uniplex Ucalc Spreadsheet
Uniplex (V6.01)
Microsoft Word (UNIX)
WANG PC
WordERA (V 1.0)
WANG WPS Comm. format
WordPerfect Mac
WordPerfect 5.2
Lotus 1-2-3 Spreadsheet
WordMARC word processor
Microsoft Windows Metafile (WMF) Graphics
Informix SmartWare II Database
WordPerfect Graphics V1.0 (WPG)
Wang WITA Word processor
File Class
Word processor
Word processor
Word processor
Raster image
Spreadsheet
Word processor
Spreadsheet
Raster image
Raster image
Word processor
Spreadsheet
Word processor
Word processor
Word processor
Word processor
Word processor
Word processor
Word processor
Spreadsheet
Word processor
Raster image
Database
Raster image
98
99
100
101
107
108
109
111
103
104
105
106
112
113
114
115
116
Number
93
94
95
96
97
89
90
91
92
Format
Xerox 860 Comm. format
Microsoft Excel Spreadsheet
Xerox Writer word processor
DIF Spreadsheet
ENABLE Spreadsheet
Supercalc Spreadsheet
Ultracalc Spreadsheet
Informix SmartWare Spreadsheet
Serialized Object Format (SOF) Encapsulation format
Microsoft PowerPoint (PC)
Microsoft PowerPoint (Mac)
Aldus PageMaker (Mac)
Aldus PageMaker (DOS)
Microsoft Works (Mac)
Microsoft Works Database (Mac)
Microsoft Works Spreadsheet (Mac)
Microsoft Works Communication (Mac)
Microsoft Works (PC)
Microsoft Works Database (PC)
Microsoft Works Spreadsheet (PC)
PC Library Module
MacWrite II
Aldus Freehand Mac
Disk Doubler Compression format
HP Graphics Language (HP-GL)
File Class
Word processor
Spreadsheet
Word processor
Spreadsheet
Spreadsheet
Spreadsheet
Spreadsheet
Spreadsheet
Encapsulation
Presentation
Presentation
Desktop
Publishing
Desktop
Publishing
Word processor
Database
Spreadsheet
Communication
Word processor
Database
Spreadsheet
Library module
Word processor
Vector graphic
Encapsulation
Vector graphic
Number
117
135
136
137
138
139
140
141
142
143
123
124
126
127
118
119
120
121
128
129
131
132
133
134
Format
Adobe Maker Interchange Format (MIF)
JPEG File Interchange Format (JFIF)
Reflex Database
Framework II
Paradox (PC) Database
Microsoft Windows Write
Quattro Pro Spreadsheet (DOS)
Persuasion Presentation
Corel Presentation
Microsoft Windows Icon Format (ICO) Graphics
Microsoft Project
Harvard Graphics
Zip Archive Format
Microsoft Windows Cursor (CUR) Graphics
Quark Express (Mac)
ARC/PAK Archive format
Adobe FrameMaker
Microsoft Publisher
Plan Perfect
WordPerfect General File Format
Lotus Freelance
Microsoft Wave Sound File
MIDI Sound File
AutoCAD DXF Graphics
File Class
Desktop
Publishing
Raster image
Database
Mixed format
Database
Word processor
Spreadsheet
Presentation
Presentation
Raster image
Time scheduling
Desktop publishing
Encapsulation
Raster image
Desktop publishing
Encapsulation
Desktop publishing
Desktop publishing
Time scheduling
Miscellaneous
Presentation
Sound
Sound
Vector graphic
Number
164
165
166
167
160
161
162
163
168
169
170
156
157
158
159
152
153
154
155
148
149
150
151
144
145
146
147
Format
dBase Database
OS/2 PM Metafile Graphics
Lasergraphics Language
AutoShade Rendering File Format
Graphics Environment Manager (GEM VDI)
Microsoft Windows Help File
Ability Office (SS, DB, GR, WP, COM)
XyWrite/Nota Bene
Comma Separated Values (CSV)
Writing Assistant word processor
WordStar 2000
WordStar 6.0
HP Printer Control Language (PCL)
(UNIX/VAX/SUN) Executable
(UNIX/VAX/SUN) Object Module
(UNIX/VAX/SUN) Link Library
NeXT SUN Audio Data
NeWS font file (SUN)
cpio Archive Format (UNIX/VAX/SUN)
PEX Binary Archive (SUN)
SUN vfont definition
Curses Screen Image (UNIX/VAX/SUN)
UU Encoded Encryption File
PC Object Module
Microsoft Windows Group File
File Class
Database
Vector graphic
Vector graphic
Vector graphic
Vector graphic
Miscellaneous
Word processor
Spreadsheet
Word processor
Word processor
Word processor
Vector graphic
Executable
Object module
Library module
Sound
Font
Encapsulation
Encapsulation
Font
Raster image
Encapsulation
Object module
Miscellaneous
187
188
189
190
183
184
185
186
191
192
193
194
195
179
180
181
182
175
176
177
178
Number
171
172
173
174
Format
PC True Type Font
Program Information File
PC COM executable file
Adobe FrameMaker Markup Language
Stuff It Archive (Mac)
PeachCalc Spreadsheet
Wang Office GDL Header Encapsulation
WordPerfect 6.0
Q & A for DOS
Q & A for Windows
DEC WPS PLUS
DCX Fax format
Microsoft Windows OLE 2 Encapsulation
Quattro Pro for Windows
Keyword Viewer Markup Format
EBCDIC Text
DCS
Microsoft Excel Spreadsheet 95, 2000
Microsoft Word for Windows 95
UNIX SHAR Encapsulation
Lotus Notes Bitmap
UNIX Compress Encapsulation
Lotus Notes CDF
UNIX TAR Encapsulation
WordPerfect Graphics V2.0 (WPG2)
196 ODA/ODIF (FOD 26)
File Class
Font
Miscellaneous
Executable
Desktop publishing
Encapsulation
Spreadsheet
Encapsulation
Word processor
Word processor
Word processor
Word processor
Fax
Encapsulation
Spreadsheet
Word processor
Word processor
Spreadsheet
Word processor
Encapsulation
Raster image
Encapsulation
Word processor
Encapsulation
Raster image
Vector graphic
Word processor
215
216
217
218
211
212
213
214
219
220
221
222
Number
201
202
203
204
197
198
199
200
205
206
207
208
209
210
Format File Class
GZ Compress Encapsulation
Envoy (EVY)
Adobe Portable Document Format (PDF)
KW ODA Internal Raw Bitmap (RBM)
KW ODA G4 (G4)
KW ODA G31D (G31)
KW ODA Internal G32D (G32)
Microsoft Word for Mac V 4.x/5.x
BinHex 4.0 encoded file
SMTP document
MIME format - Microsoft Outlook Express (EML)/
Mailbox (MBX)
SGML document
HTML document
XHTML
1
ACT Format
Microsoft PowerPoint 95
Portable Network Graphics (PNG)
Video for Windows
Windows Animated Cursor
Windows C++ Object Storage
Windows Palette
RIFF Device Independent Bitmap
RIFF MIDI
RIFF Multimedia Movie
MPEG Movie
QuickTime Movie
Encapsulation
Word processor
Word processor
Raster image
Raster image
Raster image
Raster image
Word processor
Encapsulation
Encapsulation
Encapsulation
Word processor
Word processor
Word processor
Presentation
Raster image
Movie
Raster image
Mixed format
Raster image
Raster image
Sound
Movie
Movie
Movie
Number
243
244
245
246
239
240
241
242
247
248
249
235
236
237
238
231
232
233
234
227
228
229
230
223
224
225
226
Format
Audio Interchange File Format (AIFF) Sound
Amiga MOD Sound
Amiga IFF (8SVX) Sound
Creative Voice (VOC) Sound
Microsoft Works (Windows)
Microsoft Works Spreadsheet (Windows)
AutoDesk Animator FLIC Animation
AutoDesk Animator Pro FLIC Animation
Microsoft Works Database (Windows)
Microsoft Works Communication (Windows)
Compactor / Compact Pro Archive
VRML
QuickDraw 3D Metafile (3DMF)
PGP Secret Keyring
PGP Public Keyring
PGP Encrypted Data
PGP Signed Data
PGP Signed and Encrypted Data
PGP Signature Certificate
ASCII-armored PGP Public Keyring
ASCII-armored PGP encoded
ASCII-armored PGP signed
OLE DIB object
PGP Compressed Data
SGI Image
Lotus Screen Cam
MPEG Audio
File Class
Sound
Sound
Sound
Sound
Word processor
Spreadsheet
Animation
Animation
Database
Communications
Encapsulation
Vector graphic
Vector graphic
Encapsulation
Encapsulation
Encapsulation
Encapsulation
Encapsulation
Encapsulation
Encapsulation
Encapsulation
Encapsulation
Raster image
Encapsulation
Raster image
Animation
Sound
Number
270
271
272
273
266
267
268
269
274
275
276
262
263
264
265
258
259
260
261
254
255
256
257
250
251
252
253
Format
FTP Session Data
Netscape Bookmark file
Corel Draw CMX
AutoCAD Drawing (DWG)
AutoDesk WHIP
Macromedia Director
Real Audio
MS DOS Device Driver
Micrografx Designer
Simple Vector format (SVF)
WordPerfect Office document (WPD)
Applix Words
Applix Graphics
Microsoft Access
Usenet format
MacBinary
Apple Single
Apple Double
Lotus Word Pro
Microsoft Word 97, 2000
Enhanced Window Metafile
Microsoft Office Drawing
Microsoft PowerPoint 97, 2000
Extended or Custom XML
Device Independent file (DVI)
Unicode
Framework
File Class
Communications
Word processor
Vector image
Vector graphic
Vector graphic
Animation
Sound
Executable
Vector graphic
Vector graphic
Word processor
Presentation
Database
Word processor
Encapsulation
Encapsulation
Encapsulation
Word processor
Word processor
Vector graphic
Vector graphic
Presentation
Word processor
Vector graphic
Word processor
Mixed
294
295
296
297
298
299
290
291
292
293
286
287
288
289
300
301
302
Number
281
282
283
284
285
277
278
279
280
Format
KPIF Chart Stream
Applix Spreadsheet
Microsoft Device Independent Bitmap
KeyView GPF Filter
Microsoft Project 98, 2000, 2002
Folio Flat file
HWP (Arae-Ah Hangul)
JustSystems Ichitaro
Generic XML format
Microsoft Office 2003 XML format
2
Fujitsu Oasys
Portable Bitmap Utilities (PBM)
Portable Greymap Utilities (PGM)
Portable Pixmap Utilities (PPM)
X Bitmap (XBM)
X Pixmap (XPM)
X Image
PCD Image
Microsoft Visio
Microsoft Outlook (MSG)
XHTML document
Microsoft Outlook Personal Folders file (PST)
WinRAR Compressed Archive format (RAR)
Lotus Notes Database (NSF)
Legato Extender ONM
Macromedia Flash
Microsoft Word 2007 (XML format)
Microsoft Excel 2007 (XML format)
File Class
Spreadsheet
Raster image
Time scheduling
Word processor
Word processor
Word processor
Word processor
Word processor
Raster image
Raster image
Raster image
Raster image
Raster image
Raster image
Raster image
Presentation
Encapsulation
Word processor
Encapsulation
Encapsulation
Encapsulation
Word processor
Word processor
Spreadsheet
Number
324
325
326
327
320
321
322
323
328
329
330
315
316
317
319
311
312
313
314
307
308
309
310
303
304
305
306
Format
Microsoft PowerPoint 2007 (XML format)
Open PGP (new format packets only)
Intergraph version 7 DGN
Microstation version 8 DGN
Microsoft Word 2007 Macro
Microsoft Excel 2007 Macro
Microsoft PowerPoint Macro
Microsoft Compression folder (LZH)
Office 2007 Document
XML Paper Specification
Lotus Domino Extensible Language
OASIS Open Document (ODT)
OASIS Open Document (ODS)
OASIS Open Document (ODP)
Legato EMailXtender Native Message
Transfer Neutral Encapsulation Format (TNEF)
CADAM Drawing
CADAM Drawing Overlay
NURSTOR Drawing
HP Graphics Language (Plotter)
Advanced Systems Format
Windows Media Audio Format
Windows Media Video Format
Legato EMailXtender Archive
7-Zip
Microsoft Office 2007 Excel Binary Format
Microsoft Cabinet File
File Class
Presentation
Encapsulation
Vector graphic
Vector graphic
Word processor
Spreadsheet
Presentation
Encapsulation
Miscellaneous
Word processor
Encapsulation
Word processor
Spreadsheet
Presentation
Word Processor
Encapsulation
Vector graphic
Vector graphic
Vector graphic
Vector graphic
Miscellaneous
Sound
Movie
Encapsulation
Encapsulation
Spreadsheet
Encapsulation
Category Values in formats_e.ini
Number  Format                                         File Class
331     CATIA formats                                  Vector graphic
332     Yahoo! Instant Messenger                       Word processor
333     Founder Chinese E-paper Basic                  Word processor
334     Corel Quattro Pro X4                           Spreadsheet
335     MIME HTML                                      Word processor
336     Microsoft Document Imaging Format              Raster image
337     Microsoft Office Groove File Format            Word processor
338     Apple iWork Pages                              Word processor
339     Apple iWork Numbers                            Spreadsheet
340     Apple iWork Keynote                            Presentation
341     Microsoft Backup File                          Encapsulation
342     Microsoft Access 2007                          Database
343     Microsoft Entourage Database                   Encapsulation
344     Mac Disk Copy Disk Image File                  Encapsulation
345     Appleworks File                                Word processor
346     Omni Outliner (OO3) File                       Word processor
347     Omni Outliner (OPML) File                      Word processor
348     Omni Graffle XML File                          Vector graphic
349     Apple Photoshop Document                       Raster image
350     Apple Binary Property List                     Miscellaneous
351     Apple iChat Format                             Word processor
352     Omni Outliner (OOUTLINE) File                  Word processor
353     Bzip 2 Compressed File                         Encapsulation
354     ISO-9660 CD Disc Image Format                  Encapsulation
355     Xerox DocuWorks                                Word processor
356     RealMedia Streaming Media                      Movie
357     AC3 Audio File Format                          Sound
Number  Format                                                  File Class
358     Nero Encrypted File                                     Encapsulation
359     SolidWorks                                              Vector graphic
366     Extensible Forms Description Language                   Presentation
367     Apple XML Property List                                 Miscellaneous
368     OneNote Note Format                                     Presentation
370     Digital Imaging and Communications in Medicine (DICOM)  Raster image
371     Expert Witness Compression Format                       Encapsulation
372     Shell Scrap Object File                                 Encapsulation
373     Microsoft Project 2007                                  Time scheduling
374     Microsoft Publisher 98–                                 Desktop publishing
375     Skype Log File                                          Word processor
376     Lotus Notes Bitmap Format (DXL embedded images)         Raster image
377     Health level7 message                                   Word processor
378     Microsoft Outlook Offline Storage File                  Encapsulation
379     Open Publication Structure eBook                        Word processor
380     Microsoft Outlook Express DBX                           Encapsulation
381     BlackBerry Activation File                              Word processor
382     Disk Image                                              Encapsulation
383     Milestone                                               Raster image
384     RealLegal E-Transcript File                             Word processor
385     PostScript Type 1 Font                                  Font
386     Ghost Disk Image File                                   Encapsulation
387     JPEG-2000 JP2 File Format Syntax (ISO/IEC 15444-1)      Raster image
388     Unicode HTML                                            Word processor
389     Microsoft Compiled HTML Help                            Encapsulation
Number  Format                           File Class
390     Documentum EMCMF                 Encapsulation
393     JBIG2 File                       Raster image
395     AD1 Evidence file                Encapsulation
397     Group Wise File Surf email       Encapsulation
409     Microsoft Outlook for Macintosh  Encapsulation
412     Microsoft Outlook vCard Contact  Word processor
414     Microsoft Outlook iCalendar      Encapsulation
1. If the major version is 100, the file format is XHTML.
2. The major version determines whether the Microsoft Office XML file is a Word, Excel or Visio document. The major version for each format is as follows:
Word: 100
Excel: 101
Visio: 110
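The major-version rule in footnote 2 can be sketched as a small lookup. This is a minimal illustration, not part of the KeyView API: the function name `office_xml_kind` is hypothetical, and the sketch assumes the caller has already obtained the major version for a file detected as Microsoft Office XML.

```c
#include <stddef.h>

/* Hypothetical helper (not a KeyView function): maps the major version
 * reported for a Microsoft Office XML file to the document type listed
 * in footnote 2. Returns NULL for values the footnote does not list. */
const char *office_xml_kind(int major_version)
{
    switch (major_version) {
    case 100: return "Word";
    case 101: return "Excel";
    case 110: return "Visio";
    default:  return NULL;
    }
}
```

Note that footnote 1 assigns a different meaning to major version 100 (XHTML) for another format entry, so a lookup like this only makes sense once the format itself is known.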
Attribute Number  File Class
0                 No file class
01                Word processor
02                Spreadsheet
03                Database
04                Raster image
05                Vector graphic
06                Presentation
07                Executable
08                Encapsulation
09                Sound
10                Desktop publishing
11                Outline/planning
12                Miscellaneous
13                Mixed format
14                Font
15                Time scheduling
16                Communications
17                Object module
18                Library module
19                Fax
20                Movie
21                Animation
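For code that needs to display a file class, the attribute numbers above can be turned into names with a simple array lookup. The following is a minimal sketch under stated assumptions: `kFileClassNames` and `file_class_name` are illustrative names, not SDK symbols, and the array order assumes the attribute numbers run consecutively from 0 to 21 as listed.

```c
#include <stddef.h>

/* Illustrative table (not an SDK symbol): index = attribute number,
 * value = file class name as listed in this appendix. */
static const char *const kFileClassNames[] = {
    "No file class",      /* 0  */
    "Word processor",     /* 1  */
    "Spreadsheet",        /* 2  */
    "Database",           /* 3  */
    "Raster image",       /* 4  */
    "Vector graphic",     /* 5  */
    "Presentation",       /* 6  */
    "Executable",         /* 7  */
    "Encapsulation",      /* 8  */
    "Sound",              /* 9  */
    "Desktop publishing", /* 10 */
    "Outline/planning",   /* 11 */
    "Miscellaneous",      /* 12 */
    "Mixed format",       /* 13 */
    "Font",               /* 14 */
    "Time scheduling",    /* 15 */
    "Communications",     /* 16 */
    "Object module",      /* 17 */
    "Library module",     /* 18 */
    "Fax",                /* 19 */
    "Movie",              /* 20 */
    "Animation"           /* 21 */
};

/* Returns the class name, or NULL if the number is outside the table. */
const char *file_class_name(int attribute_number)
{
    size_t count = sizeof kFileClassNames / sizeof kFileClassNames[0];
    if (attribute_number < 0 || (size_t)attribute_number >= count)
        return NULL;
    return kFileClassNames[attribute_number];
}
```

Returning NULL for unknown numbers keeps the caller responsible for deciding how to report a class that a later release might add.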
Attribute Number  Minor Format
00                Minor format not defined
01                Standard
02                Book
03                Chart
04                Macro
05                Text
06                Binary
07                PC
08                Windows
09                DOS
10                Macintosh
11                RGB
12                TIFF
13                IFF
14                Experimental
15                Format Information
16                RLE
17                Symbol
18                Old
19                Footnote
20                Style
21                Palette
22                Configuration
23                Activity
24                Resource
25                Calculation
26                Glossary
27                Spelling
28                Thesaurus
29                Hyphenation
30                Miscellaneous
31                UNIX
32                VAX
33                Driver
34                Archive
APPENDIX F
File Formats and Extensions
This section lists the KeyView file format numbers and their associated file extensions. It contains the following topics:
File Format and Extension Table
File Format and Extension Table
The following table lists the KeyView file format codes and the file extensions they are most commonly associated with.
NOTE
This table is not a complete list of file extensions. KeyView returns format codes based on file content, which cannot always be predicted from the file extension. Some file extensions may also be associated with multiple format numbers.
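Because one extension can map to several format numbers, code that pre-filters files by extension should treat the extension as a hint with multiple candidates rather than a single answer. The sketch below illustrates this with two extensions read off the table that follows (CGM for formats 8, 9, and 10; EPS for formats 22 and 23). The struct, array, and function names are hypothetical and are not part of the SDK.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one extension and the format numbers it may yield. */
struct ext_formats {
    const char *extension;   /* upper case, without the leading dot */
    int format_numbers[3];
    size_t count;
};

/* Two sample rows taken from the File Format and Extension Table. */
static const struct ext_formats kExamples[] = {
    { "CGM", { 8, 9, 10 }, 3 },  /* CGM clear text, binary, character */
    { "EPS", { 22, 23 },   2 },  /* EPSF and preview EPSF */
};

/* Stores a pointer to the candidate list in *out and returns its length.
 * Returns 0 and sets *out to NULL if the extension is not in the sample. */
size_t candidate_formats(const char *extension, const int **out)
{
    for (size_t i = 0; i < sizeof kExamples / sizeof kExamples[0]; ++i) {
        if (strcmp(kExamples[i].extension, extension) == 0) {
            *out = kExamples[i].format_numbers;
            return kExamples[i].count;
        }
    }
    *out = NULL;
    return 0;
}
```

A lookup like this can only narrow the candidates; the format code that KeyView returns from the file content remains the authoritative result.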
Format Name
AES_Multiplus_Comm_Fmt
ASCII_Text_Fmt
MSDOS_Batch_File_Fmt
Applix_Alis_Fmt
BMP_Fmt
CT_DEF_Fmt
Corel_Draw_Fmt
CGM_ClearText_Fmt
CGM_Binary_Fmt
CGM_Character_Fmt
Word_Connection_Fmt
COMET_TOP_Word_Fmt
CEOwrite_Fmt
DSA101_Fmt
DCA_RFT_Fmt
CDA_DDIF_Fmt
DG_CDS_Fmt
Micrografx_Draw_Fmt
Data_Point_VistaWord_Fmt
DECdx_Fmt
Enable_WP_Fmt
EPSF_Fmt
Preview_EPSF_Fmt
11
12
13
14
15
16
17
18
19
20
21
22
23
9
10
7
8
3
4
1
2
5
6
Format
Number Format Description
Multiplus (AES)
Text
MS-DOS Batch File
APPLIX ASTERIX
Windows Bitmap
Convergent Technologies DEF
Comm. Format
Corel Draw
Computer Graphics Metafile
(CGM)
Computer Graphics Metafile
(CGM)
Computer Graphics Metafile
(CGM)
Word Connection
COMET TOP
CEOwrite
DSA101 (Honeywell Bull)
DCA-RFT (IBM Revisable
Form)
CDA / DDIF
DG Common Data Stream
(CDS)
Windows Draw (Micrografx)
Vistaword
DECdx
Enable Word Processing
Encapsulated PostScript
Encapsulated PostScript
Associated File
Extension
PTF
BAT
AX
BMP
CDR
CGM
CGM
CGM
CN
CW
RFT
CDS
DRW
DX
WPF
EPS
EPS
1
Format Name
MS_Executable_Fmt
G31D_Fmt
GIF_87a_Fmt
GIF_89a_Fmt
HP_Word_PC_Fmt
IBM_1403_LinePrinter_Fmt
IBM_DCF_Script_Fmt
IBM_DCA_FFT_Fmt
Interleaf_Fmt
GEM_Image_Fmt
IBM_Display_Write_Fmt
Sun_Raster_Fmt
Ami_Pro_Fmt
Ami_Pro_StyleSheet_Fmt
MORE_Fmt
Lyrix_Fmt
MASS_11_Fmt
MacPaint_Fmt
MS_Word_Mac_Fmt
SmartWare_II_Comm_Fmt
MS_Word_Win_Fmt
Multimate_Fmt
Multimate_Fnote_Fmt
Multimate_Adv_Fmt
Multimate_Adv_Fnote_Fmt
40
41
42
43
36
37
38
39
44
45
46
47
48
32
33
34
35
28
29
30
31
Format
Number
24
25
26
27
Format Description
MSDOS/Windows Program
CCITT G3 1D
Graphics Interchange Format
(GIF87a)
Graphics Interchange Format
(GIF89a)
HP Word PC
IBM 1403 Line Printer
DCF Script
DCA-FFT (IBM Final Form)
Interleaf
GEM Bit Image
Display Write
Sun Raster
Lotus Ami Pro
Lotus Ami Pro Style Sheet
MORE Database MAC
Lyrix Word Processing
MASS-11
MacPaint
Microsoft Word for Macintosh
SmartWare II
Microsoft Word for Windows
MultiMate
MultiMate Footnote File
MultiMate Advantage
MultiMate Advantage Footnote
File
Associated File
Extension
EXE
HW
I4
IC
IF
IMG
IP
RAS
SAM
M1
PNTG
DOC
DOC
FNX
Format Name
Multimate_Adv_II_Fmt
Multimate_Adv_II_Fnote_Fmt
Multiplan_PC_Fmt
Multiplan_Mac_Fmt
MS_RTF_Fmt
MS_Word_PC_Fmt
MS_Word_PC_StyleSheet_Fmt
MS_Word_PC_Glossary_Fmt
MS_Word_PC_Driver_Fmt
MS_Word_PC_Misc_Fmt
NBI_Async_Archive_Fmt
Navy_DIF_Fmt
NBI_Net_Archive_Fmt
NIOS_TOP_Fmt
FileMaker_Mac_Fmt
ODA_Q1_11_Fmt
ODA_Q1_12_Fmt
OLIDIF_Fmt
Office_Writer_Fmt
PC_Paintbrush_Fmt
CPT_Comm_Fmt
Lotus_PIC_Fmt
Mac_PICT_Fmt
Philips_Script_Word_Fmt
PostScript_Fmt
71
72
73
67
68
69
70
63
64
65
66
59
60
61
62
56
57
58
51
52
53
54
55
Format
Number
49
50
Format Description
MultiMate Advantage II
Associated File
Extension
FNX
MultiMate Advantage II
Footnote File
Multiplan (PC)
Multiplan (Mac)
Rich Text Format (RTF)
Microsoft Word for PC
RTF
DOC
DOC
Microsoft Word for PC Style
Sheet
Microsoft Word for PC Glossary DOC
Microsoft Word for PC Driver DOC
Microsoft Word for PC
Miscellaneous File
DOC
NBI Async Archive Format
Navy DIF
NBI Net Archive Format
NIOS TOP
ND
NN
Filemaker MAC
ODA / ODIF
ODA / ODIF
OLIDIF (Olivetti)
FP5, FP7
OD
OD
Office Writer OW
PC Paintbrush Graphics (PCX) PCX
CPT
Lotus PIC
QuickDraw Picture
Philips Script
PIC
PCT
PostScript PS
Format Name
PRIMEWORD_Fmt
Quadratron_Q_One_v1_Fmt
Quadratron_Q_One_v2_Fmt
SAMNA_Word_IV_Fmt
Ami_Pro_Draw_Fmt
SYLK_Spreadsheet_Fmt
SmartWare_II_WP_Fmt
Symphony_Fmt
Targa_Fmt
TIFF_Fmt
Targon_Word_Fmt
Uniplex_Ucalc_Fmt
Uniplex_WP_Fmt
MS_Word_UNIX_Fmt
WANG_PC_Fmt
WordERA_Fmt
WANG_WPS_Comm_Fmt
WordPerfect_Mac_Fmt
WordPerfect_Fmt
WordPerfect_VAX_Fmt
WordPerfect_Macro_Fmt
WordPerfect_Dictionary_Fmt
WordPerfect_Thesaurus_Fmt
WordPerfect_Resource_Fmt
WordPerfect_Driver_Fmt
WordPerfect_Cfg_Fmt
96
97
98
99
84
85
86
87
80
81
82
83
76
77
78
79
Format
Number
74
75
92
93
94
95
88
89
90
91
Format Description
PRIMEWORD
Q-One V1.93J
Q-One V2.0
SAMNA Word
Lotus Ami Pro Draw
SYLK
SmartWare II
Symphony
Targa
TIFF
Targon Word
Uniplex Ucalc
Uniplex
Microsoft Word UNIX
WANG PC
WordERA
WANG WPS
WordPerfect MAC
WordPerfect
WordPerfect VAX
WordPerfect Macro
WordPerfect Spelling
Dictionary
WordPerfect Thesaurus
WordPerfect Resource File
WordPerfect Driver
WordPerfect Configuration File
Associated File
Extension
Q1
Q1
WF
SAM
SDW
WR1
TGA
TIF, TIFF
TW
SS
UP
DOC
WPM, WPD
WO, WPD
WPD
Format Name
WordPerfect_Hyphenation_Fmt
WordPerfect_Misc_Fmt
WordMARC_Fmt
Windows_Metafile_Fmt
Windows_Metafile_NoHdr_Fmt
SmartWare_II_DB_Fmt
WordPerfect_Graphics_Fmt
WordStar_Fmt
WANG_WITA_Fmt
Xerox_860_Comm_Fmt
Xerox_Writer_Fmt
DIF_SpreadSheet_Fmt
Enable_Spreadsheet_Fmt
SuperCalc_Fmt
UltraCalc_Fmt
SmartWare_II_SS_Fmt
SOF_Encapsulation_Fmt
PowerPoint_Win_Fmt
PowerPoint_Mac_Fmt
PowerPoint_95_Fmt
PowerPoint_97_Fmt
PageMaker_Mac_Fmt
PageMaker_Win_Fmt
MS_Works_Mac_WP_Fmt
MS_Works_Mac_DB_Fmt
110
111
112
113
114
115
116
106
107
108
109
102
103
104
105
121
122
123
124
117
118
119
120
Format
Number
100
101
Format Description
WordPerfect Hyphenation
Dictionary
Associated File
Extension
WordPerfect Miscellaneous
File
WPD
WordMARC
Windows Metafile
WM, PW
WMF
Windows Metafile (no header) WMF
SmartWare II
WordPerfect Graphics
WordStar
WANG WITA
WPG, QPG
WS
WT
Xerox 860
Xerox Writer
Data Interchange Format (DIF) DIF
Enable Spreadsheet SSF
Supercalc
UltraCalc
CAL
SmartWare II
Serialized Object Format
(SOF)
SOF
PowerPoint PC
PowerPoint MAC
PowerPoint 95
PowerPoint 97
PPT
PPT
PPT
PPT
PageMaker for Macintosh
PageMaker for Windows
Microsoft Works for MAC
Microsoft Works for MAC
Format Name
MS_Works_Mac_SS_Fmt
MS_Works_Mac_Comm_Fmt
MS_Works_DOS_WP_Fmt
MS_Works_DOS_DB_Fmt
MS_Works_DOS_SS_Fmt
MS_Works_Win_WP_Fmt
MS_Works_Win_DB_Fmt
MS_Works_Win_SS_Fmt
PC_Library_Fmt
MacWrite_Fmt
MacWrite_II_Fmt
Freehand_Fmt
Disk_Doubler_Fmt
HP_GL_Fmt
FrameMaker_Fmt
FrameMaker_Book_Fmt
Maker_Markup_Language_Fmt
Maker_Interchange_Fmt
JPEG_File_Interchange_Fmt
Reflex_Fmt
Framework_Fmt
Framework_II_Fmt
Paradox_Fmt
MS_Windows_Write_Fmt
Quattro_Pro_DOS_Fmt
Quattro_Pro_Win_Fmt
135
136
137
138
131
132
133
134
139
140
141
142
127
128
129
130
Format
Number
125
126
147
148
149
150
143
144
145
146
Format Description
Microsoft Works for MAC
Microsoft Works for MAC
Microsoft Works for DOS
Microsoft Works for DOS
Microsoft Works for DOS
Microsoft Works for Windows
Microsoft Works for Windows
Microsoft Works for Windows
DOS/Windows Object Library
MacWrite
MacWrite II
Freehand MAC
Disk Doubler
HP Graphics Language
FrameMaker
FrameMaker
Maker Markup Language
Maker Interchange Format
(MIF)
Interchange Format
Reflex
Framework
Framework II
Paradox
Windows Write
Quattro Pro for DOS
Quattro Pro for Windows
Associated File
Extension
WDB
WDB
FW3
DB
WRI
S30, S40
HPGL
FM, FRM
BOOK
MIF
JPG, JPEG
WB2, WB3
Format Name
Persuasion_Fmt
Windows_Icon_Fmt
Windows_Cursor_Fmt
MS_Project_Activity_Fmt
MS_Project_Resource_Fmt
MS_Project_Calc_Fmt
PKZIP_Fmt
Quark_Xpress_Fmt
ARC_PAK_Archive_Fmt
MS_Publisher_Fmt
PlanPerfect_Fmt
WordPerfect_Auxiliary_Fmt
MS_WAVE_Audio_Fmt
MIDI_Audio_Fmt
AutoCAD_DXF_Binary_Fmt
AutoCAD_DXF_Text_Fmt
dBase_Fmt
OS_2_PM_Metafile_Fmt
Lasergraphics_Language_Fmt
AutoShade_Rendering_Fmt
GEM_VDI_Fmt
Windows_Help_Fmt
Volkswriter_Fmt
Ability_WP_Fmt
Ability_DB_Fmt
Ability_SS_Fmt
Ability_Comm_Fmt
169
170
171
172
165
166
167
168
173
174
175
176
177
161
162
163
164
157
158
159
160
153
154
155
156
Format
Number
151
152
Format Description
Persuasion
Windows Icon Format
Windows Cursor
Microsoft Project
Microsoft Project
Microsoft Project
ZIP Archive
Quark Xpress MAC
PAK/ARC Archive
Microsoft Publisher
PlanPerfect
WordPerfect auxiliary file
Microsoft Wave
MIDI
AutoCAD DXF
AutoCAD DXF
dBase
OS/2 PM Metafile
Lasergraphics Language
AutoShade Rendering
GEM VDI
Windows Help File
Volkswriter
Ability
Ability
Ability
Ability
VDI
HLP
VW4
Associated File
Extension
ICO
CUR
ZIP
ARC, PAK
WPW
WAV
MID, MIDI
DXF
DXF
DBF
MET
Format Name
Ability_Image_Fmt
XyWrite_Fmt
CSV_Fmt
IBM_Writing_Assistant_Fmt
WordStar_2000_Fmt
HP_PCL_Fmt
UNIX_Exe_PreSysV_VAX_Fmt
UNIX_Exe_Basic_16_Fmt
UNIX_Exe_x86_Fmt
UNIX_Exe_iAPX_286_Fmt
UNIX_Exe_MC68k_Fmt
UNIX_Exe_3B20_Fmt
UNIX_Exe_WE32000_Fmt
UNIX_Exe_VAX_Fmt
UNIX_Exe_Bell_5_Fmt
UNIX_Obj_VAX_Demand_Fmt
UNIX_Obj_MS8086_Fmt
UNIX_Obj_Z8000_Fmt
AU_Audio_Fmt
NeWS_Font_Fmt
cpio_Archive_CRChdr_Fmt
cpio_Archive_CHRhdr_Fmt
PEX_Binary_Archive_Fmt
Sun_vfont_Fmt
Curses_Screen_Fmt
189
190
191
192
193
185
186
187
188
194
199
200
201
202
195
196
197
198
181
182
183
184
Format
Number
178
179
180
Format Description
Ability
XYWrite / Nota Bene
CSV (Comma Separated
Values)
IBM Writing Assistant
WordStar 2000
HP Printer Control Language
Unix Executable (PDP-11/ pre-System V VAX)
Unix Executable (Basic-16)
Unix Executable (x86)
Unix Executable (iAPX 286)
Unix Executable (MC680x0)
Unix Executable (3B20)
Unix Executable (WE32000)
Unix Executable (VAX)
Unix Executable (Bell 5.0)
Unix Object Module (VAX
Demand)
Unix Object Module (old MS
8086)
Unix Object Module (Z8000)
NeXT/Sun Audio Data
NeWS bitmap font
cpio archive (CRC Header)
cpio archive (CHR Header)
SUN PEX Binary Archive
SUN vfont Definition
Curses Screen Image
Associated File
Extension
XY4
CSV
IWA
WS2
PCL
AU
Format Name
UUEncoded_Fmt
WriteNow_Fmt
PC_Obj_Fmt
Windows_Group_Fmt
TrueType_Font_Fmt
Windows_PIF_Fmt
MS_COM_Executable_Fmt
StuffIt_Fmt
PeachCalc_Fmt
Wang_GDL_Fmt
Q_A_DOS_Fmt
Q_A_Win_Fmt
WPS_PLUS_Fmt
DCX_Fmt
OLE_Fmt
EBCDIC_Fmt
DCS_Fmt
UNIX_SHAR_Fmt
Lotus_Notes_BitMap_Fmt
Lotus_Notes_CDF_Fmt
Compress_Fmt
GZ_Compress_Fmt
TAR_Fmt
ODIF_FOD26_Fmt
ODIF_FOD36_Fmt
ALIS_Fmt
Envoy_Fmt
221
222
223
224
217
218
219
220
225
226
227
228
229
213
214
215
216
209
210
211
212
205
206
207
208
Format
Number
203
204
Format Description
UU encoded
WriteNow MAC
PC (.COM)
StuffIt (MAC)
PeachCalc
WANG Office GDL Header
Associated File
Extension
UUE
DOS/Windows Object Module
Windows Group
TrueType Font TTF
Program Information File (PIF) PIF
COM
HQX
Q & A for DOS
Q & A for Windows
WPS-PLUS
JW
WPL
DCX FAX Format (PCX images) DCX
OLE Compound Document
EBCDIC Text
OLE
DCS
SHAR
Lotus Notes Bitmap
Lotus Notes CDF
SHAR
Unix Compress
GZ Compress
TAR
ODA / ODIF
CDF
Z
GZ
TAR
F26
F36 ODA / ODIF
ALIS
Envoy EVY
Format Name
PDF_Fmt
BinHex_Fmt
SMTP_Fmt
MIME_Fmt
USENET_Fmt
SGML_Fmt
HTML_Fmt
ACT_Fmt
PNG_Fmt
MS_Video_Fmt
Windows_Animated_Cursor_Fmt
239
240
Windows_CPP_Obj_Storage_Fmt 241
Windows_Palette_Fmt 242
RIFF_DIB_Fmt 243
RIFF_MIDI_Fmt
RIFF_Multimedia_Movie_Fmt
MPEG_Fmt
QuickTime_Fmt
AIFF_Fmt 248
244
245
246
247
Amiga_MOD_Fmt
Amiga_IFF_8SVX_Fmt
Creative_Voice_Audio_Fmt
AutoDesk_Animator_FLI_Fmt
AutoDesk_AnimatorPro_FLC_Fmt 253
Compactor_Archive_Fmt 254
249
250
251
252
232
233
234
235
Format
Number
230
231
236
237
238
Format Description
Portable Document Format
BinHex
SMTP
MIME
2
USENET
SGML
HTML
ACT
Portable Network Graphics
(PNG)
Video for Windows (AVI)
Windows Animated Cursor
Windows C++ Object Storage
Windows Palette
RIFF Device Independent
Bitmap
RIFF MIDI
RIFF Multimedia Movie
MPEG Movie
QuickTime Movie, MPEG-4
Audio
Audio Interchange File Format
(AIFF)
Amiga MOD
Amiga IFF (8SVX) Sound
Creative Voice (VOC)
AutoDesk Animator FLIC
AutoDesk Animator Pro FLIC
Compactor / Compact Pro
SGML
ACT
PNG
AVI
ANI
PAL
RMI
MOV, QT, MP4
AIF, AIFF
MOD
IFF
VOC
FLI
FLC
Associated File
Extension
HQX
SMTP
EML, MBX
Format Name
VRML_Fmt
QuickDraw_3D_Metafile_Fmt
PGP_Secret_Keyring_Fmt
PGP_Public_Keyring_Fmt
PGP_Encrypted_Data_Fmt
PGP_Signed_Data_Fmt
257
258
259
260
PGP_SignedEncrypted_Data_Fmt 261
Format
Number
255
256
PGP_Sign_Certificate_Fmt
PGP_Compressed_Data_Fmt
PGP_ASCII_Public_Keyring_Fmt
262
263
264
PGP_ASCII_Encoded_Fmt
PGP_ASCII_Signed_Fmt
OLE_DIB_Fmt
SGI_Image_Fmt
Lotus_ScreenCam_Fmt
MPEG_Audio_Fmt
FTP_Software_Session_Fmt
Netscape_Bookmark_File_Fmt
Corel_Draw_CMX_Fmt
AutoDesk_DWG_Fmt
AutoDesk_WHIP_Fmt
Macromedia_Director_Fmt
Real_Audio_Fmt
MSDOS_Device_Driver_Fmt
Micrografx_Designer_Fmt
SVF_Fmt
277
278
279
280
273
274
275
276
269
270
271
272
265
266
267
268
Format Description
VRML
QuickDraw 3D Metafile
PGP Secret Keyring
PGP Public Keyring
PGP Encrypted Data
PGP Signed Data
PGP Signed and Encrypted
Data
Associated File
Extension
WRL
PGP Signature Certificate
PGP Compressed Data
ASCII-armored PGP Public
Keyring
ASCII-armored PGP encoded PGP
ASCII-armored PGP encoded PGP
OLE DIB object
SGI Image
Lotus ScreenCam
MPEG Audio
RGB
FTP Session Data
Netscape Bookmark File
Corel CMX
AutoDesk Drawing (DWG)
AutoDesk WHIP
Macromedia Director
Real Audio
MSDOS Device Driver
Micrografx Designer
Simple Vector Format (SVF)
MPEGA
STE
HTM
CMX
DWG
WHP
DCR
RM
SYS
DSF
SVF
Format Name
Applix_Words_Fmt
Applix_Graphics_Fmt
MS_Access_Fmt
MS_Access_95_Fmt
MS_Access_97_Fmt
MacBinary_Fmt
Apple_Single_Fmt
Apple_Double_Fmt
Enhanced_Metafile_Fmt
MS_Office_Drawing_Fmt
XML_Fmt
DeVice_Independent_Fmt
Unicode_Fmt
Lotus_123_Worksheet_Fmt
Lotus_123_Format_Fmt
Lotus_123_97_Fmt
Lotus_Word_Pro_96_Fmt
Lotus_Word_Pro_97_Fmt
Freelance_DOS_Fmt
Freelance_Win_Fmt
Freelance_OS2_Fmt
Freelance_96_Fmt
Freelance_97_Fmt
MS_Word_95_Fmt
MS_Word_97_Fmt
Excel_Fmt
Excel_Chart_Fmt
299
300
301
302
295
296
297
298
303
304
305
306
307
291
292
293
294
287
288
289
290
283
284
285
286
Format
Number
281
282
Format Description
Applix Words
Applix Graphics
Microsoft Access
Microsoft Access 95
Microsoft Access 97
MacBinary
Apple Single
Apple Double
Enhanced Metafile
Microsoft Office Drawing
EMF
XML
DeVice Independent file (DVI) DVI
Unicode
Lotus 1-2-3
Lotus 1-2-3 Formatting
Lotus 1-2-3 97
Lotus Word Pro 96
Lotus Word Pro 97
Lotus Freelance for DOS
Lotus Freelance for Windows
UNI
FM3
LWP
LWP
Lotus Freelance for OS/2
Lotus Freelance 96
Lotus Freelance 97
Microsoft Word 95
Microsoft Word 97
Microsoft Excel
Microsoft Excel
PRE
PRS
PRZ
PRZ
DOC
DOC
XLS
XLS
Associated File
Extension
AW
AG
MDB
MDB
MDB
BIN
Format Name
Excel_Macro_Fmt
Excel_95_Fmt
Excel_97_Fmt
Corel_Presentations_Fmt
Harvard_Graphics_Fmt
Harvard_Graphics_Chart_Fmt
Harvard_Graphics_Symbol_Fmt
Harvard_Graphics_Cfg_Fmt
Harvard_Graphics_Palette_Fmt
Lotus_123_R9_Fmt
Applix_Spreadsheets_Fmt
MS_Pocket_Word_Fmt
MS_DIB_Fmt
MS_Word_2000_Fmt
Excel_2000_Fmt
PowerPoint_2000_Fmt
MS_Access_2000_Fmt
MS_Project_4_Fmt
MS_Project_41_Fmt
MS_Project_98_Fmt
Folio_Flat_Fmt
HWP_Fmt
ICHITARO_Fmt
IS_XML_Fmt
Oasys_Fmt
316
317
318
319
320
310
311
312
313
Format
Number
308
309
314
315
325
326
327
328
321
322
323
324
329
330
331
332
Format Description
Microsoft Excel
Microsoft Excel 95
Associated File
Extension
XLS
XLS
XLS
XFD, XFDL
Microsoft Excel 97
Corel Presentations
Harvard Graphics
Harvard Graphics Chart CH3, CHT
Harvard Graphics Symbol File SY3
Harvard Graphics
Configuration File
Harvard Graphics Palette
Lotus 1-2-3 Release 9
Applix Spreadsheets
Microsoft Pocket Word
MS Windows Device
Independent Bitmap
AS
PWD, DOC
Microsoft Word 2000
Microsoft Excel 2000
Microsoft PowerPoint 2000
Microsoft Access 2000
Microsoft Project 4
Microsoft Project 4.1
Microsoft Project 98
Folio Flat File
DOC
XLS
PPT
MDB
, MPP
FFF
HWP HWP(Arae-Ah Hangul)
ICHITARO V4-10
Extended or Custom XML
Oasys format
OA2, OA3
Format Name
PBM_ASC_Fmt
PBM_BIN_Fmt
PGM_ASC_Fmt
PGM_BIN_Fmt
PPM_ASC_Fmt
PPM_BIN_Fmt
XBM_Fmt
XPM_Fmt
FPX_Fmt
PCD_Fmt
MS_Visio_Fmt
MS_Project_2000_Fmt
MS_Outlook_Fmt
ELF_Relocatable_Fmt
ELF_Executable_Fmt
ELF_Dynamic_Lib_Fmt
MS_Word_XML_Fmt
MS_Excel_XML_Fmt
MS_Visio_XML_Fmt
SO_Text_XML_Fmt
SO_Spreadsheet_XML_Fmt
SO_Presentation_XML_Fmt
XHTML_Fmt
336
337
338
351
352
353
354
355
347
348
349
350
343
344
345
346
339
340
341
342
Format
Number
333
334
335
Format Description
Portable Bitmap Utilities ASCII
Format
Portable Bitmap Utilities Binary
Format
Portable Greymap Utilities
ASCII Format
Portable Greymap Utilities
Binary Format
Portable Pixmap Utilities ASCII
Format
Portable Pixmap Utilities Binary
Format
X Bitmap Format
X Pixmap Format
FPX Format
PCD Format
Microsoft Visio
Microsoft Project 2000
Microsoft Outlook
ELF Relocatable
ELF Executable
ELF Dynamic Library
Microsoft Word 2003 XML
Microsoft Excel 2003 XML
Microsoft Visio 2003 XML
StarOffice Text XML
StarOffice Spreadsheet XML
StarOffice Presentation XML
XHTML
Associated File
Extension
PGM
XBM
XPM
FPX
PCD
VSD
MSG, OFT
O
SO
VDX
SXI
, SXP
, ODP
Format Name
MS_OutlookPST_Fmt
RAR_Fmt
Lotus_Notes_NSF_Fmt
Macromedia_Flash_Fmt
MS_Word_2007_Fmt
MS_Excel_2007_Fmt
MS_PPT_2007_Fmt
OpenPGP_Fmt
Intergraph_V7_DGN_Fmt
MicroStation_V8_DGN_Fmt
MS_Word_Macro_2007_Fmt
MS_Excel_Macro_2007_Fmt
MS_PPT_Macro_2007_Fmt
LZH_Fmt
Office_2007_Fmt
MS_XPS_Fmt
Lotus_Domino_DXL_Fmt
ODF_Text_Fmt
ODF_Spreadsheet_Fmt
ODF_Presentation_Fmt
365
366
367
368
369
370
371
372
359
360
361
362
363
Format
Number
356
357
358
364
373
374
375
Format Description
Microsoft Outlook PST
RAR
IBM Lotus Notes Database
NSF/NTF
SWF
Microsoft Word 2007 XML
Microsoft Excel 2007 XML
Microsoft PPT 2007 XML
OpenPGP Message Format
(with new packet format)
Intergraph Standard File
Format (ISFF) V7 DGN
(non-OLE)
MicroStation V8 DGN (OLE)
Microsoft Word Macro 2007
XML
Microsoft Excel Macro 2007
XML
Microsoft PPT Macro 2007
XML
LHA Archive
Office 2007 document
Microsoft XML Paper
Specification (XPS)
IBM Lotus representation of
Domino design elements in
XML format
ODF Text
ODF Spreadsheet
ODF Presentation
Associated File
Extension
PST
RAR
NSF
SWF
DOCX, DOTX
XLSX, XLTX
PPTX, POTX, PPSX
PGP
DGN
DGN
PPTM, POTM,
PPSM, PPAM
LZH, LHA
XLSB
XPS
DXL
DOCM, DOTM
XLSM, XLTM, XLAM
ODT
, STW
, STC
Format Name
Legato_Extender_ONM_Fmt
bin_Unknown_Fmt
TNEF_Fmt
CADAM_Drawing_Fmt
CADAM_Drawing_Overlay_Fmt
NURSTOR_Drawing_Fmt
HP_GLP_Fmt
ASF_Fmt
WMA_Fmt
WMV_Fmt
EMX_Fmt
Z7Z_Fmt
MS_Excel_Binary_2007_Fmt
CAB_Fmt
CATIA_Fmt
YIM_Fmt
ODF_Drawing_Fmt
Founder_CEB_Fmt
QPW_Fmt
MHT_Fmt
MDI_Fmt
379
380
381
382
383
384
Format
Number
376
377
378
385
386
387
388
389
390
391
392
393
394
395
396
Format Description
Legato Extender Native
Message ONM n/a
Transport Neutral
Encapsulation Format (TNEF)
CADAM Drawing
CADAM Drawing Overlay
NURSTOR Drawing
HP Graphics Language
(Plotter)
Advanced Systems Format
(ASF)
Windows Media Audio Format
(WMA)
Windows Media Video Format
(WMV)
Legato EMailXtender Archives
Format (EMX)
7 Zip Format(7z)
Microsoft Excel Binary 2007
Microsoft Cabinet File (CAB)
CATIA Formats (CAT*)
Yahoo Instant Messenger
History
ODF Drawing
Founder Chinese E-paper
Basic (ceb)
Quattro Pro 9+ for Windows
MHT format
Microsoft Document Imaging
Format
Associated File
Extension
ONM various
CDD
CDO
NUR
HPG
ASF
WMA
WMV
EMX
7Z
XLSB
CAB
CAT
3
DAT
CEB
QPW
MHT
MDI
Format Name
GRV_Fmt
IWWP_Fmt
IWSS_Fmt
IWPG_Fmt
BKF_Fmt
MS_Access_2007_Fmt
ENT_Fmt
DMG_Fmt
CWK_Fmt
OO3_Fmt
OPML_Fmt
Omni_Graffle_XML_File
PSD_Fmt
Apple_Binary_PList_Fmt
Apple_iChat_Fmt
OOUTLINE_Fmt
BZIP2_Fmt
ISO_Fmt
DocuWorks_Fmt
RealMedia_Fmt
AC3Audio_Fmt
NEF_Fmt
SolidWorks_Fmt
XFDL_Fmt
404
405
406
407
408
409
410
399
400
401
402
403
Format
Number
397
398
415
416
417
418
419
411
412
413
414
420
Format Description
Associated File
Extension
Microsoft Office Groove Format GRV
Apple iWork Pages format PAGES, GZ
Apple iWork Numbers format
Apple iWork Keynote format
NUMBERS, GZ
KEY, GZ
Windows Backup File
Microsoft Access 2007
Microsoft Entourage Database
Format
BKF
ACCDB
Mac Disk Copy Disk Image File
AppleWorks File
Omni Outliner File
Omni Outliner File
Omni Graffle XML File
Photoshop Document
Apple Binary Property List format
OO3
OPML
GRAFFLE
PSD
Apple iChat format
OOutliner File
Bzip 2 Compressed File
ISO-9660 CD Disc Image
Format
DocuWorks Format
RealMedia Streaming Media
AC3 Audio File Format
Nero Encrypted File
SolidWorks Format Files
Extensible Forms Description
Language
OOUTLINE
BZ2
ISO
XDW
RM, RA
AC3
NEF
SLDASM, SLDPRT,
SLDDRW
XFDL, XFD
Format Name
Apple_XML_PList_Fmt
OneNote_Fmt
Dicom_Fmt
EnCase_Fmt
Scrap_Fmt
MS_Project_2007_Fmt
MS_Publisher_98_Fmt
Skype_Fmt
Hl7_Fmt
MS_OutlookOST_Fmt
Epub_Fmt
MS_OEDBX_Fmt
BB_Activ_Fmt
DiskImage_Fmt
Milestone_Fmt
E_Transcript_Fmt
PostScript_Font_Fmt
Ghost_DiskImage_Fmt
JPEG_2000_JP2_File_Fmt
Unicode_HTML_Fmt
CHM_Fmt
EMCMF_Fmt
437
438
439
440
441
442
443
426
427
428
429
430
431
432
433
Format
Number
421
422
424
425
434
435
436
Format Description
Apple XML Property List format
OneNote Note Format
Digital Imaging and
Communications in Medicine
Expert Witness Compression
Format (EnCase)
Associated File
Extension
ONE
DCM
E01, L01, Lx01
Shell Scrap Object File
Microsoft Project 2007
Microsoft Publisher 98/2000/
2002/2003/2007/
Skype Log File
Health level7 message
Microsoft Outlook OST
Electronic Publication
Microsoft Outlook Express
DBX
SHS
DBB
HL7
OST
EPUB
DBX
BlackBerry Activation File
Disk Image
Milestone Document
DAT
RealLegal E-Transcript File
PostScript Type 1 Font
Ghost Disk Image File
MLS, ML3, ML4,
ML5, ML6, ML7,
ML8, ML9
PTX
PFB
GHO, GHS
JPEG-2000 JP2 File Format
Syntax (ISO/IEC 15444-1)
Unicode HTML
JP2, JPF, J2K,
JPWL, JPX, PGX
HTM
Microsoft Compiled HTML Help CHM
Documentum EMCMF format EMCMF
Format Name
MS_Access_2007_Tmpl_Fmt
Jungum_Fmt
JBIG2_Fmt
EFax_Fmt
AD1_Fmt
SketchUp_Fmt
GWFS_Email_Fmt
JNT_Fmt
Yahoo_yChat_Fmt
PaperPort_MAX_File_Fmt
ARJ_Fmt
RPMSG_Fmt
MAT_Fmt
SGY_Fmt
CDXA_MPEG_PS_Fmt
EVT_Fmt
EVTX_Fmt
MS_OutlookOLM_Fmt
WARC_Fmt
JAVACLASS_Fmt
VCF_Fmt
455
456
457
458
459
460
461
462
463
464
450
451
452
453
454
446
447
448
449
Format
Number
444
445
Format Description
Microsoft Access 2007
Template
Samsung Electronics Jungum
Global document
JBIG2 File Format eFax file
AD1 Evidence file
Google SketchUp
Group Wise File Surf email
Windows Journal format
Yahoo! Messenger chat log
PaperPort image file
ARJ (Archive by Robert Jung) file format
Microsoft Outlook Restricted
Permission Message
MATLAB file format
SEG-Y Seismic Data format
MPEG-PS container with
CDXA stream
Microsoft Windows NT Event
Log
Microsoft Windows Vista Event
Log
Microsoft Outlook for
Macintosh format
Web ARChive
Java Class format
Microsoft Outlook vCard file format
Associated File
Extension
ACCDT
GUL
JB2, JBIG2
EFX
AD1
SKP
GWFS
JNT
YCHAT
MAX
ARJ
RPMSG
MAT, FIG
SGY, SEGY
EVT
EVTX
OLM
WARC
CLASS
VCF
Format Name              Format Number  Format Description                               Associated File Extension
EDB_Fmt                  465            Microsoft Exchange Server Database file format   EDB
ICS_Fmt                  466            Microsoft Outlook iCalendar file format          ICS, VCS
MS_Visio_2013_Fmt        467            Microsoft Visio 2013                             VSDX, VSTX, VSSX
MS_Visio_2013_Macro_Fmt  468            Microsoft Visio 2013 macro                       VSDM, VSTM, VSSM
1. This file extension can return more than one format number.
2. MHT, EML, and MBX files may return format 2, 233, or 395, depending on the text contained in the file. In general, files that contain fields such as To, From, Date, or Subject are considered e-mail messages; files that contain fields such as content-type and mime-version are considered to be MHT files; and files that do not contain any of those fields are considered to be text files.
3. All CAT file extensions, for example CATDrawing, CATProduct, CATPart, and so on.
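The field-based rule in footnote 2 can be illustrated with a short sketch. This is not KeyView's detection code, which is not documented here; it is a simplified approximation in which the function names are hypothetical, header fields are matched only at the start of a line, and e-mail fields are assumed to take precedence over MHT fields when both are present. The format numbers 2, 233, and 395 correspond to ASCII_Text_Fmt, MIME_Fmt, and MHT_Fmt in the table above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { FMT_TEXT = 2, FMT_MIME = 233, FMT_MHT = 395 };

/* Returns 1 if the line starts with `field` followed by ':' (any case). */
static int line_has_field(const char *line, size_t len, const char *field)
{
    size_t n = strlen(field);
    if (len <= n || line[n] != ':')
        return 0;
    for (size_t i = 0; i < n; ++i) {
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)field[i]))
            return 0;
    }
    return 1;
}

/* Returns 1 if any line of `text` starts with one of the given fields. */
static int text_has_any_field(const char *text,
                              const char *const *fields, size_t nfields)
{
    const char *line = text;
    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        for (size_t i = 0; i < nfields; ++i) {
            if (line_has_field(line, len, fields[i]))
                return 1;
        }
        if (!end)
            break;
        line = end + 1;
    }
    return 0;
}

/* Hypothetical approximation of footnote 2; the precedence is an assumption. */
int guess_mail_format(const char *text)
{
    static const char *const mail_fields[] = { "To", "From", "Date", "Subject" };
    static const char *const mht_fields[]  = { "Content-Type", "MIME-Version" };

    if (text_has_any_field(text, mail_fields, 4))
        return FMT_MIME;   /* treated as an e-mail message */
    if (text_has_any_field(text, mht_fields, 2))
        return FMT_MHT;    /* treated as an MHT file */
    return FMT_TEXT;       /* none of the fields found */
}
```

The practical point is the same as in the footnote: two files with the same extension can legitimately return different format numbers, so downstream code should branch on the detected format, not on the extension.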
APPENDIX G
Extract and Format Lotus Notes Sub Files
This section describes how to create XML templates to alter the appearance of extracted Lotus mail note sub-files so that they maintain the look and feel of the original notes.
Template Elements and Attributes
Overview
KeyView uses the NSF reader, nsfsr, to extract Lotus database files, and places Lotus mail notes in sub-files. The NSF reader uses a set of default XML templates to extract the notes and apply formatting, thereby approximating the look and feel of the original notes.
In some cases, you might need to customize the XML templates, for instance if your notes contain custom data. In such cases, you can modify the existing XML templates or create your own.
During extraction, the NSF reader loads all XML files in the NSFtemplates directory and its subdirectories (except for the NSFtemplates\images directory, which is reserved for images). During initialization, the KeyView XML parser verifies the XML templates. If the templates contain any invalid XML, elements, or attributes, initialization fails and errors are recorded in the nsfsr.log file.
Customize XML Templates
XML templates are enabled by default. In most cases, the default templates should be sufficient; however, you can customize them or create your own as required.
To customize XML templates for Lotus note extraction
1. Modify the template files in the following directory.
install\OS\bin\NSFtemplates
The main.xml file must exist in the NSFtemplates directory. It is the top-level template file that extracts all sub-files, usually by calling other templates.
2. Ensure that any modifications or additional XML files conform to the supported elements and attributes described in “Template Elements and Attributes”.
3. Extract the Lotus database file.
Use Demo Templates
For testing purposes, you can extract notes using a set of demo templates. The demo templates demonstrate the proper usage of all the XML elements and attributes, some of which the default templates do not use.
The demo templates are available at:
install\OS\bin\NSFtemplates\demo
To use the demo XML templates
1. In the formats.ini file, set the following parameter.
[nsfsr]
UseDemoTemplate=1
2. In the main.xml file, uncomment the following section.
<ifini name="UseDemoTemplate" text="1">
<call file="demo.xml"/>
<quit/>
</ifini>
Use Old Templates
For testing purposes, you can extract notes using legacy templates, which produce MHTML output. You can generate similar output by disabling the XML templates, but using the old templates allows you to see the XML code and compare it to the standard and demo templates.
To use the old XML templates
1. In the formats.ini file, set the following parameter.
[nsfsr]
UseOldTemplate=1
2. In the main.xml file, uncomment the following section.
<ifini name="UseOldTemplate" text="1">
<call file="default_old.xml"/>
<quit/>
</ifini>
Disable XML Templates
For testing purposes, you can disable XML templates, in which case KeyView extracts the notes in MHTML format. You can then compare the MHTML output that the NSF reader generates directly with the MHTML output that it generates indirectly through the XML templates.
To disable XML templates
In the formats.ini file, set the following parameter.
[nsfsr]
ExtractByTemplate=0
Template Elements and Attributes
This section lists the valid XML elements and attributes that you can use when creating or modifying templates. Refer to the demo templates for examples.
Conditional Elements
The following table lists the valid conditional elements.
Element
    Description
<keyview>
    KeyView XML template container (“root”) element.
<if*>
    If the condition from the comparison is true, process XML. Conditions can be nested up to 25 levels deep.
    Attributes
        name. (Required) Name of main item to compare to item or text.
        item. (Required if no text) Name of item to compare to item specified by name.
        text. (Required if no item) Text to compare to item specified by name.
<ifex>, <ifnx>
    Respectively, if the name item exists and has a text value, or does not. The Notes item might have a value that cannot be converted to text, such as an image.
<ifeq>, <ifne>, <iflt>, <ifle>, <ifgt>, <ifge>
    Respectively, if text ==, !=, <, <=, >, >=.
    Text comparison uses a case-insensitive string compare.
<iftdeq>, <iftdne>, <iftdlt>, <iftdle>, <iftdgt>, <iftdge>
    Respectively, if time/date ==, !=, <, <=, >, >=.
    Time/date comparison converts dates to text in local time using the Notes default, TZFMT_NEVER, because Notes also sometimes converts fields to text internally. For example: text="06/30/2005 02:52:04 PM"
<iftzeq>, <iftzne>
    Respectively, if the time zone equals or does not equal the comparison text, for example CDT, EST, and so on.
<ifini>
    If the value of the INI option specified in name equals the text value.
<else>
    If the condition from the last <if> or <switch> was false, process XML.
<switch>
    If name value exists, process XML.
    Attributes
        name. (Required) Name of main item to compare in <case> sub-elements.
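The case-insensitive text comparison behind the <ifeq> through <ifge> elements can be pictured with a short C sketch. This is an illustration only, not KeyView's implementation: the names text_compare_ci, text_op, and text_condition are hypothetical, and the sketch assumes plain single-byte text.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: three-way, case-insensitive comparison of two
 * NUL-terminated strings. Returns <0, 0, or >0, like strcmp(). */
int text_compare_ci(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb) {
            return ca - cb;
        }
        ++a;
        ++b;
    }
    /* At least one string has ended; the shorter one sorts first. */
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

/* Hypothetical names for the six text conditionals. */
typedef enum { OP_EQ, OP_NE, OP_LT, OP_LE, OP_GT, OP_GE } text_op;

/* Returns 1 if "item_value op text" holds, ignoring case; otherwise 0. */
int text_condition(text_op op, const char *item_value, const char *text)
{
    int r = text_compare_ci(item_value, text);
    switch (op) {
    case OP_EQ: return r == 0;
    case OP_NE: return r != 0;
    case OP_LT: return r < 0;
    case OP_LE: return r <= 0;
    case OP_GT: return r > 0;
    case OP_GE: return r >= 0;
    }
    return 0;
}
```

Under these assumptions, an item value of Memo satisfies an equality test against the text memo, which is the behavior the table describes for <ifeq>.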
Control Elements
The following table lists the valid control elements.
Element
    Description
<call>
    Call another XML template. You can nest templates up to 10 levels deep.
    Attributes
        file. (Required) Template file name. Must be unique.
<log>
    Log message to the NSF log file.
    Attributes
        text. (Required) Text to log.
        type. (Optional) Type of log message. The following values are valid.
            ERROR
            WARN
            INFO
            DIAG (default)
            DEBUG
            DUMP
<quit>
    Quit processing the template. Exits without error.
    Attributes
        text. (Optional) Text to log.
        type. (Optional) Type of log message. See <log>.
<stop>
    Stop processing the template. Exits with an ERROR type of log message.
    Attributes
        text. (Required) Text to log.
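As a rough illustration of how the optional type attribute of <log> and <quit> falls back to DIAG, the following C sketch resolves an attribute value to a log level. The enum, the function name resolve_log_type, and the exact-case matching are assumptions made for this example; the guide does not state how KeyView parses the value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum mirroring the documented type values, in the
 * order in which the guide lists them. */
typedef enum {
    NSFLOG_ERROR,
    NSFLOG_WARN,
    NSFLOG_INFO,
    NSFLOG_DIAG,
    NSFLOG_DEBUG,
    NSFLOG_DUMP,
    NSFLOG_INVALID   /* not a documented value */
} nsf_log_type;

/* Resolves the value of a type attribute. A missing attribute (NULL)
 * yields the documented default, DIAG. Matching is case-sensitive
 * here, which is an assumption of this sketch. */
nsf_log_type resolve_log_type(const char *type_attr)
{
    static const char *const names[] = {
        "ERROR", "WARN", "INFO", "DIAG", "DEBUG", "DUMP"
    };
    size_t i;

    if (type_attr == NULL) {
        return NSFLOG_DIAG;
    }
    for (i = 0; i < sizeof names / sizeof names[0]; ++i) {
        if (strcmp(type_attr, names[i]) == 0) {
            return (nsf_log_type)i;
        }
    }
    return NSFLOG_INVALID;
}
```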
Data Elements
The following table lists the valid data elements.
Element
    Description
<text>
    Output text.
    Attributes
        name. (Required if no parent) Name of the item to output.
<rich>
    Output rich text (MHTML). Images are output in the next part or parts of the MHTML, after the first <HTML> part.
    Attributes
        name. (Required if no parent) Name of the item to output.
<body>
    Output the message body in rich text (MHTML). As with <rich>, images are output in the next part or parts of the MHTML.
<form>
    Output the message form (usually $Body field) in rich text (MHTML).
    Attributes
        name. (Required if no parent) Name of the item to output.
<addr>
    Output an address.
    Attributes
        name. (Required if no parent) Name of the item to output.
        type. (Optional) Type of address to output. If you use this attribute, you must set it to CN (Common Name), which is the only supported type.
<name>
    Output the name of the last name item, or in other words the current main item. The item must exist.
<format>
    Set the default format for <date> and <date_kv>. This element does not set the <text> format. See “Date and Time Formats” for a list of all Notes and KeyView date and time formats and integer values.
    Attributes
        format. (Optional. Omit to reset to defaults) Notes and KeyView date/time format. You can set the following formats:
            TD=int. Time Date format (TDFMT_*)
            TS=int. Time Show format (TSFMT_*)
            TT=int. Time Time format (TTFMT_*)
            TZ=int. Time Zone format (TZFMT_*)
            KV=int. KeyView date and time format.
        where int is an integer value that corresponds to the desired format.
        Separate multiple formats with commas. For example: format="TD=0,TS=2,TT=1,TZ=1,KV=55"
<date>
    Output a Notes date.
    Attributes
        name. (Required if no parent) Name of the item to output.
        format. (Optional) See <format>. You can set the following values:
            TD
            TS
            TT
            TZ
<date_kv>
    Output a KeyView date.
    Attributes
        name. (Required if no parent) Name of the item to output.
        format. (Optional) See <format>. You can set the following values:
            TZ
            KV
<time>
    Output a time range, for example 1 hour, 30 minutes.
    Attributes
        name. (Required if no parent) Item name of the start date/time.
        item. (Required) Item name of the end date/time.
<zone>
    Output a Notes time zone mnemonic, for example MST.
    Attributes
        name. (Required if no parent) Name of date item to output.
<zone_utc>
    Output a time zone as UTC, for example (UTC-06:00).
<logo>
    Output the mail header logo.
    The image link is output; the actual image is output to a different part of the MHTML sub-file.
<image>
    Output an image.
    The image link is output; the actual image is output to the next part of the MHTML.
<image_uri>
    Output an image URI, in quotes. The actual image is output to a different part of the MHTML sub-file.
    Attributes
        link. (Required if no file) The image link, such as a form or title name. For example: link="StdNotesLtr0"
        file. (Required if no link) Image file name. The file must exist in the ../../templates/images directory. For example: file="boxcheck.gif"
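To make the syntax of the format attribute concrete, the following C sketch splits a value such as TD=0,TS=2,TT=1,TZ=1,KV=55 into its five settings. It is a minimal illustration under stated assumptions, not the parser that KeyView uses: the type format_settings and the function parse_format_attr are hypothetical, unset keys are reported as -1, and the sketch rejects anything other than the five documented two-letter keys followed by a non-negative integer.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the five settings; -1 means "not set". */
typedef struct {
    int td, ts, tt, tz, kv;
} format_settings;

/* Parses a format attribute value such as "TD=0,TS=2,TT=1,TZ=1,KV=55".
 * Returns 0 on success and -1 on a malformed or unknown entry. On
 * failure, *out may be partially filled. */
int parse_format_attr(const char *value, format_settings *out)
{
    const char *p = value;

    out->td = out->ts = out->tt = out->tz = out->kv = -1;

    while (*p != '\0') {
        char key[3];
        char *end;
        long n;
        int *slot;

        /* Each entry starts with a two-letter key followed by '='. */
        if (p[1] == '\0' || p[2] != '=') {
            return -1;
        }
        key[0] = p[0];
        key[1] = p[1];
        key[2] = '\0';
        p += 3;

        if (strcmp(key, "TD") == 0) {
            slot = &out->td;
        } else if (strcmp(key, "TS") == 0) {
            slot = &out->ts;
        } else if (strcmp(key, "TT") == 0) {
            slot = &out->tt;
        } else if (strcmp(key, "TZ") == 0) {
            slot = &out->tz;
        } else if (strcmp(key, "KV") == 0) {
            slot = &out->kv;
        } else {
            return -1;
        }

        /* The value must be a non-negative decimal integer. */
        if (*p < '0' || *p > '9') {
            return -1;
        }
        errno = 0;
        n = strtol(p, &end, 10);
        if (errno == ERANGE || n > INT_MAX) {
            return -1;
        }
        *slot = (int)n;
        p = end;

        /* Entries are separated by commas; a trailing comma is an error. */
        if (*p == ',') {
            ++p;
            if (*p == '\0') {
                return -1;
            }
        } else if (*p != '\0') {
            return -1;
        }
    }
    return 0;
}
```

An empty string leaves every field at -1, which loosely corresponds to omitting the attribute to reset the defaults.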
Date and Time Formats
This section lists the supported Notes and KeyView date/time formats for use with <format>, <date>, and <date_kv>.
Lotus Notes Date and Time Formats
The following table lists the supported Lotus Notes date and time formats and the integer values that specify each one.
Format
TDFMT_FULL
TDFMT_CPARTIAL
TDFMT_PARTIAL
TDFMT_DPARTIAL
TDFMT_FULL4
TDFMT_CPARTIAL4
TDFMT_DPARTIAL4
TTFMT_FULL
TTFMT_PARTIAL
TTFMT_HOUR
TZFMT_NEVER
TZFMT_SOMETIMES
TZFMT_ALWAYS
TSFMT_DATE
TSFMT_TIME
TSFMT_DATETIME
TSFMT_CDATETIME
1
2
6
0
0
1
2
2
0
4
2
3
0
1
4
5
Integer
Value
1
Description
(Notes default) Year, month, and day.
Month and day, year if not this year.
Month and day.
Year and month.
Four-digit year, month, and day.
Month and day, four-digit year if not this year.
Four-digit year and month
(Notes default) Hour, minute, and second.
Hour and minute.
Hour.
(Notes default) All time zones are converted to current time zone.
Show only when outside the current time zone.
Show for all time zones.
Date.
Time.
(Notes default) Date and time.
Date and time, or time Today or time
Yesterday.
KeyView Date and Time Formats
The following table lists KeyView date and time formats. The KeyView formats use the following syntax:
Month
    Month = full month name
    Mon = abbreviated month name
    m = month (number)
    mm = two-digit month (leading 0)
Weekday
    Weekday = full weekday name
    Wday = abbreviated weekday name
Year
    yy = two-digit year
    yyyy = four-digit year
Day
    d = day (number)
    dd = two-digit day (leading 0)
Time
    h = 12-hour
    H = 24-hour
    m = minutes
    s = seconds
    P = AM/PM
    p = am/pm
Separators
    _ = space
    c = comma
    s = slash
    a = dash
    o = dot
Format                        Output                    Integer Value
12-Hour and 24-Hour Time Formats
KVDTF_P                       P                         1
KVDTF_P_hmm                   P h:mm                    2
KVDTF_hmm_P                   h:mm P                    3
KVDTF_P_hhmm                  P hh:mm                   4
KVDTF_hhmm_P                  hh:mm P                   5
KVDTF_P_hhmmss                P hh:mm:ss                6
KVDTF_hhmmss_P                hh:mm:ss P                7
KVDTF_Hmm                     H:mm                      8
KVDTF_HHmm                    HH:mm                     9
KVDTF_mmss                    mm:ss                     10
KVDTF_Hmmss                   H:mm:ss                   11
KVDTF_HHmmss                  HH:mm:ss                  12
Numerical Date Formats with Slashes
KVDTF_mmsdd                   mm/dd                     13
KVDTF_msdsyy                  m/d/yy                    14
KVDTF_mmsddsyy                mm/dd/yy                  15
KVDTF_mmsddsyyyy              mm/dd/yyyy                16
KVDTF_ddsmm                   dd/mm                     17
KVDTF_ddsmmsyy                dd/mm/yy                  18
KVDTF_ddsmmsyy_Hmm            dd/mm/yy H:mm             19
KVDTF_ddsmm_P_hmm             dd/mm P h:mm              20
KVDTF_ddsmm_hmm_P             dd/mm h:mm P              21
KVDTF_ddsmm_P_hhmm            dd/mm P hh:mm             22
KVDTF_ddsmm_hhmm_P            dd/mm hh:mm P             23
KVDTF_ddsmmsyy_P_hmm          dd/mm/yy P h:mm           24
KVDTF_ddsmmsyy_hmm_P          dd/mm/yy h:mm P           25
KVDTF_ddsmmsyy_P_hmmss        dd/mm/yy P h:mm:ss        26
KVDTF_ddsmmsyy_hmmss_P        dd/mm/yy h:mm:ss P        27
KVDTF_ddsmmsyy_P_hhmmss       dd/mm/yy P hh:mm:ss       28
KVDTF_ddsmmsyy_hhmmss_P       dd/mm/yy hh:mm:ss P       29
KVDTF_yysmmsdd_P_hhmmss       yy/mm/dd P hh:mm:ss       30
KVDTF_yysmmsdd_hhmmss_P       yy/mm/dd hh:mm:ss P       31
KVDTF_msdsyy_Hmm              m/d/yy H:mm               32
KVDTF_mmsddsyy_Hmm            mm/dd/yy H:mm             33
KVDTF_msdsyy_P_hmm            m/d/yy P h:mm             34
KVDTF_msdsyy_hmm_P            m/d/yy h:mm P             35
KVDTF_mmsddsyy_hmm_P          mm/dd/yy h:mm P           36
KVDTF_mmsdd_P_hhmm            mm/dd P hh:mm             37
KVDTF_mmsdd_hhmm_P            mm/dd hh:mm P             38
KVDTF_mmsddsyy_P_hhmmss       mm/dd/yy P hh:mm:ss       39
KVDTF_mmsddsyy_hhmmss_P       mm/dd/yy hh:mm:ss P       40
KVDTF_msd                     m/d                       41
KVDTF_yysm                    yy/m                      42
KVDTF_yysmm                   yy/mm                     43
KVDTF_yysmsd                  yy/m/d                    44
KVDTF_yysmmsdd                yy/mm/dd                  45
KVDTF_yyyysmmsdd              yyyy/mm/dd                46
Numerical Date Formats with Dashes
KVDTF_ddammayy                dd-mm-yy                  47
KVDTF_mmadd                   mm-dd                     48
KVDTF_mmayy                   mm-yy                     49
KVDTF_yyammadd                yy-mm-dd                  50
KVDTF_yyyyammadd              yyyy-mm-dd                51
KVDTF_yyyyammaddaHHmmss       yyyy-mm-dd-HH:mm:ss       52
Numerical Date Formats with Dots
KVDTF_yyomod                  yy.m.d                    53
KVDTF_yyommodd                yy.mm.dd                  54
KVDTF_mod                     m.d                       55
KVDTF_mmodd                   mm.dd                     56
Numerical/String Date Formats with Dashes, Commas, and Spaces
KVDTF_ddaMon                  dd-Mon                    57
KVDTF_daMonayy                d-Mon-yy                  58
KVDTF_ddaMonayy               dd-Mon-yy                 59
KVDTF_ddaMonayyyy             dd-Mon-yyyy               60
KVDTF_Mon                     Mon                       61
KVDTF_Monayy                  Mon-yy                    62
KVDTF_Monayyyy                Mon-yyyy                  63
KVDTF_Monaddayy               Mon-dd-yy                 64
KVDTF_yyammadd_P_hhmmss       yy-mm-dd P hh:mm:ss       65
KVDTF_mmadd_P_hhmm            mm-dd P hh:mm             66
KVDTF_Mon_yy                  Mon yy                    67
KVDTF_Monc_yy                 Mon, yy                   68
KVDTF_Month                   Month                     69
KVDTF_Monthayy                Month-yy                  70
KVDTF_Month_yy                Month yy                  71
KVDTF_Monthc_yy               Month, yy                 72
KVDTF_Monthayyyy              Month-yyyy                73
KVDTF_Month_yyyy              Month yyyy                74
KVDTF_Monthc_yyyy             Month, yyyy               75
KVDTF_Mon_dc_yyyy             Mon d, yyyy               76
KVDTF_d_Monc_yyyy             d Mon, yyyy               77
KVDTF_yyyy_Mon_d              yyyy Mon d                78
KVDTF_Month_dc_yyyy           Month d, yyyy             79
KVDTF_d_Monthc_yyyy           d Month, yyyy             80
KVDTF_yyyy_Month_d            yyyy Month d              81
Weekday Date Formats
KVDTF_Wday                    Wday                      82
KVDTF_Weekday                 Weekday                   83
KVDTF_Wdayc_Mon_dc_yyyy       Wday, Mon d, yyyy         84
KVDTF_Weekdayc_Month_dc_yyyy  Weekday, Month d, yyyy    85
KVDTF_Weekdayc_d_Monthc_yyyy  Weekday, d Month, yyyy    86
APPENDIX H
Password Protected Files
This section lists supported password-protected container and non-container files and describes how to open them.
Supported Password Protected File Types
Open Password Protected Container Files
Export Password Protected Files
Supported Password Protected File Types
The following table lists the password-protected file types that KeyView supports.
Symbol  Description
Y       Format is supported.
N       Format is not supported.
S       Support for viewing sub-files.
V       Support for viewing content.
P       Password required.
C       Password and certificate or User ID file required.
File Type                 Version              Filter  Export  Extract  View  Credentials
PST (Windows)             n/a                  N       N       Y        S     P
PST (non-Windows) [1]     n/a                  N       N       Y        S     P
ZIP                       n/a                  N       N       Y        S     P
7-Zip                     n/a                  N       N       Y        S     N
RAR                       n/a                  N       N       Y        S     P
SMIME in MSG, EML, MBX    n/a                  N       N       Y        N     C
Lotus Notes NSF           n/a                  N       N       Y        N     C
Adobe PDF                 n/a                  Y       Y       Y        V     P
Microsoft Office          97-2003, 2007, 2010  Y       Y       Y        V     P
[1] The native PST reader, pstnsr, does not require credentials to open password-protected PST files that use Compressible Encryption.
Open Password Protected Container Files
This section describes how to extract password-protected container files using the C API. The following guidelines apply to specific file types.
Lotus Notes NSF files. If you are running a Notes client with an active user connected to a Domino server, you must specify the user’s password as a credential regardless of whether the NSF files you are opening are protected. This allows KeyView to access the Notes client and the Lotus Notes API. If the Notes client is not running with an active user, KeyView does not require credentials to access the client.
PST files. To open password-protected PST files that use High Encryption (Microsoft Outlook 2003 only), you must use the MAPI-based PST reader (pstsr). The native PST reader (pstnsr) returns the error message KVERR_PasswordProtected if a PST is encrypted with High Encryption.
To open container files
1. Define the credential information in the KVOpenFileArg data structure.
2. Pass KVOpenFileArg to the fpOpenFile() function.
3. Call fpCloseFile().
Export Password Protected Files
This section describes how to export password-protected non-container files with the C API.
To export password-protected files
1. Call the fpInit() function.
2. Call the KVXMLConfig() function with the following arguments.
Argument  Parameter
nType     KVCFG_SETPASSWORD
nValue    TRUE
pData     The source file password. The password is a null-terminated string with a maximum length of 255 characters (the final byte is null).
For example:
(*fpXMLConfig)(pKVXML, KVCFG_SETPASSWORD, TRUE, password);
where password is a null-terminated string of 255 or fewer characters.
3. Call the fpConvertStream() or KVXMLConvertFile() function.
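Because the password passed with KVCFG_SETPASSWORD is limited to 255 characters plus the terminating null, an application might check the length before making the call. The following C sketch shows one way to do so. The function password_fits_limit and the macro MAX_PASSWORD_CHARS are hypothetical names introduced for this example and are not part of the KeyView API; the sketch also assumes one byte per character.

```c
#include <stddef.h>

/* The 255-character limit comes from the text above; the macro name
 * is hypothetical. */
#define MAX_PASSWORD_CHARS 255

/* Returns 1 if password is non-NULL and its terminating null occurs
 * within the first 256 bytes (at most 255 characters plus the null);
 * otherwise returns 0. The loop never reads past that window. */
int password_fits_limit(const char *password)
{
    size_t i;

    if (password == NULL) {
        return 0;
    }
    for (i = 0; i <= MAX_PASSWORD_CHARS; ++i) {
        if (password[i] == '\0') {
            return 1;
        }
    }
    return 0;
}
```

An application could call a check like this first and make the fpXMLConfig() call shown above only when it returns 1.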
Index
Symbols
$ANCHOR
$BASE
$CHARSET
$CONTENT
$ENDNOTE
$FOOTER
$FOOTNOTE
$FOOTNOTEALL
$HEADER
$MAINURL
$NAME
$NEXT
$PREV
$SPLITBLOCKNUMBER
$STYLESHEET
$SUMMARY
$SUMMARYNN
$TOC
$TOCB
$TOCBE
$TOCE
$TOCPE
$TOCTE
$TOPANCHOR
$USERCB
$USERSUMMARY
$XANCHOR
7-Zip
7-Zip reader
A
absolute text positioning
Abstract Windowing Toolkit
access layer
AD1
AD1 Evidence file reader
ad1sr
ADDOCINFO
,
adInfo
adinfo.h
,
,
,
Adobe Maker Interchange Format (MIF)
reader
Adobe PDF
advanced document readers enabling in an existing installation
license information
Lotus Notes database (NSF)
Mailbox (MBX)
Microsoft Outlook Personal Folders
Advanced Systems Format (ASF)
afsr
allocating memory
Ami Pro Graphics reader
anchor
token
,
Animated cursor reader
ANSI (TXT)
Apple iChat Log
Apple iChat Log reader
Apple iWork
Keynote (GZ)
Numbers (GZ)
Pages (GZ)
Apple iWork Keynote reader
Apple iWork Numbers reader
Apple iWork Pages reader
Applix
Presents (AG)
Presents reader
Spreadsheets (AS)
Spreadsheets reader
Words (AW)
Words reader
architecture
archive formats
ASCII (TXT)
reader
assr
attachment external path
Audio Interchange File Format (AIFF)
AutoCAD Drawing
Exchange Format (DXF)
format (DWG)
AutoCAD Drawing Exchange format reader
AutoCAD Drawing format reader
AutoCAD reader
automatic heading generation
awsr
AWT
See Abstract Windowing Toolkit
B
bAllowHeadingsInTables
base URL token
bEnableEmptyRows
bentofio
bForceOutputCharSet
bForceSrcCharSet
bGenerateURLs
bHardPageMakesNewBlock
bi-directional text
in PDF file
right-to-left (RTL) tag
Big Endian
,
binary files supported
bIndexOnly
BinHex
reader
bKeepServantAlive
bkfsr
block chunks
blocks
bMustBeBold
bMustBeItalic
bMustBeUnderlined
bNbspEmptyCells
bNoMultiSpaces
bNonZeroIndent
bNoTabs
bookmarks
converting to XLinks
bPutBlocksInSeparateFiles
bRasterizeFiles
bRemoveEmptyColumns
bRemoveEmptyRows
bSupportCellSpan
bSupportColumnHeadings
bSupportColumnWidth
bSupportRowHeadings
bSupportRowSpan
bUseDocumentColors
bUseDocumentFontInfo
bUseExistingStyleSheet
bUseVerityDTD
Bzip2
bzip2sr
C
C API configure XML element extraction
enable logical order for PDF files
enabling logical order for PDF files
extensible stylesheet language
extracting sub file metadata
extracting sub files
map styles
opening a file
,
running in out-of-process mode
style sheets
cabsr
cache configuring
CAD. See Computer-aided design
callback functions
Cascading Style Sheets
CATIA
cbAnchorMax
cbHTML
cbmap
cbString
cebsr
character encoding supported
character entities
character set determining output
force output
force source
license information
mapping
setting during file extraction
setting source
setting target
supported
token
character styles
charset
chartbls.ux
childArray
chmdll
chmsr
chunks
CloseFile()
,
closing a file
cnv2xml sample program
cnv2xmloop sample program
Comma-Separated Values (CSV)
reader
compound documents
Computer Graphics Metafile (CGM)
writer
Computer Graphics Metafile reader
Computer-aided design
configuration options setting
ConnectRetry
ConnectRetryInterval
container files
archive
default filenames
determining number of sub file
,
example tree structure
recreating file hierarchy
,
sub file infoflag
supported
Continue()
conversion options
setting using template files
setting using the API
converting spreadsheets
XML files
–
,
ConvertStream()
,
,
Corel
CorelDraw (CDR)
Draw reader
Presentations (SHW)
Presentations reader
Quattro Pro (QPW, WB3)
cRedact
credentials defining for protected files
cReplaceChar
CSS template
csvsr
cxVectorToRasterXRes
cyVectorToRasterYRes
D
Data Interchange Format (DIF)
Data Interchange Format reader
dBase
dBase Database reader
dbfsr
dbxsr
DCA/RFT reader
dcasr
DCX (fax) reader
DCX Fax System
definition of terms
deleted text
detectPSTbyExtension
difsr
Digital Imaging and Communications in Medicine
directory structure
DiskCacheSize
DisplayWrite (IP)
reader
dmgsr
document readers
,
document type
,
Documentum EMCMF
Domino XML Language
Domino XML Language reader
DTD
character entities
modifying
root element
dw4sr
dwFlags
,
dxlsr
E
eClass
,
eEmptyParaType
eFormat
,
eHardPageBreakType
eKVFormat
email files supported
embedded OLE objects
,
converting using Conversion APIs
converting using File Extraction APIs
linked
naming convention
reader
writer
emlsr
emxsr
Encapsulated PostScript (EPS)
reader
encase2sr
encasesr
EndCharStyle
endnote token
ENdocAttributes
ENDocClass
ENdocFmt
,
Enhanced Windows Metafile (EMF)
reader
ENSATableBorder
entsr
eOutputLanguageID
eOutputRasterGraphicType
eOutputVectorGraphicType
epubsr
error codes
extended
,
Outlook PST
eSATableBorder
eSrcCharSet
Executable (EXE)
Expat XML parser
Expert Witness Compression Format (Encase)
Expert Witness Compression Format (EnCase) v6 reader
Expert Witness Compression Format (EnCase) v7 reader
Export Demo sample program
extended error codes
Extensible Forms Description Language
Extensible Forms Description Language reader
ExtractSubFile()
F
file cache
configuring
file extraction extract sub file
extraction flags
extraction path
get main file information
get sub file information
get sub file metadata from mail formats
input parameters
Lotus Domino XML (DXL)
Lotus Notes database (NSF)
Mailbox (MBX)
Microsoft Outlook
Microsoft Outlook Express (EML)
Microsoft Outlook Personal Folders
output to file
output to stream
PDF file
sub file properties
ZIP file
File Extraction interface
entry point
file hierarchy
childArray
parentIndex
file time
EPOCH
filenames default for sub files
FileToInputStreamCreate()
,
FileToInputStreamFree()
FileToOutputStreamCreate()
FileToOutputStreamFree()
flags extraction flags
KVCFG_DELSOFTHYPHEN
KVCFG_DISABLEZONE
KVCFG_ENABLEPOSITIONINFO
KVCFG_INCLREVISIONMARK
KVCFG_INCLTRACKCHANGES
KVCFG_LOGICALPDF
KVCFG_PG_HIDECOMMENT
KVCFG_PG_HIDEHIDDENSLIDE
KVCFG_PG_SHOWCOMMENTSSLIDE
,
KVCFG_PG_SHOWSLIDENOTES
KVCFG_SETPASSWORD
KVCFG_SETTEMPDIRECTORY
KVCFG_SETXMLCONFIGINFO
KVCFG_SS_SHOWCOMMENTS
KVCFG_SS_SHOWFORMULA
KVCFG_SS_SHOWHIDDENINFOR
KVCFG_SUPPRESSIMAGES
KVCFG_SUPPRESSTOCPRINTIMAGE
KVCFG_WP_NOCOMMENTS
KVCFG_WP_SHOWDATEFIELDCODE
,
KVCFG_WP_SHOWFILENAMEFIELDCODE
,
KVCFG_WP_SHOWHIDDENTEXT
KVExtractionFlag_CreateDir
,
KVExtractionFlag_ExcludeMailHeader
KVExtractionFlag_GetFormattedBody
KVExtractionFlag_Overwrite
,
KVExtractionFlag_SaveAsMSG
KVMainFileInfoFlag_HasContent
KVOpenFileFlag_CreateRootNode
KVSubFileExtractInfoFlag_CharsetConverted
KVSubFileExtractInfoFlag_External
KVSubFileExtractInfoFlag_FileCreated
KVSubFileExtractInfoFlag_FolderCreated
KVSubFileExtractInfoFlag_NeedsExtraction
KVSubFileExtractInfoFlag_NonFormattedBodyExtracted
KVSubFileInfoFlag_External
KVSubFileInfoFlag_MailItem
KVSubFileInfoFlag_NeedsExtraction
KVSubFileInfoFlag_Secure
KVSubFileInfoFlag_SMIME
KVSubFileMetaInfoFlag_CharsetConverted
main file properties
metadata
open file
sub file properties
Flash reader
Folio Flat File (FFF)
reader
foliosr
fontSizeMax
fontSizeMin
footer token
footnote token
format detection
ADDOCINFO
coding practice
,
determining format support
extracting format information
file class
KVStreamInfo
major format
major version
minor format
minor version
module
,
,
translating format information
formats
–
binary
container
container (email)
graphic
multimedia
presentations
word processing
formats_e
formats_e.ini
,
configuring file cache
converting hidden text in spreadsheets
converting MSG files directly using the MSG reader
determining document reader
enable logical order for PDF files
out-of-process configuration
formats.ini
formulas extracting from Excel files
supported Excel formula functions
Founder Chinese E-paper Basic (CEB)
Founder Chinese E-paper Basic reader
fpCloseFile()
,
,
fpContinue()
fpConvertStream()
,
,
callbacks
fpExtractSubFile()
fpFileToInputStreamCreate()
,
fpFileToInputStreamFree()
fpFileToOutputStreamCreate()
,
fpFileToOutputStreamFree()
,
fpFreeStruct()
,
fpGetAnchor()
fpGetAuxOutput()
fpGetConvertFileList()
fpGetMainFileInfo()
fpGetStreamInfo()
,
,
fpGetSubFileInfo()
fpGetSubFileMetadata()
fpGetSummaryInfo()
,
,
fpInit()
,
fpOpenFile()
,
,
fpSetStyleMapping()
fpShutDown()
,
fpValidateTemplate()
fpXMLConfig()
free File Extraction structures
FreeStruct()
,
Fujitsu Oasys (OA2)
reader
function suites
G
generating minimal attributes
generating output with minimal markup and without images
,
,
generating output with verbose markup and without images
,
,
GetAnchor()
,
GetAuxOutput()
GetConvertFileList()
GetMainFileInfo()
GetStreamInfo()
,
,
GetSubFileInfo()
GetSubFileMetadata()
GetSummaryInfo()
,
glossary
Graphic Interchange Format (GIF)
,
reader
graphics displaying vector graphics on Windows
setting resolution
supported
suppressing
,
GroupWise FileSurf
GroupWise FileSurf reader
gwfssr
GZIP
reader
H
Hangul (HWP)
Hangul 2002, 2005, 2007 reader
header files
header token
heading generation
headingCreateType
Health level7
Health level7 reader
hidden data
,
Excel comments
formulas
,
hidden information
,
PowerPoint comments
comments slides
,
hidden slides
,
slide notes
toggle output
Word comments
date field codes
file name field codes
,
hidden text
hidden text converting in spreadsheets
hl7sr
HTML
reader
HTML (MIME)
htmlexport
htmsr
hwposr
hyphenation
,
I
I/O model
IBM DCA/RFT (Revisable Form Text) (DC)
ichatsr
icssr
index mode
and hyphenation
index template
initialization function
,
input streams
creating
extracting metadata
freeing
,
KVInputStream
installation directory structure
error messages
ISO
ISO-9660 CD Disc Image Format reader
isosr
iwsssr
iwwpsr
J
Java API extensible stylesheet language
using style sheets
Java archive
javadoc
JBIG2
JBIG2 reader
jp2000sr
JPEG
reader
writer
JPEG 2000
JPEG 2000 metadata reader
JPEG 2000 reader
jtdsr
JustSystems Ichitaro (JTD)
reader
K
kp3dwrld
kpagrdr
kpanirdr
kpbmprdr
kpbmpwrt
kpcdrrdr
kpcgmrdr
kpcgmwrt
kpchtrdr
kpdcxrdr
kpDWGrdr
kpDXFrdr
kpemfrdr
kpepsrdr
kpgifrdr
kpicordr
kpifcnvt
kpifutil
kpIWPGrdr
kpJAVwrt
kpjbig2rdr
kpjp2000rdr
kpjpeg
kpjpgrdr
kpjpgwrt
kpmacrdr
kpmsordr
kpnbmprdr
kpODArdr
kpodfrdr
kpONErdr
kpp40rdr
kpp95rdr
kpp97rdr
kppctrdr
kppcxrdr
kppdf2rdr
kppdfrdr
kppicrdr
kppng
kppngrdr
kppngwrt
kpppxrdr
kpprerdr
kpprzrdr
kpsdwrdr
kpsgirdr
kpSHWrdr
kpsunrdr
kptgardr
kptifrdr
kpvsdrdr
kpwg2rdr
kpwmfrdr
kpwmfwrt
kpwpgrdr
kpxfdlrdr
KV_Bool
KV_ClipBoard
KV_DateTime
KV_IEEE8
,
KV_Int4
KV_Other
KV_String
KV_Unicode
kv.lic
updating in existing installation
KVCFG_DELSOFTHYPHEN
KVCFG_DISABLEZONE
KVCFG_ENABLEPOSITIONINFO
KVCFG_INCLREVISIONMARK flag
KVCFG_INCLTRACKCHANGES flag
KVCFG_LOGICALPDF
KVCFG_LOGICALPDF flag
KVCFG_PG_HIDECOMMENT flag
KVCFG_PG_HIDEHIDDENSLIDE flag
KVCFG_PG_SHOWCOMMENTSSLIDE flag
KVCFG_PG_SHOWSLIDENOTES flag
KVCFG_SETPASSWORD flag
KVCFG_SETTEMPDIRECTORY
KVCFG_SETXMLCONFIGINFO
KVCFG_SS_SHOWCOMMENTS flag
KVCFG_SS_SHOWFORMULA flag
KVCFG_SS_SHOWHIDDENINFOR flag
KVCFG_SUPPRESSIMAGES
KVCFG_SUPPRESSTOCPRINTIMAGE
KVCFG_WP_NOCOMMENTS flag
KVCFG_WP_SHOWDATEFIELDCODE flag
KVCFG_WP_SHOWFILENAMEFIELDCODE flag
KVCFG_WP_SHOWHIDDENTEXT flag
KVCharSet
,
KVCredential
KVCredentialComponent
KVEPT_EMPTY
KVEPT_SUPPRESS
KVEPT_VERBOSE
KVERR_ADSNotFound
KVERR_ArchiveFatalError
KVERR_ArchiveFileNotFound
KVERR_AutoDetFail
KVERR_AutoDetNoFormat
KVERR_badInputStream
KVERR_badOutputType
KVERR_ChildTimeOut
KVERR_CreateOutputFileFailed
KVERR_CreateProcessFailed
KVERR_CreateTempFileFailed
KVERR_DLLNotFound
KVERR_ErrorWritingToOutputFile
KVERR_FormatNotSupported
KVERR_General
KVERR_NoReader
KVERR_OutOfCore
KVERR_PasswordProtected
KVERR_processCancelled
KVERR_ReaderInitError
KVERR_SUCCESS
KVERR_WaitForChildFailed
KVError_CompressionNotSupported
KVError_GPF
KVError_InputFileNotFound
KVError_InterfaceFunctionNotFound
KVError_InvalidArgs
KVError_InvalidOopDriverSignature
KVError_InvalidOopServiceSignature
KVError_IPCTimeOut
KVError_KVoopLogFailed
KVError_MemoryLeak
KVError_MemoryOverwrite
KVError_OopBadConfig
KVError_OopBrokenPipe
KVError_OopCore
KVError_OopPipeOEF
KVError_OpenOutputFileFailed
KVError_OpenStreamFailure
KVError_OutputFileExists
KVError_OverNestedFileLimit
KVError_PasswordRequired
KVError_PSTAccessFailed
KVError_ReaderUsageDenied
KVError_ZeroFile
KVErrorCode
KVErrorCodeEx
KVExtractInterface
KVExtractionFlag_CreateDir
,
KVExtractionFlag_ExcludeMailHeader
KVExtractionFlag_GetFormattedBody
KVExtractionFlag_Overwrite
,
KVExtractionFlag_SaveAsMSG
KVExtractSubFileArg
KVFileType_Main
KVGetExtractInterface()
,
KVGetSubFileMetaArg
KVGFX_CGM
KVGFX_GIF
KVGFX_JAVA
KVGFX_JPEG
KVGFX_PNG
KVGFX_WMF
kvgraph
kvgzsr
KVHC_CreateHeadingsAlways
KVHC_DocHeadingsOnly
KVHeadingCreateOptions
KVHPBT_EMPTY
KVHPBT_EMPTYID
KVHPBT_ID
KVHPBT_SUPPRESS
kvhqxsr
KVInputStream
,
,
KVMainFileInfo
KVMainFileInfoFlag_HasContent
,
KVMemoryStream
KVMetadata_Binary
KVMetadata_Bool
KVMetadata_DateTime
KVMetadata_Double
KVMetadata_Float
KVMetadata_Int4
KVMetadata_Int8
KVMetadata_String
KVMetadata_UInt4
KVMetadata_UInt8
KVMetadata_Unicode
KVMetadata_Unknown
KVMetadataElem
KVMetadataType
KVMetaName
kvolefio
KVOpenFileArg
,
KVOpenFileFlag_CreateRootNode
,
KVOutputStream
kvpie
kvradar
kvraster.class
,
KVSTR
KVStreamInfo
KVStructHead
KVStructInit
KVStyle
,
KVSTYLE_DELETECONTENT
KVSTYLE_HEADING[1-6]
KVSTYLE_ONCONSECUTIVEPARAGRAPHS
KVSTYLE_ORDERLIST
KVSTYLE_PRE
KVSTYLE_REDACT
KVSTYLE_UNORDEREDLIST
KVSubFileExtractInfo
KVSubFileExtractInfoFlag_CharsetConverted
KVSubFileExtractInfoFlag_External
,
KVSubFileExtractInfoFlag_FileCreated
KVSubFileExtractInfoFlag_FolderCreated
KVSubFileExtractInfoFlag_NeedsExtraction
KVSubFileExtractInfoFlag_NonFormattedBodyExtracted
KVSubFileInfo
KVSubFileInfoFlag_External
,
embedded objects in PowerPoint
KVSubFileInfoFlag_MailItem
KVSubFileInfoFlag_NeedsExtraction
,
KVSubFileInfoFlag_Secure
KVSubFileInfoFlag_SMIME
KVSubFileMetaData
KVSubFileMetaInfoFlag_CharsetConverted
KVSubFileType_Attachment
KVSubFileType_Folder
KVSubFileType_Main
KVSubFileType_OLE2
KVSumInfoElemEx
KVSumInfoType
KVSummaryInfoEx
KVSumType
,
KVT_ZONE token
kvtypes.h
,
kvutil
kvVector.class
kvvector.jar
,
kvxconfig.ini
,
,
and xmlini sample program
KVXConfigInfo
kvxml
KVXML library
kvxml.h
KVXMLAnchorType
KVXMLCallbacks
KVXMLConfig
KVXMLConfig()
export password-protected files
KVXMLConvertFile()
callbacks
KVXMLEmptyParaType
KVXMLEndOOPSession()
KVXMLGetInterface
KVXMLGetInterface()
KVXMLGraphicType
KVXMLHardPageBreakType
KVXMLHeadingInfo
KVXMLInit()
KVXMLInterface
,
KVXMLOptions
,
KVXMLSetStyleSheet
KVXMLStartOOPSession()
KVXMLStyleSheetType
KVXMLTemplate
,
KVXMLTOCOptions
kvxpgsa
kvxsssa
kvxtract
kvxtract.h
,
kvxwpsa
kvzeesr
kwad
,
,
L
l123sr
language detection license information
lasr
lcbBlockSize
lcbFilesize
lcbMaxMemUsage
Legato EMailXtender Archive
Legato EMailXtender archive (EMX) reader
Legato Extender
Libraries
license information enabling a full version
kv.lic
Link Library (DLL)
ListenerPortList
ListenerTimeout
logical reading order direction flags
PDF file
Lotus
1-2-3
(123)
(WK4)
Charts (123)
V2 to 5 reader
V96/97/98 reader
AMI Draw Graphics (SDW)
AMI Pro (SAM)
reader
AMI Professional Write Plus
Domino XML (DXL) file extraction
Freelance Graphics (PRE)
96/97/98 reader
reader
Freelance Graphics (SDW)
Notes embedded image reader
Notes database license information
Notes database (NSF)
,
file extraction
installation and configuration
licensing
reader
system requirements
Pic (PIC)
SmartMaster (MWP)
Word Pro
Word Pro (LWP)
reader
LPDF_AUTO
,
LPDF_DIRECTION
LPDF_LTR
,
LPDF_RAW
,
LPDF_RTL
,
lVersion
,
lwpsr
M
Mac Disk Copy Disk Image
Mac Disk Copy Disk Image File reader
MacBinary
MacBinary reader
macbinsr
Macintosh Picture (PICT) reader
Macintosh Raster (PICT/PCT)
MacPaint (PNTG)
reader
Macromedia Flash (SWF)
reader
mail default list of metadata
extracting metadata
metadata
Mailbox license information
Mailbox (MBX)
,
file extraction
licensing
reader
main file get information
main URL token
MAPI
,
ATTACH_BY_REF_ONLY
ATTACH_BY_REF_RESOLVE
ATTACH_BY_REFERENCE
attachment methods
mapidefs.h
mapitags.h
PR_ATTACH_LONG_PATHNAME
PR_ATTACH_METHOD
PR_ATTACH_PATHNAME
property tag
supported property types
MAPI-based PST reader
mapping styles
MarkUpEnd
MarkUpStart
maximum memory
maxParaLen
mbsr
mbxsr
mdbsr
memory allocation
memory management
metadata
custom metadata in PDF
data types
extracting
extracting default mail metadata
,
extracting default mail metadata set
extracting from mail formats
,
extracting from PST files
extracting mail metadata as text
field names
non-standard
sample program
standard
token
metaNameArray
metaNameCount
Microsoft
Access
Access (MDB) reader
Drawing Objects reader
Excel
2007 XML reader
Binary Format
Charts (XLS)
Macintosh (XLS)
Windows (XLS)
Windows (XLSX)
Windows XML format (XLS)
Excel (XLS) converting formulas
reader
supported formula functions
OneNote
OneNote reader
Outlook
file extraction
metadata fields
Outlook (MSG) convert directly using the MSG reader
reader
Outlook Express
file extraction
Outlook Express (EML)
reader
Outlook Personal Folders
attachment methods
detect by extension
error codes
extracting metadata
file extraction
KVErrorPasswordRequired
license information
licensing
MAPI-based reader
native and MAPI-based reader
native reader
system requirements
Outlook Personal Folders (PST)
MAPI-based reader
native reader
pstnsr
pstsr.dll
PowerPoint
2007 XML reader
embedded objects
Macintosh (PPT)
PC (PPT)
Windows (PPT)
Windows (PPTX)
Project
Project (MPP) reader
Rich Text Format (RTF) reader
Visio
XML format (VDX)
Visio (VSD) reader
Wave Sound (WAV)
Windows Bitmap (BMP)
Windows Write (WRI)
Word
2007 XML reader
6/95 reader
97, 2000, XP reader
DOS reader
Mac reader
Macintosh (DOC)
PC (DOC)
V2 reader
Windows (DOC)
Windows (DOCX)
Windows XML format (DOC)
Works
(WPS)
6, 2000 reader
Spreadsheet (S30,S40)
Spreadsheet reader
V1 and 2 reader
Write reader
Microsoft Backup File
Microsoft Backup File reader
Microsoft Cabinet format
Microsoft Cabinet format reader
Microsoft Compiled HTML Help
Microsoft Compiled HTML Help reader
Microsoft Compressed Folder
Microsoft Entourage Database
Microsoft Entourage Database Format reader
Microsoft Office 2007 Excel Binary Format reader
Microsoft Office Drawing
Microsoft OneNote
reader
Microsoft Outlook DBX
Microsoft Outlook Express DBX reader
Microsoft Outlook for Macintosh
Microsoft Outlook for Macintosh reader
Microsoft Outlook iCalendar
Microsoft Outlook iCalendar reader
Microsoft Outlook Offline Storage File
Microsoft Outlook Offline Storage File reader
Microsoft Outlook vCard Contact
Microsoft Outlook vCard Contact reader
Microsoft Publisher
Microsoft Publisher reader
Microsoft Visio reader
MIDI (MID)
mifsr
MIME HTML
minParaLen
misr
MP3 files
reader
mp3sr
MPEG-1
Audio layer 3 (MP3)
Video (MPG)
MPEG-2 Audio (MPEGA)
MPEG-4 Audio
mppsr
MSBLSB byte order
mscomctl.ocx
msgsr
mspubsr
msvbvm60
MSVCP60.dll
msvcrt
msw6sr
mswsr
multi-byte support
multimedia files supported
mw6sr
mw8sr
mwsr
mwssr.dll
mwxsr
N
namespace
native PST reader
nCompressionQuality
nElem
NeXT/Sun Audio (AU)
non-standard metadata
nRowsBeforeSplit
nsfsr
nSpaceAfter
nSpaceBefore
nTableBorderWidth
numSubFiles
O
oa2sr
OASIS
Open Document Format (ODP)
Open Document Format (ODS)
Open Document Format (ODT)
ODF presentation reader
ODF spreadsheets reader
ODF word processing reader
odfsssr
odfwpsr
oleaut32
olepro32
olesr
olmsr
Omni Graffle
Omni Outliner
Omni Outliner reader
oo3sr
Open Publication Structure eBook
Open Publication Structure eBook reader
OpenFile()
opening a file
OpenOffice
Calc
Impress
out of process configuration
conversions
"keep servant active" option
sample program
temporary files
output stream
KVOutputStream
output streams
auxiliary
creating
freeing
KVOutputStream
P
page number token
paragraph styles
parentIndex
password-protected files
export
extract
supported file types
PC Paintbrush (PCX)
reader
pCallbacks
pCallingContext
pcHTML
pcString
PDF file absolute positioning of text
configuration options
converting bi-directional text
converting PDFs with images
direction flags
enable logical order for PDF files in C API
enable logical order for PDF files in formats_e.ini
enabling logical order in C API
extracting custom metadata
file extraction
generating XLinks
graphic-based reader
high-fidelity graphic-based reader
logical reading order
pdfsr.ini
reader
specifying paragraph direction
specifying text flow in cnv2xml sample program
structured text stream
unstructured text stream
pdfsr
pdfsr.ini
pElem
pffsr
Pictor PC Paint format (PIC) reader
PKZIP (ZIP)
Portable Network Graphics (PNG)
reader
writer
PowerPoint
95 reader
97 reader
reader
presentations setting resolution
supported
process_images_with_min_height
process_images_with_min_width
pstnsr
pstsr.dll
pszBaseURL
pszChunkTemplate
pszDefaultOutputDirectory
pszEndBlock
pszExContent
pszExMeta
pszFirstH1End
pszFirstH1Start
pszH[2..6]XML
pszInAttribute
pszInContent
pszInMeta
pszJavaURL
pszLastH1End
pszLastH1Start
pszMainBottom
pszMainTop
pszMainURL
pszMiddleH1End
pszMiddleH1Start
pszPicPath
pszPicURL
pszRoot
pszStartBlock
pszStyleSheet
pszTOC_H[1..6]
pszTOCH[1..6]End
pszTOCH[1..6]LeafNode
pszTOCH[1..6]Start
pszUserSummary
pszXEndBlock
pszXFile
pszXStartBlock
Q
qpssr
Quattro Pro Spreadsheet reader
QuickTime Movie (QT/MOV)
R
RAR Archive (RAR)
reader
rarsr
RasterPictureAnchor
RasterPictureAnchorEx
reader initialization error
redacted (hidden) text
redistributable files
regsvr32.exe
resolution presentations
revision marks
revision tracking information
Rich Text Format (RTF)
root element
root node
creating
rtfsr
S
SA_BaseOnDocument
SA_Border
SA_NoBorder
sample program cnv2xml
cnv2xmloop
Export Demo
metadata
tstxtract
xmlcallback
xmlindex
xmlini
xmlmulti
xmlonefile
sample template for C API
secured NSF Files
secured PST Files
servant.exe
ServantName
SetStyleMapping
SetStyleMapping()
SGI RGB
Image
reader
ShutDown()
single file for presentation template
single file template
single file with TOC template
Skype Log
Skype log file reader
skypesr
sosr
spreadsheets converting
converting headers and footers
converting hidden rows and columns
standard metadata
StarOffice
Calc
Impress
stderr
streams
auxiliary output
input
KVInputStream
KVOutputStream
output
structured access layer
style sheets
token
StyleName
styles mapping
STYLESHEET_DISABLED
sub file external path to
extract
extract metadata
get information
summary information
extracting
token
Sun Raster Image (RS)
reader
supported formats
suppressing graphics
swfsr
szExContentElement
szExMetaElement
szInAttribute
szInContentElement
szInMetaElement
szRoot
T
table border
table of contents generating
token
Tagged Image File Format (TIFF)
reader
Tape Archive (TAR)
reader
tarsr
TempFilePath
TempFileSizeMark
template
C sample
css
index
single file
single file for presentations
single file with TOC
template file
map styles
setting conversion options
temporary files out of process
terms
defined
Text Mail (MIME)
threads
tnefsr
token
anchor
base URL
character set
endnote
footer
footnote
header
main URL
metadata
page number
style sheet
table of contents
user callback
zone
token buffer
Track Changes
Transfer Neutral Encapsulation Format (TNEF)
Transfer Neutral Encapsulation Format reader
Truevision Targa (TGA)
reader
tstxtract sample program
txtcnv
U
ulAttributes
Unicode reader
text
Unicode HTML
Unicode HTML reader
unihtmsr
unisr
UNIX converting graphics on
UNIX Compress
reader
unzip
URL base
main
user callback function
token
UserCB()
uudsr
UUEncoding (UUE)
reader
V
ValidateTemplate()
vcfsr
vector graphics converting
VectorPictureAnchor
verbose markup
Verity Document Type Definition
Visio reader
vsdsr
W
W3C
WaitForConnectionTime
WaitForConvert
Windows
Animated Cursor (ANI)
Bitmap (BMP) reader
writer
bitmap (BMP)
Icon Cursor
icon reader
Metafile (WMF) reader
writer
metafile (WMF)
Video (AVI)
Windows Scrap File
WinZIP (ZIP)
Wireless Markup Language
wkssr
WML
word processing files supported
WordPad
WordPerfect
6.x to 10.x reader
Graphics 1 (WPG)
Graphics 2 (WPG)
Graphics reader
Linux
Macintosh
Macintosh reader
reader
Windows (WO)
wosr
wp6sr
wpmap
wpmsr
X
XHTML
detection
reader
xlsbsr
xlssr
xlsxsr
XML and format ID
configuration flag
configuring custom document type
converting
converting using xmlini sample program
Expat XML parser
extracting elements
generic
kvxconfig.ini
modifying element extraction settings
namespace
Paper Specification
reader
root element
writers
XML Export API functions
XML Paper Specification reader
XML Style Language Transformation
xml_css.ini
xml_index.ini
xml1file_pg.ini
xml1file.ini
xml1filetoc.ini
xmlcallback sample program
xmlcnv
XMLConfig()
xmlexport
xmlindex sample program
xmlini sample program
xmlmulti sample program
xmlonefile sample program
xmlsh
xmlsr
xpssr
XSLT
XyWrite
reader
xywsr
Y
Yahoo! Instant Messenger
Yahoo! Instant Messenger reader
yimsr
Z
z7zsr
Zip archive
reader
ZIP file extraction
zone disable creation of
elements
Table of contents
- Overview
- Features
- Platforms, Compilers and Dependencies
- Package Contents
- License Information
- Enable Advanced Document Readers
- Update License Information
- Directory Structure
- Definition of Terms
- Architectural Overview
- Memory Abstraction
- Enhance Performance
- File Caching
- Convert Files Out of Process
- Configure Out-of-Process Conversions
- Run Export Out of Process—Overview
- Run Export Out of Process in the C API
- Convert Files
- Sub File Extraction
- Convert Outlook Email without Using the Extraction API
- Set Conversion Options
- Set Conversion Options Using the API
- Set Conversion Options Using the Template Files
- Templates
- Use the Export Demo Program
- Change Input/Output Directories
- Set Configuration Options
- Suppress Images
- Using PDF Position Information
- Convert Files
- Use the C-Language Implementation of the API
- Input/Output Operations
- Convert Files
- Multi-threaded Conversions
- Use the Verity Document Type Definition (DTD)
- Use XML Style Language Transformation (XSLT)
- Add Elements and Attributes to the DTD
- Move the DTD
- Introduction
- Extract Sub Files
- Recreate a File’s Hierarchy
- Create a Root Node
- Recreate a File’s Hierarchy—Example
- Extract Mail Metadata
- Default Metadata Set
- Extract the Default Metadata Set
- Microsoft Outlook (MSG) Metadata
- Extract MSG-Specific Metadata
- Microsoft Outlook Express (EML) and Mailbox (MBX) Metadata
- Extract EML- or MBX-Specific Metadata
- Lotus Notes Database (NSF) Metadata
- Extract NSF-Specific Metadata
- Microsoft Personal Folders File (PST) Metadata
- MAPI Properties
- Extract PST-Specific Metadata
- Exclude Metadata from the Extracted Text File
- Extract Sub Files from Outlook Files
- Extract Sub Files from Outlook Express Files
- Extract Sub Files from Mailbox Files
- Extract Sub Files from Outlook Personal Folders Files
- Use the Native or MAPI-based Reader
- Use the Native PST Reader (pstnsr)
- Use the MAPI Reader (pstsr)
- MAPI Attachment Methods
- Open Secured PST Files
- Detect PST Files While the Outlook Client is Running
- Extract Sub Files from Lotus Domino XML Language Files
- Extract Sub Files from Lotus Notes Database Files
- System Requirements
- Installation and Configuration
- Open Secured NSF Files
- Format Note Sub Files
- Extract Sub Files from PDF Files
- Extract Embedded OLE Objects
- Extract Sub Files from ZIP Files
- Default Filenames for Extracted Sub Files
- Default Filename for Mail Formats
- Default Filename for Embedded OLE Objects
- Extract Metadata
- Extract Metadata Using the API
- Extract Metadata Using a Template File
- Extract File Format Information
- Convert Character Sets
- Determine the Character Set of the Output Text
- Guidelines for Character Set Conversion
- Examples of Character Set Conversion
- Document Character Set Can be Determined
- Document Character Set Cannot be Determined
- Set the Character Set During Conversion
- Set the Character Set During File Extraction from a Container
- Map Styles
- Use Style Sheets
- Use Extensible Style Sheet Language (XSL)
- Use Cascading Style Sheets (CSS)
- Display Vector Graphics on UNIX and Linux
- Convert Revision Tracking Information
- Convert PDF Files
- Convert PDF Files to a Logical Reading Order
- Logical Reading Order and Paragraph Direction
- Enable Logical Reading Order
- Control Hyphenation
- Improve Performance for PDFs with Many Small Images
- Extract Custom Metadata from PDF Files
- Convert Spreadsheet Files
- Convert Hidden Text in Microsoft Excel Files
- Convert Headers and Footers in Microsoft Excel 2003 Files
- Specify Date and Time Format on UNIX Systems
- Extract Microsoft Excel Formulas
- Convert XML Files
- Configure Element Extraction for XML Documents
- Modify Element Extraction Settings
- Modify Element Extraction Settings in the kvxconfig.ini File
- Specify an Element’s Namespace and Attribute
- Add Configuration Settings for Custom XML Document Types
- Show Hidden Data
- Hidden Data in Microsoft Documents
- Toggle Word Comment Settings in the formats_e.ini File
- Toggle PowerPoint Slide Note Settings in the formats_e.ini File
- Show Hidden Data
- Hidden Data in Microsoft Documents
- Toggle Word Comment Settings in the formats_e.ini File
- Toggle PowerPoint Slide Note Settings in the formats_e.ini File
- Introduction
- C Sample Programs
- Compile the Visual Basic Sample Program
- tstxtract
- cnv2xml
- cnv2xmloop
- metadata
- xmlindex
- xmlini
- Use Style Sheets with xmlini
- xmlcallback
- xmlonefile
- xmlmulti
- Export Demo
- KVGetExtractInterface()
- fpCloseFile()
- fpExtractSubFile()
- fpFreeStruct()
- fpGetMainFileInfo()
- fpGetSubFileInfo()
- fpGetSubFileMetaData()
- fpOpenFile()
- KVCredential
- KVCredentialComponent
- KVExtractInterface
- KVExtractSubFileArg
- KVGetSubFileMetaArg
- KVMainFileInfo
- KVMetadataElem
- KVMetaName
- KVOpenFileArg
- KVOutputStream
- KVSubFileExtractInfo
- KVSubFileInfo
- KVSubFileMetaData
- KVXMLGetInterface()
- fpConvertStream()
- fpFileToInputStreamCreate()
- fpFileToInputStreamFree()
- fpFileToOutputStreamCreate()
- fpFileToOutputStreamFree()
- fpGetAnchor()
- fpGetConvertFileList()
- fpGetStreamInfo()
- fpGetSummaryInfo()
- fpInit()
- fpSetStyleMapping()
- fpShutDown()
- fpValidateTemplate()
- KVXMLConfig()
- Configuration Flags
- KVXMLConvertFile()
- KVXMLEndOOPSession()
- KVXMLSetStyleSheet()
- KVXMLStartOOPSession()
- Introduction
- Continue()
- GetAnchor()
- GetAuxOutput()
- UserCB()
- ADDOCINFO
- KVInputStream
- KVMemoryStream
- KVOutputStream
- KVSTR
- KVStreamInfo
- KVStructHead
- KVStyle
- KVSumInfoElemEx
- KVSummaryInfoEx
- KVXConfigInfo
- KVXMLCallbacks
- KVXMLHeadingInfo
- KVXMLInterface
- KVXMLOptions
- KVXMLTemplate
- KVXMLTOCOptions
- Introduction
- ENSATableBorder
- KVCredKeyType
- KVErrorCode
- KVErrorCodeEx
- KVXMLStyleSheetType
- KVXMLAnchorType
- KVXMLGraphicType
- KVHeadingCreateOptions
- KVXMLEmptyParaType
- KVXMLHardPageBreakType
- KVMetadataType
- KVMetaNameType
- KVSumInfoType
- KVSumType
- LPDF_DIRECTION
- Supported Formats
- Archive Formats
- Binary Format
- Computer-Aided Design Formats
- Database Formats