DataFlux Integration Server
User's Guide
Version 8.2.1
January 20, 2010

Leader in Data Quality and Data Integration
www.dataflux.com
US: 877-846-FLUX
International: +44 (0) 1753 272 020
DataFlux - Contact and Legal Information
Contact DataFlux
Corporate Headquarters
DataFlux Corporation
940 NW Cary Parkway, Suite 201
Cary, NC 27513-2792
Toll Free Phone: 1-877-846-FLUX (3589)
Toll Free Fax: 1-877-769-FLUX (3589)
Local Telephone: 1-919-447-3000
Local Fax: 1-919-447-3100
Web: www.dataflux.com
European Headquarters
DataFlux UK Limited
59-60 Thames Street
WINDSOR
Berkshire
SL4 1TX
United Kingdom
UK (EMEA): +44(0) 1753 272 020
Contact Technical Support
Phone: 919-531-9000
Email: techsupport@dataflux.com
Web: http://www.dataflux.com/Resources/DataFlux-Resources/Customer-CarePortal/Technical-Support.aspx
Legal Information
Copyright © 1997-2009 DataFlux Corporation LLC, Cary, NC, USA. All Rights Reserved.
DataFlux and all other DataFlux Corporation LLC product or service names are registered
trademarks or trademarks of, or licensed to, DataFlux Corporation LLC in the USA and other
countries. ® indicates USA registration.
Apache Portable Runtime License Disclosure
Copyright © 2008 DataFlux Corporation LLC, Cary, NC USA.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file
except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the
License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the specific language governing
permissions and limitations under the License.
Apache/Xerces Copyright Disclosure
The Apache Software License, Version 1.1
Copyright © 1999-2003 The Apache Software Foundation. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted
provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. The end-user documentation included with the redistribution, if any, must include the following acknowledgment: "This product includes software developed by the Apache Software Foundation (http://www.apache.org/)." Alternately, this acknowledgment may appear in the software itself, if and wherever such third-party acknowledgments normally appear.
4. The names "Xerces" and "Apache Software Foundation" must not be used to endorse or promote products derived from this software without prior written permission. For written permission, please contact apache@apache.org.
5. Products derived from this software may not be called "Apache", nor may "Apache" appear in their name, without prior written permission of the Apache Software Foundation.
THIS SOFTWARE IS PROVIDED "AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE APACHE
SOFTWARE FOUNDATION OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
This software consists of voluntary contributions made by many individuals on behalf of the
Apache Software Foundation and was originally based on software copyright © 1999,
International Business Machines, Inc., http://www.ibm.com. For more information on the
Apache Software Foundation, please see http://www.apache.org/.
DataDirect Copyright Disclosure
Portions of this software are copyrighted by DataDirect Technologies Corp., 1991-2008.
Expat Copyright Disclosure
Part of the software embedded in this product is Expat software.
Copyright © 1998, 1999, 2000 Thai Open Source Software Center Ltd.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software
and associated documentation files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or
substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
gSOAP Copyright Disclosure
Part of the software embedded in this product is gSOAP software.
Portions created by gSOAP are Copyright © 2001-2004 Robert A. van Engelen, Genivia inc. All
Rights Reserved.
THE SOFTWARE IN THIS PRODUCT WAS IN PART PROVIDED BY GENIVIA INC AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Microsoft Copyright Disclosure
Microsoft®, Windows, NT, SQL Server, and Access, are either registered trademarks or
trademarks of Microsoft Corporation in the United States and/or other countries.
Oracle Copyright Disclosure
Oracle, JD Edwards, PeopleSoft, and Siebel are registered trademarks of Oracle Corporation
and/or its affiliates.
PCRE Copyright Disclosure
A modified version of the open source software PCRE library package, written by Philip Hazel
and copyrighted by the University of Cambridge, England, has been used by DataFlux for
regular expression support. More information on this library can be found at:
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/.
Copyright © 1997-2005 University of Cambridge. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted
provided that the following conditions are met:
• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
• Neither the name of the University of Cambridge nor the name of Google Inc. nor the names of their contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
OF SUCH DAMAGE.
Red Hat Copyright Disclosure
Red Hat® Enterprise Linux® and Red Hat Fedora™ are registered trademarks of Red Hat,
Inc. in the United States and other countries.
SQLite Copyright Disclosure
The original author of SQLite has dedicated the code to the public domain. Anyone is free to
copy, modify, publish, use, compile, sell, or distribute the original SQLite code, either in source
code form or as a compiled binary, for any purpose, commercial or non-commercial, and by
any means.
Sun Microsystems Copyright Disclosure
Java™ is a trademark of Sun Microsystems, Inc. in the U.S. or other countries.
Tele Atlas North America Copyright Disclosure
Portions © 2006 Tele Atlas North America, Inc. All rights reserved. This material is
proprietary and the subject of copyright protection and other intellectual property rights
owned by or licensed to Tele Atlas North America, Inc. The use of this material is subject to
the terms of a license agreement. You will be held liable for any unauthorized copying or
disclosure of this material.
USPS Copyright Disclosure
National ZIP®, ZIP+4®, Delivery Point Barcode Information, DPV, RDI. © United States
Postal Service 2005. ZIP Code® and ZIP+4® are registered trademarks of the U.S. Postal
Service.
DataFlux holds a non-exclusive license from the United States Postal Service to publish and
sell USPS CASS, DPV, and RDI information. This information is confidential and proprietary to
the United States Postal Service. The price of these products is neither established, controlled,
nor approved by the United States Postal Service.
Table of Contents

Overview
  What DIS Does
  Where and How DIS Runs
DataFlux Integration Server 8.2.1 - What's New in This Release
  Installation Notes
  Microsoft Win64 Considerations
  Conventions Used in This Document
  New Features
  Problems Resolved
  DataFlux Integration Server Usage
  DIS Configuration Options
System Requirements
  Supported Operating Systems
  Supported Databases
  Bundled UNIX Drivers
DataFlux Standard Integration Server
  Key Benefits of Standard Integration Server
  Architecture of Standard Integration Server
DataFlux Enterprise Integration Server
  Key Benefits of Enterprise Integration Server
  Architecture of Enterprise Integration Server
  Understanding Enterprise Integration Server Processes
Installing DataFlux Integration Server
  Installing DIS for Windows
  Installing DIS for UNIX/Linux
  Existing ACL Files
Configuring DataFlux Integration Server
  Configuring a Data Source
  Setting up ODBC Connections
  Changes to How SAS Data Sets are Accessed Between Versions 8.1.x and 8.2.1
  Configuring Saved Connections
Configuring Licensing
  Windows
  UNIX
  Annual Licensing Notification
Installing Enrichment Data
  Downloading and Installing Data Packs
  Configuring Enrichment Data
Installing Other DataFlux Products
  Installing dfIntelliserver
  Installing dfPower Studio
  Installing Quality Knowledge Bases
  Installing Accelerators
Changing Configuration Settings
  Windows
  UNIX
  Windows and UNIX
Configuring DataFlux Integration Server to Use the Java Plugin
  Java Runtime Environment
  Java Classpath
  Environment Variables
  Optional Settings
Pre-loading Services
  Pre-loading all services
  Pre-loading one or more specific services
  Complex configurations
Multi-threaded Operation
DataFlux Integration Server Connection Manager
  Using Connection Manager on Windows
  Using Connection Manager on UNIX
  Sharing Connection Information
  Connection Manager User Interface
DataFlux Integration Server Manager
  DataFlux Integration Server Manager User Interface
Using DataFlux Integration Server Manager
  Uploading Batch Jobs and Real-Time Services
  Downloading Batch Jobs and Real-Time Services
  Running and Stopping Jobs
  Testing Real-Time Services
  Deleting Jobs and Services
  Monitoring Job Status
  Using Log Files
Command Line Options
DIS Security Manager Concepts
  Windows and UNIX
  Security Administration
  Security Commands for UNIX
  Using Strong Passwords in UNIX
Security Policy Planning
  DIS Security Tools
Using Security Manager
Security Manager User Interface
  Toolbar
  IP-based Security
DIS with LDAP Integration
  Configuration File
  LDAP Directives
DIS Security Examples
Frequently Asked Questions
  General
  Installation
  Security
  Troubleshooting
Error Messages
  Installation and Configuration
  Security
  Running Jobs and Real-Time Services
Appendix A: Best Practices
Appendix B: Code Examples
  Java
  C++
  C#
Appendix C: Saving Profile Reports to a Repository
Appendix D: SOAP Commands
  SOAP Commands
  Enumeration Values
Appendix E: DIS Service
  Windows
  UNIX
Appendix F: Configuration Settings
  General DIS Configuration Directives
  DIS Security Related Configuration Directives
  Architect Configuration Directives
  Data Access Component Directives
Glossary
Overview
DataFlux® Integration Server (DIS) addresses the challenges of storing consistent,
accurate, and reliable data across a network by integrating data quality and data integration
business rules throughout your IT environment. Using DIS, you can replicate your business
rules for acceptable data across applications and systems, enabling you to build a single,
unified view of the enterprise.
What DIS Does
DIS is available in two editions: Standard and Enterprise. DataFlux Standard Integration Server supports running batch dfPower® Studio jobs in a client/server environment. DIS also supports calling discrete DataFlux data quality algorithms from numerous native programmatic interfaces, including C, COM, Java™, and Perl. Standard Integration Server allows any dfPower Studio client user to offload batch dfPower Profile and Architect jobs to a more scalable server environment, freeing up resources on the user's local system.

DIS Enterprise edition adds the ability to call business services designed in the dfPower Studio client environment, or to invoke batch jobs, using a service-oriented architecture (SOA¹).
Where and How DIS Runs
DIS can be deployed on Microsoft® Windows®, UNIX®, and Linux® platforms with
client/server communication using HTTP. dfPower Studio users can select the Run Job
Remotely option to have a dfPower client send a job to the standard server.
Also included with DIS is the ability to make API² calls to the same core data quality engine
by using the dfIntelliServer® interface. Discrete API calls are available through native
programmatic interfaces for data parsing, standardization, match key generation, address
verification, geocoding, and other processes. Standard Integration Server requires a
developer to code programmatic calls to the engine.
¹ Service Oriented Architecture (SOA) enables systems to communicate with the master customer reference database to request or update information.
² An application programming interface (API) is a set of routines, data structures, object classes, and/or protocols provided by libraries and/or operating system services in order to support the building of applications.
DataFlux Integration Server 8.2.1 - What's New in This Release
Review the following release notes for DataFlux® Integration Server (DIS) 8.2.1 for
information about installation, new features, usage, and more. For additional information
about this release, please refer to DataFlux dfPower Studio Online Help, What's New in This
Release. DataFlux Integration Server supports all features available in the corresponding
dfPower release.
Installation Notes
Microsoft Win64 Considerations
Conventions Used in This Document
New Features
Problems Resolved
DataFlux Integration Server Usage
DIS Configuration Options
Installation Notes
Once the installation process is complete, modify the dfexec.cfg file to set directory paths for any relevant reference data (this includes USPS³, Canada Post, Geocoding, and QKB⁴ data). Set the default port that the server listens on, if needed. A valid license file must be copied into the license directory. If you are upgrading from DIS v7.0.x, the configuration files must be reconfigured in the 8.2.1 directory structure; configurations are not carried forward from previous installations.

Important: Users currently employing DIS Security will have Access Control List (ACL⁵) files that control access to objects in DIS. These ACL files were located under the security directory in the DIS installation. As of DIS v8.1, the location of ACL files changed, and it is necessary to move previous ACL files to a new location. All ACL files must be placed in the .acl subdirectory of the directory that corresponds with the object type, for example:
• ACL files with the suffix _archsvc.acl must be moved to the .acl subdirectory under the directory where service files reside.
• ACL files with the suffix _archjob.acl must be moved to the .acl subdirectory under the directory where Architect job files reside.
• ACL files with the suffix _profjob.acl must be moved to the .acl subdirectory under the directory where Profile job files reside.

³ The United States Postal Service (USPS) provides postal services in the United States. The USPS offers address verification and standardization tools.
⁴ The Quality Knowledge Base (QKB) is a collection of files and configuration settings that contain all DataFlux data management algorithms. The QKB is directly editable using dfPower Studio.
⁵ Access control lists (ACLs) are used to secure access to individual DIS objects.
For more information on installation, see Installing DataFlux Integration Server.
Microsoft Win64 Considerations
Starting with v8.1.1, DIS supports 64-bit Microsoft® Windows Vista®, Windows® XP
Professional x64 Edition, and 64-bit Windows Server® 2003 operating systems, with the
following exceptions and notations:
• Only 64-bit AMD® and Intel® chip sets are supported. Itanium® is not supported.
• International address verification using QAS is not supported.
• US and Canadian address verification (including geocoding) is supported only through distributed enrichment nodes, so a dfIntelliServer installation is required. This could be a Microsoft Win32 installation on the same platform that is running Win64.
• The only enrichment node supported is Address Verification (World). All other enrichment activities should be accomplished using dfIntelliServer.
Conventions Used in This Document
This reference uses several conventions for special terms and actions. The following
variables are used throughout this documentation:
Variable     Description
hostname     Represents the name used to identify a particular host, and is annotated as [hostname]. For example: [hostname]:port.
servername   Represents the name of the server on which DIS is installed, and is annotated as [servername]. For example: http://[servername]:port/?wsdl
username     Represents the name of the user, and is annotated as [username]. For example: [username]::permissions
version      Represents the version of DIS. Appears in file names and directory paths, and is annotated as [version]. For example: \Program Files\DataFlux\DIS\[version]\etc
New Features
The following new features have been added to DIS for this release:
• Service Name versioned (for Windows) - Starting with this release, the name of the DIS service includes the product version number. The new name of the DIS service for version 8.2.1 is "DFIntegrationService_v8.2.1".
• Service Queuing - New functionality allows service requests to be queued by DIS. In earlier versions of DIS, when the maximum number of processes handling real-time service requests (DFWSVCs) was reached and all processes were busy processing data, any new service request received an error stating that the request could not be handled because the server had reached the maximum number of service processes allowed. There is now an option to queue service requests in the order they are received; when a DFWSVC becomes available, the request is processed. This configuration parameter, svc requests queue, can be set in the dfexec.cfg file (see the sketch after this list).
• Licensing enhancement - DIS now authenticates the following values from the SAS® license file: OS, GRACE, RELEASE, and WARN.
• Logging - The ability to log time in milliseconds has been added to DIS.
• Macro handling - The DIS server now allows clients to pass macros to a service and to get the final values of those macros.
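These notes do not show the exact value syntax for the queuing parameter. As a rough sketch, assuming it takes a yes/no value like other dfexec.cfg switches (an assumption, not a documented default):

    svc requests queue = yes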
Problems Resolved
The following problems have been resolved for this release:
• Unable to run services via WLP on UNIX servers when SHM is used for the child connection
• UNIX install breaks if Perl cannot be found
• When DIS is killed, wlpslave does not exit
• Cryptic error messages received when the SAS license file cannot be found
• DIS installer does not detect an existing install directory
DataFlux Integration Server Usage
DIS runs as a Microsoft Windows service (called DataFlux Integration Server). You can start
and stop the service using the Microsoft Management Console. DIS uses the dfPower Studio
execution environment for Windows (dfexec) to execute real-time services, as well as
Architect and Profile jobs.
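The service can also be controlled from an elevated command prompt. A minimal sketch using the versioned service name introduced in this release (adjust the name if your installation registered the service differently):

    net stop "DFIntegrationService_v8.2.1"
    net start "DFIntegrationService_v8.2.1"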
The following sections summarize some of the more common configuration settings and how they are used. For a complete list of available configuration settings, see Appendix F: Configuration Settings.
Note that in the following sections, DFEXEC_HOME is used to represent the root directory of
the dfexec installation.
dfexec Configuration Options
The standard configuration options are as follows:
• plugin dir - Where dfexec looks for plugin libraries.
• license dir - Where dfexec looks for license files. Place the license file (studio.lic) you received from DataFlux in this directory.
• qkb root - The location of the Quality Knowledge Base (QKB) files. This must be set if you are using steps that depend on the algorithms and reference data in the QKB (such as the matching or parsing steps).
• datalib path - The location of the Verify data libraries. This must be set if your jobs use US address verification.
• usps db; canada post db; geo db; world address db - The locations of the different Verify address databases. The USPS database is required for US address verification; the Canada Post database for Canadian address verification; the Geo/Phone database for geocoding and coding telephone information; the World Address database for address verification outside the US and Canada.
• verify cache - The cache level (0-100) for Verify. The greater the value, the more aggressively the Verify steps cache address data.
• enable rdi; enable dpv - Enable or disable RDI/DPV processing during US address verification. This key should be set to yes or no. The default value is no.
• world address license - License code to unlock the World Address database.
• sort chunk - The amount of memory to use when sorting.
• cluster memory - The amount of memory to use when clustering.
• checkpoint - The amount of time between log checkpoints. After this amount of time elapses, messages containing the current status of each step in the currently executing job are printed to the log, and the checkpoint timer is reset. Values can end in s, min, or h; for example, 30min.
• mail command - The command used to mail alerts. The command may contain the substitutions %T (To) and %B (Body). %T is replaced with the destination email address, and %B with the path of a temporary file containing the message body. The default command is mail %T < %B.
• arch config - Location of the configuration file containing optional macro values for Architect jobs.
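Putting several of these options together, a minimal sketch of the reference-data portion of dfexec.cfg might look like the following. The paths and cache level are illustrative placeholders rather than shipped defaults; only keys documented above are used:

    qkb root = /dfhome/qkb
    datalib path = /dfhome/verify/datalib
    usps db = /dfhome/verify/uspsdata
    verify cache = 50
    enable dpv = no
    checkpoint = 30min
    mail command = mail %T < %B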
For details on stopping and starting this service, see Appendix E: DIS Service.
DIS Configuration Options
DIS reads configuration options from the configuration file. The installer creates a
configuration file (DFEXEC_HOME/etc/dfexec.cfg) with default values for the essential
options. This file is in a "key = value" format and can be edited with any text editor.
The standard configuration options are as follows:
• server listen port - The TCP port number where the server listens for connections.
• server read timeout - The amount of time the server waits to complete read/write operations.
• dfsvc max num - The maximum number of simultaneous dfsvc processes.
• dfexec max num - The maximum number of simultaneous dfexec processes.
• working path - The directory where the server creates its working files and subdirectories.
• restrict general access - The server can restrict access to functions by IP address. The value of this option should be the word allow or deny followed by a list of IP addresses or ranges, with each range or individual address separated by a space. Ranges must be in the low-high format, for example: 192.168.1.1-192.168.1.255.
• restrict post/delete access - If this is set, the server restricts posting and deleting of jobs to connections originating from the listed IP addresses. It has the same format as restrict general access.
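As an illustration, the server portion of dfexec.cfg using only the options above might look like this; the port number, process limits, path, and address range are examples rather than defaults:

    server listen port = 21036
    dfsvc max num = 10
    dfexec max num = 5
    working path = /dfhome/dfpower/var
    restrict general access = allow 192.168.1.1-192.168.1.255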
LDAP Requirements for UNIX/Linux platforms
AIX® - You must have the ldap.client.rte package installed. Run lslpp -l ldap.client.rte to
see if it is installed. You can find this package on the installation media for AIX.
HP-UX® - You must have the LDAP-UX client installed. Run /usr/sbin/swlist -l product
LdapUxClient to see if it is installed. If you do not have LDAP-UX you can get it from the
Hewlett Packard Web site at
http://software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=J4269AA.
Linux - You must have the OpenLDAP client installed. On an RPM-based system (such as RedHat® or SuSE™) you can run rpm -q openldap to see if it is installed. For other Linux systems, consult your documentation for how to test the availability of software packages. RedHat Enterprise Linux 4 or later also requires the compat-openldap package. Run rpm -q compat-openldap to see if it is installed. You can find this package on the installation media or the Red Hat Network.
Solaris® - No additional requirements; the LDAP client library is part of the Solaris core
libraries.
For details on configuration options, see Configuring DataFlux Integration Server.
System Requirements
Supported Operating Systems
The following is a list of the minimum requirements for supported platforms for a DataFlux®
Integration Server (DIS) installation. The minimum operating system requirements may be
different if you are accessing SAS data sets. In some instances, you may be required to run
a more recent version of the operating system, as noted in parentheses:
Requirement     Minimum                                     Recommended
Platforms¹      [See table below]                           N/A
Processor       [See table below]                           N/A
Memory (RAM)    1 GB                                        2 GB per CPU core²
Disk Space      1 GB for installation; 1 GB for temp space  10 GB for installation³; 20 GB for temp space³

Notes:
1. Other platforms are available - contact DataFlux for a complete list.
2. Actual requirements depend on configuration.
3. Verification processes rely on reference databases to verify and correct address data. The size of these reference databases varies. Check with DataFlux for exact size requirements for this component.
Each entry below lists the platform and bits, the minimum operating system (with SAS data set requirements in parentheses), and the supported hardware architecture:

• AIX®, 64-bit: IBM® AIX 5.2 (SAS: 5.3 Technology Level 6 or later; 64-bit environment); POWER/PowerPC®
• HP-UX (PA-RISC), 64-bit: HP-UX 11i Version 1.0 (11.11) (SAS: HP-UX 11.23 or later); PA-RISC 2.0
• HP-UX (Itanium), 64-bit: HP-UX 11i Version 2.0 (11.23) (SAS: June 2007 patch bundle); Itanium® (IA64)
• Linux®, 32-bit: Linux 2.4 (glibc 2.3) (SAS: Red Hat Enterprise Linux 4 and above; SuSE Linux Enterprise Server 9 or later); Intel® Pentium® Pro (i686)
• Linux, 64-bit: Linux 2.4 (glibc 2.3) (SAS: Red Hat Enterprise Linux 4 and above; SuSE Linux Enterprise Server 9 or later); AMD AMD64 or Intel EM64T
• Solaris™ (SPARC), 64-bit: Sun™ Solaris 8 (SAS: Solaris 9 or later with 9/05 update); sparcv9 (UltraSparc)
• Solaris x86, 64-bit: Sun Solaris 10 (SunOS 5.10) (SAS: Solaris 10 1/06 or later; if using Solaris 10 and LDAP for authentication, apply patch 118833-27 or later); AMD AMD64 or Intel EM64T
• Win32, 32-bit: Microsoft Windows 2003 (NT 5.2); Intel Pentium Pro (i686)
• Win64, 64-bit: Microsoft Windows 2003 (NT 5.2); AMD AMD64 or Intel EM64T
Linux Notes
DataFlux supports any distribution of Linux that meets the minimum requirements for kernel and glibc versions mentioned above; a specific distribution such as RedHat® or SuSE is not required. The following is a list of some of the more popular distributions and the minimum version of each that meets these requirements and is still supported by the vendor:
• Red Hat® Fedora™: 7.0
• Red Hat Enterprise Linux®: 3.0
• Novell® SuSE® Linux Enterprise Server: 9.0
• Canonical Ubuntu: 6.06
Supported Databases
The following databases are supported by DIS.
Database | Driver
ASCII Text Files | TextFile
Pervasive® Btrieve® 6.15 | Btrieve
Clipper | dBASE File
DB2 Universal Database (UDB) v7.x, v8.1, and v8.2 for Linux, UNIX, and Windows | DB2 Wire Protocol
DB2 UDB v7.x and v8.1 for z/OS | DB2 Wire Protocol
DB2 UDB V5R1, V5R2, and V5R3 for iSeries | DB2 Wire Protocol
dBASE® IV, V | dBASE
Microsoft Excel® Workbook 5.1, 7.0 | Excel
FoxPro 2.5, 2.6, 3.0 | dBase
FoxPro 6.0 (with 3.0 functionality only) | dBase
FoxPro 3.0 Database Container | dBase
IBM Informix® Dynamic Server 9.2x, 9.3x, and 9.4x | Informix
IBM Informix Dynamic Server 9.2x, 9.3x, and 9.4x | Informix Wire Protocol
Microsoft SQL Server 6.5 | SQL Server
Microsoft SQL Server 7.0 | SQL Server Wire Protocol
Microsoft SQL Server 2000 (including SP 1, 2, 3, and 3a) | SQL Server Wire Protocol
Microsoft SQL Server 2000 Desktop Engine (MSDE 2000) | SQL Server Wire Protocol
Microsoft SQL Server 2000 Enterprise (64-bit) | SQL Server Wire Protocol
Oracle® 8.0.5+ | Oracle
Oracle 8i R1, R2, R3 (8.1.5, 8.1.6, 8.1.7) | Oracle
Oracle 9i R1, R2 (9.0.1, 9.2) | Oracle
Oracle 10g R1 (10.1) | Oracle
Oracle 8i R2, R3 (8.1.6, 8.1.7) | Oracle Wire Protocol
Oracle 9i R1 and R2 (9.0.1 and 9.2) | Oracle Wire Protocol
Oracle 10g R1 (10.1) | Oracle Wire Protocol
Corel® Paradox® 4, 5, 7, 8, 9, and 10 | ParadoxFile
Pervasive PSQL® 7.0, 2000 | Btrieve
Progress® OpenEdge® Release 10.0B | Progress OpenEdge
Progress 9.1D, 9.1E | Progress SQL92
Sybase® Adaptive Server® 11.5 and higher | Sybase Wire Protocol
Sybase Adaptive Server Enterprise 12.0, 12.5, 12.5.1, 12.5.2, and 12.5.3 | Sybase Wire Protocol
XML | XML
This is a consolidated list of the drivers available for Windows, Linux, and various UNIX
platforms. Please consult with DataFlux for a complete and updated database version and
platform support list.
Bundled UNIX Drivers
The following lists show the bundled drivers supplied for each UNIX platform and the databases they support. Please consult with DataFlux for a complete database version and platform support list.
AIX
• DB2 Wire Protocol
• Informix Wire Protocol
• Oracle
• Oracle Wire Protocol
• SQL Server Wire Protocol
• Sybase Wire Protocol

HP-UX (Itanium)
• DB2 Wire Protocol
• Informix Wire Protocol
• Oracle
• Oracle Wire Protocol
• SQL Server Wire Protocol
• Sybase Wire Protocol
• Teradata³

HP-UX (PA-RISC)
• DB2 Wire Protocol
• Informix Wire Protocol
• Oracle Wire Protocol
• SQL Server Wire Protocol
• Sybase Wire Protocol

Linux
• DB2 Wire Protocol
• dBase²
• FoxPro3²
• Informix Wire Protocol
• OpenEdge Progress¹,²
• Oracle
• Oracle Wire Protocol
• Progress SQL92²
• SQL Server Wire Protocol
• Sybase Wire Protocol
• Teradata¹,²
• Text²

Solaris (SPARC)
• DB2 Wire Protocol
• Informix Wire Protocol
• Oracle
• Oracle Wire Protocol
• SQL Server Wire Protocol
• Sybase Wire Protocol

Solaris (x86)
• DB2 Wire Protocol
• Oracle Wire Protocol
• SQL Server Wire Protocol
• Sybase Wire Protocol

Notes
1. Requires 5.1 (or newer) drivers.
2. 32-bit only.
3. Requires 5.2 (or newer) drivers.
DataFlux Standard Integration Server
DataFlux Standard Integration Server supports native programmatic interfaces for C, C++,
COM, Java™, Perl, Microsoft® .NET, and Web services. The API engine runs in its own
process as a Microsoft Windows® service or UNIX®/Linux® daemon. The API engine
includes both a client installation and a server installation, with communication across the
network using Transmission Control Protocol (TCP). If the client and server are installed on
the same machine, they may be configured to communicate through inter-process
communication (IPC). The Standard Integration Server includes client-side failover support
for all API calls.
Key Benefits of Standard Integration Server
• Supports the ability to run DataFlux® dfPower® Studio jobs in a client/server mode by allowing users to offload dfPower Studio jobs onto a higher performance server.
• Exposes core data quality algorithms through programmatic interfaces.
Architecture of Standard Integration Server
The following figure depicts the integration architecture for the Standard Integration Server.
DataFlux Enterprise Integration Server
DataFlux® Enterprise Integration Server offers an innovative approach to data quality that
drastically reduces the time required to develop and deploy real-time data quality and data
integration services. Through tight integration with the dfPower® Studio design
environment, the Enterprise Integration Server operates as a data quality and data
integration hub. Both batch and real-time services, which may include database access,
data quality, data integration, data enrichment, and other integration processes, can then
be called through a service-oriented architecture (SOA⁶). This eliminates the requirement to replicate data quality logic in native programming languages such as Java™ or C. Instead of writing and testing hundreds of lines of code, you can design the integration logic visually and then call it from a single Web service interface.
The Enterprise Integration Server supports real-time deployment using SOA, as well as the
ability to run batch dfPower Studio jobs. The batch server capability is the same as the
Standard Integration Server, where dfPower Studio clients communicate with the server
through HTTP and the clients can process dfPower Profile and Architect jobs in the server
environment. Batch jobs may also be instantiated using a Web service call.
Key Benefits of Enterprise Integration Server
• Supports the ability to run dfPower Studio jobs in a client/server mode by allowing users to offload dfPower Studio jobs onto a higher performance server.
• Supports the ability to create data quality and integration processes visually instead of locking the logic into native code.
• Supports a SOA framework, enabling complete reuse of data quality and integration business logic.
Architecture of Enterprise Integration Server
The following figure depicts the integration architecture for the Enterprise Integration Server.
⁶ Service Oriented Architecture (SOA) enables systems to communicate with the master customer reference database to request or update information.
Understanding Enterprise Integration Server Processes
Activity on the Enterprise Integration Server is split into three general processes:
• Receive SOAP requests
• Monitor registered data quality and data integration services
• Send SOAP responses
The Enterprise Integration Server runs as a daemon on UNIX and Linux or as a service on
Microsoft® Windows® platforms. The server is responsible not only for sending and
receiving SOAP requests, but also for monitoring the progress of all registered data
integration services. Once the server receives a request, the server sends the data to the
invoked Web service. If the service has not been invoked before, the server will load the
service into memory and send the data to the in-memory processes. If the service invoked
from the client application is busy, the server will instantiate a new service into memory and
pass the data off to the new service. Each service runs in its own process, which allows for
robust error recovery, as well as the ability to spread the processing load across multiple
CPUs.
More specifically, the server handles the following processes:
• Query the server to return the names of memory services
• Return input/output parameters for a specified service
• Pass data to a service and execute the service
Query Server to Return the Names of Memory Services
If the server receives a query request, the server simply queries the service configuration
directory and returns the name of each service. The service names are packaged up into a
single SOAP packet and sent back to the client.
Return Input/Output Parameters for a Specified Service
If the client queries the server for the input and output names of a given service, the server
will return to the client the names of the expected input fields, as well as the names of the
expected output fields.
Pass Data to and Execute a Service
When the server receives a request to process data from a client call, it identifies an idle
service, sends the data to the idle service, and listens for additional client requests. If no
idle service is identified, the server will load a new service into memory and pass the data
to the new service. Since each service runs in its own process, processing multiple services
can be spread across multiple CPUs, and the server is always available and listening for
additional client requests. The server monitors the service progress; as soon as the service
returns output, the server sends the output back to the client application. If the service fails
for any reason, the server will terminate the service process and return an error message to
the calling application.
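Each deployed service is described by the server's WSDL, published at http://[servername]:port/?wsdl (see the conventions table earlier in this guide). As a minimal, hypothetical sketch in Java, a client can retrieve that WSDL to discover the available SOAP operations before passing data to a service; the host name and port below are placeholders, not defaults:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class FetchDisWsdl {
        public static void main(String[] args) throws Exception {
            // Hypothetical host and port; substitute your DIS server's values.
            URL wsdlUrl = new URL("http://disserver:21036/?wsdl");
            HttpURLConnection conn = (HttpURLConnection) wsdlUrl.openConnection();
            conn.setRequestMethod("GET");
            BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
            // The WSDL describes the SOAP operations and the input/output
            // fields each registered service expects.
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
            conn.disconnect();
        }
    }

Complete client examples in Java, C++, and C# appear in Appendix B: Code Examples.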
Installing DataFlux Integration Server
These sections explain the steps required to install DataFlux® Integration Server (DIS) on
Microsoft® Windows® and UNIX®.
Installing DIS for Windows
Download the latest version of DIS for Microsoft Windows from the download section of the
DataFlux Customer Care portal at http://www.dataflux.com/Customer-Care/.
Installation on a Windows platform requires running the DIS setup program. The setup wizard guides you through the installation process. During the installation, you will be asked to select additional components to install; ODBC drivers are automatically selected for you. You will also be prompted for your licensing method. You may set up licensing now or by using the License Manager after you have completed the installation.
Directory Layout for Windows-based DIS Installations
Directory                 Description
DIS\[version]             Top-level installation directory
DIS\[version]\arch_job    Default directory to store Architect jobs
DIS\[version]\bin         Executable files for this platform. The wscode executable stored in this directory provides the product and machine codes needed to unlock the full functionality of the product.
DIS\[version]\data        Data specific to this installation
DIS\[version]\etc         Configuration and license files
DIS\[version]\help        Help files
DIS\[version]\log         Default directory for log files
DIS\[version]\prof_job    Default directory to store Profile jobs
DIS\[version]\sample      DataFlux sample database
DIS\[version]\svc_job     Default directory to store real-time services
DIS\[version]\temp        Default temporary directory for input/output files
DIS\[version]\webclient   Default directory for the web client
DIS\[version]\work        Default location of working data for running processes
Once you complete the installation, you need to configure the server for your environment.
See Configuring DIS for more information.
Note: dfIntelliServer is a separate component that provides a simple,
scalable, customizable architecture which allows an organization to integrate
DataFlux's powerful data quality technology into its own applications. For
information on configuring and using dfIntelliServer, see DataFlux
dfIntelliServer Reference Guide.
Installing DIS for UNIX/Linux
Download the latest version of DIS for UNIX/Linux® from the download section of the
DataFlux Customer Care portal at http://www.dataflux.com/Customer-Care/.
This installation includes the dfPower® Architect and Profile execution environment for UNIX
systems. It also includes the server components of the DIS. Other components of dfPower
are not included.
If you have previously installed DIS in this directory, your dfexec.cfg configuration file and
odbc.ini file will be overwritten. If you have made changes to these files and would like to
preserve them, save the files to another location before installing DIS.
Follow these instructions to install DIS for UNIX/Linux:
1. Copy the DIS installation file and README.txt that correspond to your operating system (AIX®, HP-UX, Linux, or Solaris™) to an accessible directory.
2. At the command prompt, connect to the location where you are loading DIS.
3. Specify the directory where you will be loading DIS, and navigate to that directory. Note: All files will be installed in a subdirectory called dfPower.
4. Enter the following command to unzip the installation file, replacing PATH_TO with the directory where you copied the installation file:
   gzip -c -d PATH_TO/dfpower-exec-[version].tar.gz | tar xvf -
Unzip the DIS installation file
5. Execute the installation program by typing: perl dfpower/install.pl
The installation wizard now takes control of the installation process. Follow the on-screen instructions to complete the installation.
Execute the installation program
Directory Layout for UNIX/Linux-based DIS Installations
Directory            Description
dfpower              Top-level installation directory
dfpower/bin          Executable files for this platform
dfpower/data         Data specific to this installation
dfpower/doc          Documentation; the file dfpower/doc/usage has some basic instructions on the use of the dfexec.cfg file
dfpower/etc          Configuration, log, and license files
dfpower/lib          Library files for this platform; this is the default location of the Architect plug-in libraries
dfpower/locale       Localization files for this platform
dfpower/share        Shared (not platform-specific) data
dfpower/var          Default location of working data for running processes
dfpower/install.pl   The installer; see Installing DIS for UNIX/Linux for more information
dfpower/README       The README.txt file
Existing ACL Files
Users of earlier versions of DIS, and who currently employ DIS security, will have Access
Control List (ACL) files that control access to the various objects in DIS. These files were
formerly located under the security directory in the DIS installation. In version 8.1, the
locations for the ACL files have changed, and it is necessary to move your ACL files to a new
location. All ACL files should be placed in an .acl subdirectory within the directory for the
corresponding object type, as follows:
• All ACL files with the suffix _archsvc.acl must be moved to the .acl subdirectory under the directory where service files reside.
• All ACL files with the suffix _archjob.acl must be moved to the .acl subdirectory under the directory where Architect job files reside.
• All ACL files with the suffix _profjob.acl must be moved to the .acl subdirectory under the directory where Profile job files reside.
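As an illustration, the moves can be scripted from a shell. The paths below assume the pre-8.1 ACL files live in a security directory and that services, Architect jobs, and Profile jobs reside in svc_job, arch_job, and prof_job directories (hypothetical locations borrowed from the default Windows layout; substitute the directories your installation actually uses):

    mkdir -p svc_job/.acl arch_job/.acl prof_job/.acl
    mv security/*_archsvc.acl svc_job/.acl/
    mv security/*_archjob.acl arch_job/.acl/
    mv security/*_profjob.acl prof_job/.acl/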
Once you complete the installation, you must configure the server for your environment.
See Configuring DIS for more information.
Configuring DataFlux Integration Server
This section covers configuring DataFlux® Integration Server (DIS) for Microsoft®
Windows® and UNIX® operating systems. See Configuration Settings for a list of options.
To configure server software:
1. Set up database connections
2. Configure licensing
3. Install enrichment data
4. Install other DataFlux products
5. Change configuration settings
6. Configure DIS to use the Java Plugin
7. Start the DIS service
Configuring a Data Source
To process a database with DataFlux® Integration Server (DIS), an ODBC⁷ driver for the specified database must be installed, and the database must be configured as an ODBC data source. You can also access flat files and text files outside of the ODBC configuration method if your dfPower® Architect or Profile job has specific nodes for those data sources.
Setting up ODBC Connections
Best Practice: Use a System data source rather than a User data source. For additional information, see Appendix A: Best Practices and Configuration Settings.
Windows
To process a database in Architect, an ODBC driver for the specified database management
system (DBMS) must be installed, and the database must be configured as an ODBC data
source. To add a data source, use the ODBC Data Source Administrator provided with
Microsoft® Windows®.
ODBC Data Source Administrator Dialog
⁷ Open Database Connectivity (ODBC) is an open standard application programming interface (API) for accessing databases.
To set up a new ODBC connection:
1. Click Start > Settings > Control Panel.
2. Double-click Administrative Tools > Data Sources (ODBC).
3. In the ODBC Data Source Administrator dialog, select the driver that is appropriate for your data source.
4. Click Add.
5. In the ODBC Driver Setup dialog, enter the Data Source Name, Description, and Database Directory. These values are required, and can be obtained from your database administrator.
6. Select the Database Type.
If these steps have been completed successfully, the database name will display in the
database list found on the Connection Manager main screen in Windows.
UNIX
Use the interactive ODBC Configuration Tool (dfdbconf) to add new data sources to the
ODBC configuration.
1. From the root directory of the dfexec installation, run: ./bin/dfdbconf
2. Select A to add a data source. (You can also use dfdbconf to delete a data source if it is no longer needed.)
3. Select a template for the new data source by choosing a number from the list of available drivers.
4. Set the appropriate parameters for that driver when prompted. The new data source is then added to your odbc.ini file.
Once you have added all of your data sources, the interactive ODBC Viewer (dfdbview)
application can be used to test your connection. For example, if you added a data source
called my_oracle, run: ./bin/dfdbview my_oracle (from the installation root) to test the
connection. You may be prompted for a user name and password. If the connection
succeeds, you will see a prompt from which you can enter SQL commands and query the
database. If the connection fails, DIS displays error messages describing one or more
reasons for the failure.
Note: When configuring the new data source, it is critical that the
parameters (such as DSN, host, port, and sid) match exactly those used to
create the job on the client machine.
Changes to How SAS Data Sets Are Accessed Between Versions 8.1.x and 8.2.1
In v8.1.1, if you wanted to run a job using a SAS data set on an Integration Server, you
would modify the primarypath parameter in the connect string (DSN in the Advanced
Properties) for the SAS Data Set input node or the SAS Data Set Target (Insert) output
node. The primarypath could either be hard-coded to point to the appropriate directory
where the data sets are located on the DIS, or it could be set as a macro and read from the
architect.cfg file. For example, a common connect string in v8.1.1 would look like this:
DRIVER=BASE;CATALOG=BASE;SCHEMA=(name='SAS';primarypath='C:\dfHome\demodata\sasdata')
To modify the job to run on a UNIX host, the connect string had to be changed to look
something like this:
DRIVER=BASE;CATALOG=BASE;SCHEMA=(name='SAS';primarypath='/dfhome/demodata/sasdata')
Or this:
DRIVER=BASE;CATALOG=BASE;SCHEMA=(name='SAS';primarypath='%%path_to_datasets%%')
where the macro variable %%path_to_datasets%% is defined as
/dfhome/demodata/sasdata in the architect.cfg file on a UNIX Integration Server.
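In that arrangement, the architect.cfg file on the UNIX server would contain a line in the usual key = VALUE macro form, for example:

path_to_datasets = /dfhome/demodata/sasdata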
This has changed for version 8.2.1. The connect string is no longer stored in the node.
Instead, it is stored in a configuration file that is referenced by the node. For example, the
DSN property of the input and output nodes now looks something like this:
DSN=SAS tables;DFXTYPE=TKTS
At this point, the job references the connection configuration file located in the dftkdsn
directory in the etc directory of the DIS installation. In this example the file name is SAS
tables.dftk. The configuration file should look something like this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<datafluxdocument class="dftkdsn" version="1.0">
  <name>SAS tables</name>
  <description>SAS data sets</description>
  <attributes>
    <attribute name="DRIVER">BASE</attribute>
    <attribute name="CATALOG">BASE</attribute>
    <attribute name="SCHEMA">(name='SAS';primarypath='C:\dfhome\demodata\sasdata';LOCKTABLE=SHARE)</attribute>
  </attributes>
</datafluxdocument>
To modify the job to run on a UNIX host, complete these two steps:
1. Copy the SAS tables.dftk file from the dfPower client where the connection was created to the Integration Server where the job will be run, placing it in the location mentioned above.
2. Modify the primarypath in the file so that it points to the correct location for the SAS data sets, like this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<datafluxdocument class="dftkdsn" version="1.0">
  <name>SAS tables</name>
  <description>SAS data sets</description>
  <attributes>
    <attribute name="DRIVER">BASE</attribute>
    <attribute name="CATALOG">BASE</attribute>
    <attribute name="SCHEMA">(name='SAS';primarypath='/dfhome/demodata/sasdata';LOCKTABLE=SHARE)</attribute>
  </attributes>
</datafluxdocument>
Configuring Saved Connections
After you configure the data sources and test the connection, you should store a saved
connection for that data source. Saved connections provide a mechanism for storing
encrypted authentication information (user name/password combinations) for a data source.
When a saved connection is used, only the DSN is stored in the job file, not the entire connection string. When the job is executed, it refers to the connection file for the authentication information for that DSN. In order to use a saved connection, the same connection must also be saved on the client machine where the job was created.
Windows
When configuring a Microsoft SQL Server® connection, it is recommended that you use SQL
Server authentication rather than Windows authentication.
Note: When configuring the new data source, ensure the parameters
(such as DSN, host, port, and sid) match exactly those used to create the job
on the client machine.
Note: A data source name (DSN) contains connection information, such as user name and password, used to connect to a database through an ODBC driver.
The Connection Manager used to administer the saved connections comes installed in the
DIS program group. When you run Connection Manager you will see a list of all available
connections that have been configured using ODBC. When you select a connection and click
Save, you are prompted for the user name and password. If the connection is successful,
the logon information is saved in a file. The user name and password are encrypted. To
enable users to share the same connection information, modify one or both of the following
registry keys and enter a valid directory path:
•	HKEY_CURRENT_USER\Software\DataFlux Corporation\dac\[version]\savedconnectiondir
•	HKEY_LOCAL_MACHINE\Software\DataFlux Corporation\dac\[version]\savedconnectiondir
where [version] indicates the version of DIS that you have installed.
If this key does not contain an entry, the connection file is stored in a dfdac subdirectory
under the user's home directory. For example, if the user qatest stored a connection for a
data source named mydatasource, it would reside in this file:
c:\Documents and Settings\qatest\dfdac\mydatasource
Best Practice: Refer to Appendix A: Best Practices - Use Connection Manager to Configure
Data Sources for additional information about Configuration Settings.
UNIX
To create a saved connection, from the root directory of the DIS installation run:
./bin/dfdbview -s -t [connection name]
The information is saved in the user's home directory, within the .dfpower directory. The
user ID and password are encrypted.
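For example, assuming the my_oracle data source created earlier in this section, you might run ./bin/dfdbview -s -t my_oracle and enter the user name and password when prompted.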
For more information regarding dfdbview, see the usage.ODBC file in the /doc directory of
the DIS installation. For more information on configuring ODBC data sources, see the ODBC
Reference document that accompanies the dfPower Studio installation (click Start >
Programs > DataFlux dfPower Studio > Help > ODBC Reference).
Configuring Licensing
Windows
DataFlux® Integration Server (DIS) uses a file-based licensing model that takes the form of
a machine-specific license file. The license pool for executing jobs and services using DIS
has uncounted licenses (an infinite number of licenses) for each type of license purchased.
If DIS is packaged as part of SAS, you have the option of selecting SAS license file as your
licensing method.
Note: The license dir parameter in the dfexec.cfg file is no longer
supported on DIS for Microsoft® Windows®. In order to set or change the
license location, you must use the license manager application.
To configure your license for DIS, do the following:
1. Run the DataFlux Host ID application to generate a Host ID for your Integration Server. From the dfPower® Studio main menu, click Help > DataFlux Host ID.
DataFlux Host ID application
2. Contact your DataFlux representative and provide the DataFlux Host ID to obtain your license file.
3. Save the license file to [installation drive]:\Program Files\DataFlux\DIS\[version]\etc\[license file].
4. Make note of the full path to the licensing location, including the file name. To specify the licensing location by using the License Manager, click Start > Programs > DataFlux Integration Server > License Manager. In the License Manager dialog, select DataFlux license file, and enter the Location.
DataFlux License Manager Dialog
UNIX
DIS uses a file-based licensing model that takes the form of a machine-specific license file.
The license pool for executing jobs and services using DIS has uncounted licenses (an
infinite number of licenses) for each type of license purchased.
To configure your license file for DIS, do the following:
1. To generate a Host ID, run ./bin/lmhostid. Write down the FLEXnet host ID that is returned.
2. Log onto the Customer Care Portal at http://www.dataflux.com/Customer-Care and click Request License Unlock Codes. This opens the License Request Form page.
3. Enter the requested information, including the Host ID generated in Step 1.
4. When you receive your new license file, save it on the UNIX® server in the etc/license directory. License files must have a .lic file name extension in order to be recognized.
With file-based licensing, you should not change the license location setting in the
dfexec.cfg configuration file.
If you need to change the licensing method, run ./bin/dflm. The optional -m switch allows
you to change licensing methods. If you use this switch, you must restart DIS.
SAS License
If you have obtained a license from SAS, complete these steps:
1. Set the license location setting in the dfexec.cfg configuration file to point to your license file.
2. Run ./bin/dflm -m.
3. Set the license type to SAS license file.
Annual Licensing Notification
DIS uses an annual license process to allow users to access services and run jobs. The
system alerts the user when each feature's license is nearing expiration, using the following
process:
1. Sixty days before a license is due to expire, a dialog begins appearing daily in the Integration Server Manager. It contains a list of the licensed features that are expiring, as well as the number of days remaining for each feature's license. To stop the dialog from reappearing, click Do not display this warning again.
2. When a license reaches its expiration date, another dialog begins displaying daily, alerting the user that one or more features have expired and that these features are now operating within a thirty-day grace period. The dialog lists the number of days left in the grace period for each feature, or indicates that a feature has already expired and can no longer be accessed. This dialog cannot be disabled; it will continue to appear daily.
3. After the thirty-day grace period, services or jobs that are requested through DIS, but have expired, no longer run.
The server log files keep records of all notification warnings generated.
Contact your DataFlux sales executive to renew your DataFlux product licenses.
Installing Enrichment Data
If you are using external data, install USPS, Software Evaluation and Recognition Program (SERP), Geocode/Phone, QuickAddress Software (QAS), World, or other enrichment
data. Make a note of the path to each data source. You will need this information to update
the dfexec.cfg configuration file.
Downloading and Installing Data Packs
If your DataFlux® dfPower® Studio installation includes a Verify license, you need to install
the proper USPS, Canada Post, and Geocode databases to do address verification. If you are
licensed to use QAS, you must acquire the postal reference databases directly from QAS for
the countries they support. For more information, contact your DataFlux representative.
Data Packs for data enrichment are available for download on the DataFlux Customer Care
portal at http://www.dataflux.com/Customer-Care. To download data packs, follow these
steps:
1. Obtain a user name and password from your DataFlux representative.
2. Log in to the DataFlux Customer Portal.
Note: You may also retrieve the data pack installation files through FTP. Please contact DataFlux Technical Support at 919-531-9000 for more information regarding downloading through FTP.
3. Click Downloads > Data Updates.
Note: The United States Postal Service (USPS) provides postal services in the United States. The USPS offers address verification and standardization tools. The Software Evaluation and Recognition Program (SERP) is a program Canada Post administers to certify address verification software. QuickAddress Software (QAS) is used to verify and standardize US addresses at the point of entry. Verification is based on the latest USPS address data file.
Data Updates Page
4. Select the installation file corresponding to your data pack and operating system to download.
Close all other applications and follow the procedure that is appropriate for your operating
system.
Windows
Browse to and double-click the installation file to begin the installation wizard. If you are
installing QAS data, you must enter a license key. When the wizard prompts you for a
license key, enter your key for the locale you are installing.
UNIX
Installation notes accompany the download for each of the UNIX® data packs from
DataFlux. For Platon and USPS data, check with the vendor for more information.
Note: Be sure to select a location to which you have write access and
which has at least 430 MB of available space.
Note: Download links are also available from the dfPower Navigator
Customer Portal link in dfPower Studio.
Configuring Enrichment Data
If you are using external data, install USPS, SERP, Geocode/Phone, QAS, World, or other
enrichment data. You will need to specify the path to each data source in your configuration
file.
Configuring USPS
Windows
Download Windows Verify Data Setup from the DataFlux Customer Portal, and run the
installation file.
UNIX
Download UNIX Verify Data Setup from the DataFlux Customer Portal and install the file
on your DIS machine.
Setting: usps db
Description: This is the path to the USPS database, which is required for US address verification (Architect batch jobs and real-time services).
# Windows Example
usps db = C:\Program Files\DataFlux\verify\uspsdata
# UNIX Example
usps db = /opt/dataflux/verify/uspsdata
Configuring DPV
Windows
Download Windows Verify DPV Data Setup from the DataFlux Customer Portal, and run
the installation file. Enable DPV by changing the enable dpv setting in the dfexec.cfg file.
UNIX
Download UNIX Verify DPV Data Setup, under USPS in the Data Updates section of the
customer portal. Enable DPV by changing the enable dpv setting in the dfexec.cfg file.
Setting: enable dpv
Description: To enable Delivery Point Validation (DPV) processing (for US Address Verification), set to yes. It is disabled by default (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable dpv = yes
Configuring USPS eLOT
Windows
Download Windows Verify eLOT Data Setup from the DataFlux Customer Portal, and run
the installation file. Enable eLOT by changing the enable elot setting in the dfexec.cfg file.
UNIX
Download UNIX Verify eLOT Data Setup, under USPS in the Data Updates section of the
customer portal. Enable eLOT by changing the enable elot setting in the dfexec.cfg file.
Setting: enable elot
Description: To enable USPS eLOT processing (for US Address Verification), set to yes. It is disabled by default (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable elot = yes
Configuring Canada Post (SERP)
Windows
Download the Microsoft® Windows® SERP data update from the DataFlux Customer Portal
and install the file on your DIS machine.
UNIX
Download the SERP data update that corresponds to your operating system from the
DataFlux Customer Portal and install the file on your DIS machine.
Note: Delivery Point Validation (DPV) is a USPS database that checks the validity of residential and commercial addresses.
Setting: canada post db
Description: This setting indicates the path to the Canada Post database for Canadian address verification (Architect batch jobs and real-time services).
# Windows Example
canada post db = C:\Program Files\DataFlux\dfPower Studio\[version]\mgmtrsrc\RefSrc\SERPData
# UNIX Example
canada post db = /opt/dataflux/aix/dfpower/[version]/mgmtrsrc/refsrc/serpdata
Configuring Geocode/Phone
Windows
Download the Windows Geocode Data Pack from the DataFlux Customer Portal and install
the file on your DIS machine.
UNIX
Download the UNIX Geocode Data Pack from the DataFlux Customer Portal and install the
file on your DIS machine.
Setting: geo db
Description: This sets the path to the database for geocoding and coding telephone information (Architect batch jobs and real-time services).
# Windows Example
geo db = C:\Program Files\DataFlux\dfPower Studio\[version]\mgmtrsrc\RefSrc\GeoPhoneData
# UNIX Example
geo db = /opt/dataflux/hpux/dfpower/[version]/mgmtrsrc/refsrc/geophonedata
Configuring QAS Data
Windows
Contact QAS to download the latest data files for the countries you are interested in. Once
you have downloaded the data sets, run the installation file and follow the instructions
provided by the installation wizard.
UNIX
Run the installation file on a Windows machine to get the .dts, .tpx, and .zls files, then
transfer all of these to your UNIX environment.
Configure the following QAS files located in the /etc subdirectory of your DIS directory:
•	In the qalicn.ini file, copy in your license key for each country. Each license key must be entered on a separate line.
•	In the qaworld.ini file, you must specify the following information:
1. Set the value of the CountryBase parameter equal to one or more country prefixes for the countries you have installed. For example, to search using Australian mappings, add the following line to your qaworld.ini file:
CountryBase=AUS
Additional country prefixes can be added to the CountryBase parameter. Separate each prefix by a space. For a complete list of supported countries, see the International Address Data lists at the QAS website.
2. Set the value of the InputLineCount parameter. Add the country prefix to the parameter name and set the count equal to the number of lines your input addresses contain. For example, to define four lines for Australia:
AUSInputLineCount=4
3. Set the value of the AddressLineCount parameter. Add the country prefix to the parameter name and set the count equal to the total number of lines. Then, specify which address element will appear on which line in the input address by setting the value of the AddressLine parameter equal to a comma-separated list of element codes. For example:
AUSAddressLineCount=4
AUSAddressLine1=W60
AUSAddressLine2=W60
AUSAddressLine3=W60
AUSAddressLine4=W60,L21
For more information on address elements and configuring the qaworld.ini file, see
QuickAddress Batch API Guide and the country-specific data guides.
•	In the qawserve.ini file, you must specify the following information for each parameter. If more than one country prefix is added to a parameter, each subsequent country prefix should be typed on a new line and preceded by a + (plus sign); a combined sketch appears at the end of this section. For a complete list of supported countries, see the International Address Data lists at the QAS website.
1. Set the value of the DataMappings parameter equal to the country prefix, country name, and country prefix. Separate each value by a comma. For example:
DataMappings=AUS,Australia,AUS
2. Set the value of the InstalledData parameter equal to the country prefix and installation path. Separate each value by a comma. For example:
InstalledData=AUS,C:\Program Files\QAS\Aus\
For more information on configuring the qawserve.ini file, see QuickAddress Batch API
Guide and the country-specific data guides.
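As a minimal sketch of the + continuation syntax described above, a qawserve.ini configured for two installed countries might contain the following (the GBR entries are hypothetical, included only to illustrate the format):

DataMappings=AUS,Australia,AUS
+GBR,United Kingdom,GBR
InstalledData=AUS,C:\Program Files\QAS\Aus\
+GBR,C:\Program Files\QAS\Gbr\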
Note: If you have existing Architect jobs that include the Address Verification (QAS) node, those jobs will not work until you reconfigure them for the new QAS 6.x engine.
Configuring AddressDoctor Data
Windows and UNIX
If you are using AddressDoctor data for address verification, download the address files for
the countries you are interested in from the DataFlux Customer Care portal. You will also
need the addressformat.cfg file included with the data files. The addressformat.cfg file must
be installed in the directory where the address data files reside.
Change the world address license and world address database settings in the dfexec.cfg file:
Setting: world address license
Description: This is the license key provided by DataFlux that is used to unlock the AddressDoctor country data. The value must be enclosed in single quotes (Architect batch jobs and real-time services).
# Example (same for Windows and UNIX)
world address license = 'abcdefghijklmnop123456789'

Setting: world address db
Description: This sets the path to where the AddressDoctor data is stored.
# Windows Example
world address db = 'C:\world_data\'
# UNIX Example
world address db = '/opt/dataflux/linux/worlddata'
Configuring LACS and RDI Data
Windows and UNIX
Residential Delivery Indicator (RDI) and Locatable Address Conversion System (LACS) are
provided by the United States Postal Service®. If you are using these products, simply
download the data with your USPS data, and set the applicable settings in the dfexec.cfg
file:
Setting: enable lacs
Description: To enable LACS processing, set to yes. It is disabled by default (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable lacs = yes

Setting: enable rdi
Description: This option enables or disables RDI processing (for US Address Verification). By default, it is set to no (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable rdi = yes
Note: Residential Delivery Indicator (RDI) identifies addresses as residential or commercial.
Installing Other DataFlux Products
DataFlux® is a leader in data quality and data integration. The data cleansing and data
quality suite of applications encompassed by dfPower® Studio can be integrated into the
service-oriented architecture of DataFlux Integration Server (DIS). This architecture can be
customized to your own environment using applications like dfIntelliServer, Quality
Knowledge Bases, and Accelerators. DataFlux Accelerators provide data and workflows to
put common data quality initiatives to work in your organization.
Installing dfIntelliServer
dfIntelliServer is a separate component that provides a simple, scalable, customizable architecture that allows an organization to integrate DataFlux's powerful data quality technology into its own applications. To install dfIntelliServer:
1. From the DataFlux Customer Care portal (http://www.dataflux.com/Customer-Care/), click Downloads.
2. Scroll down to dfIntelliServer, select the version corresponding to your operating system, and download to your computer.
3. Install dfIntelliServer.
Windows
Double-click the installation file and follow the on-screen instructions.
UNIX
Download the tar.gz file and follow the associated Installation Notes.
For information on configuring and using dfIntelliServer, see DataFlux dfIntelliServer
Reference Guide.
Installing dfPower Studio
dfPower Studio is a powerful suite of data cleansing and data integration applications. With
dfPower Studio, you have access to various applications that can help eliminate data quality
problems. dfPower Studio connects to virtually any ODBC database and can be run from an
intuitive graphical user interface, from the command line, or in batch operation mode. This
gives you flexibility in how your enterprise handles your data quality problems.
Windows
dfPower Studio is supported on the Microsoft® Windows® platform. To install dfPower
Studio, navigate to the DataFlux Customer Portal to download the software.
For information on installing, configuring, and using dfPower Studio, see DataFlux dfPower
Studio Getting Started Guide and DataFlux dfPower Studio online Help.
Installing Quality Knowledge Bases
A Quality Knowledge Base (QKB) is a collection of files that define rules, criteria, and data
by which data cleansing can be performed. To install the latest version of the Contact
Information QKB:
1. From the DataFlux Customer Care portal at http://www.dataflux.com/Customer-Care/, click QKBs under Downloads.
2. Select the version corresponding to your operating system, and download to your computer.
3. Install the QKB according to the operating system you are using.
Windows
Double-click the installation file and follow the on-screen instructions.
UNIX
Download the tar.gz file and follow the associated Installation Notes.
For information on configuring and using Quality Knowledge Bases, see DataFlux Quality
Knowledge Base Reference Guide.
Installing Accelerators
DataFlux Accelerators provide a wide range of pre-built workflows that encompass typical
data quality processes. You also get the tools necessary to effectively diagnose and manage
data quality over time. There are a number of DataFlux Accelerators available. Contact your
DataFlux representative for more information.
For more on configuring and using Accelerators, see DataFlux Accelerator Installation Guide.
Note: The Quality Knowledge Base (QKB) is a collection of files and configuration settings that contain all DataFlux data management algorithms. The QKB is directly editable using dfPower Studio.
Changing Configuration Settings
Once you have completed the installation process, modify the dfexec.cfg file to set the
directory paths for any relevant reference data, for example, United States Postal Service
(USPS), Canada Post, Geocoding, and Quality Knowledge Base (QKB). You can also
change the default port on which the server is listening. Other settings in the dfexec.cfg file
control memory allocation and enhance clustering performance.
Windows
Modifying Default Configuration Settings
After installing DataFlux® Integration Server (DIS), you must modify some default
configuration settings in order for the jobs and services to run correctly. The dfexec.cfg file
contains configuration settings for real-time services, as well as dfPower® Architect and
Profile jobs. This file is stored in the \etc directory of the DIS installation.
Refer to DIS Configuration Settings for a list of common settings that may need to be
modified before running the server. After making changes to the configuration file, you must
restart the server. For more information, see the DIS Server.
Note: There is an order of precedence for configuration settings. In
general, first a setting is determined by the Advanced Properties of a node in
the job or real-time service. In the absence of a setting, the value is set by
the corresponding entry in the Architect configuration file. If there is no
specific setting, DIS then obtains the setting from the dfexec.cfg file. If the
value has not been set, DIS will use the default value.
Using the Architect Configuration File to Define Macros in Windows
The Architect configuration file (architect.cfg) defines macro values for substitution into
Architect jobs, and overrides predefined values. This file is located in the \etc directory of
the DIS installation. Each line represents a macro value in the form key = VALUE, where the
key is the macro name and VALUE is its value. For example:
INPUT_FILE_PATH = C:\files\inputfile.txt
This entry sets the macro value INPUT_FILE_PATH to the specified path. This macro is
useful when you are porting jobs from one machine to another, because the paths to an
input file in different platforms may not be the same. By using a macro to define the input
file name you do not need to change the path to the file in the Architect job after you port
the job to UNIX®. Add the macro in both the Windows and UNIX versions of the Architect
configuration file, and set the path appropriately in each.
For more information on macros, refer to the dfPower Studio online Help topic, dfPower
Architect - Using Macros.
Installing Supplemental Language Support
If you plan to use DIS for data that includes East Asian languages or right-to-left languages,
you must install additional language support. To install these packages:
1. Click Start > Settings > Control Panel.
2. Double-click Regional and Language Options.
3. In the Regional and Language Options dialog, select the Languages tab.
4. Check the boxes marked Install files for complex script and right-to-left languages (including Thai) and Install files for East Asian languages, found under Supplemental Language Support.
5. The Microsoft® Windows® installer guides you through the installation of these language packages.
UNIX
Modifying Default Configuration Settings in UNIX/Linux
After installing DIS, you need to modify some default configuration settings in order for the
jobs and services to run correctly. The dfexec.cfg file contains configuration settings for
real-time services as well as Architect and Profile jobs. This file is stored in the /etc
directory of the DIS installation.
Refer to DIS Configuration Settings for a list of common settings that may need to be
modified before running the server. After making changes to the configuration file, you must
restart the server (see the DIS Server).
Note: There is an order of precedence for configuration settings. In
general, a setting will be determined by (1) the Advanced Properties of a
node in the job or real-time service. In the absence of a setting, the value will
be set by the corresponding entry in (2) the Architect configuration file. If
there is no specific setting, DIS will obtain the setting from (3) the dfexec.cfg
file. If the value has not been set, DIS will use (4) the default value.
Using the Architect Configuration File to Define Macros in
UNIX/Linux
The Architect configuration file (architect.cfg) defines macro values for substitution into
Architect jobs, and overrides predefined values. This file is located in the /etc directory of
the DIS installation. Each line represents a macro value in the form key = VALUE, where
key is the macro name and VALUE is its value. For example:
INPUT_FILE_PATH = /home/dfuser/files/inputfile.txt
This entry sets the macro value INPUT_FILE_PATH to the specified path. This macro is
useful when you are porting jobs from Windows to UNIX®, because the paths to an input
file in those two environments would probably not be the same. By using a macro to define
the input file name you do not need to change the path to the file in the Architect job after
you port the job to UNIX. Simply add the macro in both the Windows and UNIX versions of
the Architect configuration file and set the path in each.
For more information on macros, refer to the dfPower Studio online Help, dfPower Architect
- Using Macros.
Windows and UNIX
Processing Power and Memory Allocation
There are several configuration settings in the dfexec.cfg file that affect system
performance. Specifically, the following settings relate to processing power and memory
allocation:
Memory Allocation
Setting: sort chunk
Description: This setting allows you to specify the amount of memory to use while performing sorting operations. Memory may be specified in KB or MB, but not GB (Architect batch jobs and real-time services).
# Windows or UNIX Example
sort chunk = 128MB

Setting: working path
Description: This is the path where the server creates its working files and subdirectories. The default directory is the Integration Server /var directory. The value must be enclosed in single quotes. The location of the working path can affect system performance.
# Windows Example
working path = 'C:\Program Files\DataFlux\DIS\[version]\var'
# UNIX Example
working path = '/opt/dataflux/solaris/dis/[version]/var'
sort bytes - There is an order of precedence for setting memory for sorting in nodes that perform this operation. All settings should be in bytes, except in the dfexec.cfg file, where KB or MB can be explicitly specified. The order of precedence works as follows: if property 1 is not set or is left as NULL, the value is taken from property 2. When running a job or service using DIS, the value in the dfexec.cfg file is used if the previous two locations do not have the value set.
DataFlux Integration Server User's Guide
41
For the Surviving Record Identification (SRI) node:
1. SRI step in Architect - Select Advanced property > MEMSIZE_MBYTES
2. architect.cfg - SORTBYTES
3. dfexec.cfg - SORT CHUNK
For the Sort node:
1. Sort step in Architect - Select Advanced property > CHUNKSIZE
2. architect.cfg - SORTBYTES
3. dfexec.cfg - SORT CHUNK
Note: In Architect the sort option in the Tools > Options > Step
Specific dialog refers to the same sort value in the architect.cfg file. Setting
it here or directly by editing the architect.cfg file accomplishes the same task.
It is recommended that this memory allocation parameter be set to 75-80% of total
physical RAM to take advantage of the clustering performance in dfPower Studio.
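As a minimal sketch of how the same sort memory budget might be expressed at levels 2 and 3 of this precedence (the 128 MB figure is illustrative, not a recommendation):

# architect.cfg - value in bytes (134217728 bytes = 128 MB)
SORTBYTES = 134217728
# dfexec.cfg - KB or MB can be specified explicitly
sort chunk = 128MB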
Pre-loading and Clustering
Setting: cluster memory
Description: The cluster memory is the amount of memory to use per cluster of match-coded data. Use this setting if you are using clustering nodes in dfPower (Architect batch jobs and real-time services). This setting can affect memory allocation.
Note: This setting must be recorded in megabytes. For example, 1 GB should be set to 1024 MB.
# Windows or UNIX Example
cluster memory = 64MB

Setting: verify cache
Description: This number indicates an approximate percentage (0 - 100) of the USPS reference data set that is cached in memory prior to an address verification procedure (Architect batch jobs and real-time services). This setting can affect memory allocation.
# Windows or UNIX Example
verify cache = 30

Setting: verify preload
Description: This option allows you to specify a list of states for which address data is preloaded. Pre-loading causes an increase in memory usage but can significantly decrease the time required to verify addresses in those states (Architect batch jobs and real-time services). This setting can affect memory allocation.
# Windows or UNIX Examples
verify preload = NY TX CA FL
verify preload = ALL
verify cache and verify preload - If verify cache is not set using dfPower, DIS uses the value set in the dfexec.cfg file. The verify cache variable indicates an approximate percentage of how much of the United States Postal Service (USPS) reference data set will be cached in memory prior to an address verification procedure. The verify preload setting allows you to specify a list of states whose address data will be preloaded. Pre-loading will cause an increase in memory usage but can significantly decrease the time required to verify addresses in those states.
cluster memory - The DataFlux clustering engine allows developers to increase Architect job efficiency by grouping similar data items together based upon the information defined in the Quality Knowledge Base (QKB). The cluster memory is the amount of memory to use per cluster of match-coded data. Use this setting if you are using clustering nodes in dfPower.
There is an order of precedence for setting memory for clustering and sorting in nodes that
perform clustering operations. All settings should be in bytes. The default setting for
clustering memory allocation is 67108864 bytes (64MB).
The order of precedence is as follows: If property 1 is not set or is left as NULL then the
value is taken from property 2. In the batch and real-time clustering nodes, there is an
Override clustering memory size option. Choosing this to override the clustering value is
the same as setting the advanced property listed below:
For clustering and cluster update nodes:
1. Clustering/Cluster Update step in Architect - Select Advanced property > CLUSTER/BYTES
2. architect.cfg - CLUSTER/BYTES
3. dfexec.cfg - CLUSTER MEMORY
For exclusive real-time clustering and concurrent real-time clustering nodes:
1. Exclusive real-time clustering/concurrent real-time clustering step in Architect - Select Advanced property > MEMSIZE
2. architect.cfg - CLUSTER/BYTES
3. dfexec.cfg - CLUSTER MEMORY
All clustering nodes have an advanced property called DELAYMEMALLOC. This setting is closely tied to the memory allocation properties for clustering. If DELAYMEMALLOC is set to true, memory is allocated for a clustering node at the moment the first row is about to pass through that specific node. If it is set to false (the default value), all clustering memory is allocated before the first row passes through the entire job. In the former case, memory can be released and made available for later clustering node calls. Keep in mind that if the memory is not freed and you have over-allocated memory in your job or service, setting DELAYMEMALLOC to true may prevent you from discovering this until Architect is already partway through processing the job or service. If it is set to false and memory has been over-allocated, you will know before Architect runs the job or service.
Note: For all sorting and clustering memory settings, you can choose to
create macros to control memory settings. In this way, memory settings for
different types of jobs or services can be set independently of the default
macros that are used globally by all similar nodes.
fd table memory - This setting allows you to manually configure the amount of memory
being allocated per table column to the Frequency Distribution Engine (FRED). By default
FRED allocates 256 KB (512 KB if 64-bit) per column being profiled. This amount (the
number of columns * the amount specified per column) is subtracted from the total amount
configured by the user in the Job > Options menu of dfPower Profile Configurator
(Frequency Distribution memory cache size).
The amount of memory remaining in the available pool is used for other data operations.
For performance reasons, the amount of table memory should always be a power of 2. Setting this value to 1 MB (note that 1 MB = 1024 * 1024 bytes, not 1000 * 1000 bytes) yields optimal performance. Setting it to a value larger than 1 MB (again, always a power of 2) may help slightly with processing very large data sets (tens of millions of rows), but might actually reduce performance for data sets with just a few million rows or fewer. If you set the amount of table memory too high, you may not be able to run your job because it will not be able to initialize enough memory from the available pool.
Controlling Processing
accept timeout - The DIS loop, which runs continuously, is organized as follows:
•	wait for a new connection/request (ACCEPT)
•	process the request
•	check on the status of running services
•	check whether any services have results ready to send back
•	check on the status of running jobs
This setting allows the user to determine how long DIS waits before checking for new requests. The default value is 0.5 seconds; it can be lowered to as little as one microsecond. However, lowering this delay causes an idle DIS to use more resources: the frequency of loop iterations increases, so DIS performs more checks in the same period of time. If the delay is removed altogether and DIS is sitting idle, it will use 100% of available CPU processing power, because it will be continuously checking for statuses or available results. If the delay is reduced to a few hundred or a few thousand microseconds, an idle DIS uses closer to 10% of the CPU power (depending on the exact delay setting). Be aware of this increased CPU load before making adjustments to accept timeout. This does not apply to cases when DIS is used heavily, because there is no delay if requests are coming in frequently.
The amount of delay is configurable in the dfexec.cfg file. The format is:
accept timeout = <time value>
# A positive value measures seconds
# A negative value measures microseconds (10^-6)
# If not set, it defaults to 0.5 seconds (value of -500000)
# Windows and UNIX example:
accept timeout = -500000
Troubleshooting Log
log packets — This setting can be added to the dfexec.cfg file to log all SOAP Packet
activity. The default is no.
Note: Be advised that enabling this feature can produce very lengthy log
files, and slow down DIS performance.
Generally, log packets is enabled only for troubleshooting. The format is:
log packets = <yes or no>
# Windows and UNIX example:
log packets = yes
Note: You must restart the server after you make changes to the
configuration settings. See DIS Server.
Configuring DataFlux Integration Server to Use the Java Plugin
The dfPower® Architect Java™ Plugin node is available for Windows®, Solaris®, Linux®, and HP-UX Itanium. dfPower Studio or DataFlux® Integration Server (DIS) must be properly configured to run jobs containing the Java Plugin node. The following sections explain the configuration requirements.
Java Runtime Environment
Windows and UNIX
The primary configuration requirement is that the Java runtime environment (JRE™) must
be installed on your machine. The Java Plugin currently supports the JRE version 1.4.2 or
later. The actual location of the installation is not important, as long as the dfPower
Architect or DIS process can read the files in the installation. The dfexec.cfg file should
contain a setting called java vm that references the location of the Java Virtual Machine (JVM™) DLL (or shared library on UNIX® variants). In the Sun™ JRE, for example, the
location of this file is typically:
[JRE install directory]/bin/server/jvm.dll
If this setting is not configured properly when a job using the Java Plugin runs, you will
receive an error that the JVM could not be loaded. Also, your Java code must be compiled
using a Java Development Kit (JDK™) of the same version or earlier than the JRE version
you plan to use to run your job. For example, compiling your code using JDK 1.5 or later
and running the code in the Java Plugin using JRE 1.4.2 will generate an error that the class
file format is incorrect.
Java Classpath
Windows and UNIX
The location of your compiled Java code (as well as any code that it depends upon) must be
specified in the classpath setting in the dfexec.cfg file. The code must also be physically
accessible by the dfPower Architect or DIS process. The setting is called java classpath.
Note: On UNIX variants, you must separate the path components with a
colon (:).
If the java classpath setting is incomplete, Architect or DIS will report an error because
the code could not be loaded. Check to make sure your code and any dependencies are
accessible and specified in the classpath setting.
If the java classpath setting is empty, the only Java code that will be accessible to the
Java Plugin are the examples that ship with Architect and DIS. Refer to DataFlux dfPower
Studio Online Help, "Architect - Java Plugin - Examples" for information.
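A minimal sketch of the two settings together in dfexec.cfg, assuming a Sun JRE installed under C:\jre1.4.2 and user classes under C:\myjava (both paths are placeholders; on UNIX variants, separate classpath components with a colon rather than a semicolon):

# dfexec.cfg - Java Plugin settings (paths are illustrative)
java vm = C:\jre1.4.2\bin\server\jvm.dll
java classpath = C:\myjava\classes;C:\myjava\lib\helpers.jar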
Environment Variables
UNIX
Using the Java Plugin on AIX
Before starting the server or running any jobs with dfexec, you must set the following
environment variables. The LIBPATH setting assumes you are using Java 1.4 with the classic
Java Virtual Machine (JVM). If you are using the J9 JVM, substitute j9vm for classic. If you
are using Java 5, substitute java5_64 for java14_64.
LIBPATH
export LIBPATH=/usr/java14_64/jre/bin:/usr/java14_64/jre/bin/classic
LDR_CNTRL
export LDR_CNTRL=USERREGS
UNIX
Using the Java Plugin on HP-UX PA-RISC
Before starting the server or running any jobs with dfexec, you must set the following environment variable. The LD_PRELOAD example assumes you are using Java 1.4 with the Server JVM. If you are using a different JVM, set the path accordingly. In all cases, the path should be the same as the path used for the java vm setting in the dfexec.cfg file.
LD_PRELOAD
export LD_PRELOAD=/opt/java1.4/jre/lib/PA_RISC2.0W/server/libjvm.sl
UNIX
Using the Java Plugin on Solaris, Linux, and HP-UX Itanium
There is no need to set environment variables on Solaris, Linux, or HP-UX Itanium to use the Java Plugin. Note, however, that the Java Plugin currently supports the Sun JRE version 1.4.2 or later.
Optional Settings
Windows and UNIX
There are two other settings in the dfexec.cfg file that affect the operation of the Java Plugin
node. They are not required for normal use but they are available for use by developers for
debugging purposes. The settings are java debug and java debug port.
The java debug setting should be set to yes or no. When set to yes, debugging in the JVM used by Architect or Integration Server is enabled. By default, this setting is set to no.
The java debug port setting should be set to the port number where you want the JVM to listen for debugger connect requests. This can be any free port on the machine. This setting has no effect if java debug is set to no.
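An illustrative pair of entries (the port number is a placeholder; any free port on the machine works):

# dfexec.cfg - enable JVM debugging on port 8000
java debug = yes
java debug port = 8000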
Note: The Java debugger cannot connect until dfPower Architect initializes the JVM in process. This happens when a Java Plugin Properties dialog is opened in Architect, or when a Java Plugin node in the job is executed or previewed. If you have multiple Architect or Integration Server processes running concurrently on the same machine, only the first process to load the JVM secures the debugging port. Subsequent processes will not respond to Java debugger connection requests.
Pre-loading Services
DataFlux® Integration Server (DIS) can preload selected services on startup. This is helpful
if you typically use the same services each time you run DIS and would like to have these
services available as soon as DIS is running.
There are two configuration directives available that cause DIS to preload services; these
can be set by the DIS administrator:
•	dfsvc preload all = [count]
•	dfsvc preload = [count]:[name of service] [count]:[name of service] ...
The two formats can work independently or together, depending on how you configure
them.
Pre-loading all services
The first directive, dfsvc preload all = [count], causes DIS to find and preload all services
[count] times. This includes services found in subdirectories. The number of instances of
each service (count) must be an integer greater than 0, or the directive is ignored.
For example, dfsvc preload all = 2 causes DIS to preload two instances of each service
that is available, including those found in subdirectories.
Pre-loading one or more specific services
The second directive, dfsvc preload = [count]:[name of service], lets you designate the
specific services, as well as the count for each service, that DIS is to preload on startup. Use
additional count/service elements [count]:[name of service] for each service, and separate
each element by one or more white space characters. All elements must be listed on a
single line. Using this format, you can configure a directive that starts a number of services,
with each service having a different count.
For example, dfsvc preload = 2:abc.dmc 1:subdir1\xyz.dmc loads two counts of abc
service, and one count of xyz service, which is located in subdirectory subdir1.
Complex configurations
By combining the two directives, you can configure more complex preloads. The two
directives add the counts arithmetically to determine how many services are actually
loaded. (Internally, DIS builds a list of all services it needs to preload and, for each service,
sets the total count.)
The following two example directives illustrate the logic of how this works:
dfsvc preload all = 2
dfsvc preload = 2:svc1.dmc -1:subdir1\svc2.dmc -2:svc3.dmc
The first directive instructs DIS to preload a total of two instances of all existing services.
The second directive modifies this in the following ways:
•	Two additional counts of svc1.dmc are added, for a total of four instances. The counts are added together, and the total is the number of instances that DIS tries to preload.
•	Svc2.dmc, which is found in the subdir1 subdirectory, has a -1 count. This produces a total count of one for svc2.dmc.
•	For svc3.dmc, there is a combined total count of zero, so this service is not loaded at all. The value of [count] must be greater than zero for a service to be preloaded.
Some important points to remember:
•	DIS attempts to preload a single instance of all requested services before trying to preload more instances (if more than one instance is specified).
•	The service name can include the service's path (relative to the root of the services directory). Example: 1:subdir1\svc2.dmc specifies one instance of service svc2.dmc, which is located in the subdirectory subdir1.
•	Count can be a negative value (meaningful only when both configuration directives are used together).
•	Pre-loading stops when DIS has attempted to preload all required instances (successfully or not), or if the limit on the number of services has been reached. The limit can be specified by dfsvc max num =, and defaults to 10 if not specified.
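A minimal sketch of a combined preload configuration in dfexec.cfg (the service name and counts are illustrative only):

# allow up to 20 service instances, preload one instance of everything,
# plus two extra instances of a frequently used service
dfsvc max num = 20
dfsvc preload all = 1
dfsvc preload = 2:verify_address.dmc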
Multi-threaded Operation
DataFlux® Integration Server (DIS) and its components operate in a multi-threaded
configuration using two servers. Both servers are part of a single DIS process, but run in
independent threads on different ports and share the same thread pool. This thread pool
manages the process threads. When DIS creates this pool, it determines how many total
threads the pool is allowed to have, how many it should allow to stay idle, and how much
time should pass after a thread becomes idle before it is killed (to conserve system
resources).
The two servers are:
•	SOAP server, whose main thread (started by DIS) runs a loop that accepts clients' connections and hands each one off to a thread pool (among other functions).
•	Wire Level Protocol (WLP) server, which accepts connections over TCP/IP.
Two configuration directives control whether the servers run:
•	svr run dis = [yes/no] (default is yes)
•	svr run wlp = [yes/no] (default is no)
Each request is handed off to a separate thread, so multiple requests can be processed in
parallel. Requests are handled as follows:
•	For non-real-time requests (such as running batch jobs, or listing jobs or services), the thread handling the request also sends the response for that request. At that point, the thread exits and goes back to the available thread pool.
•	For real-time requests (such as getting a service's metadata or running a service), the thread handling the request starts the process and passes any commands and data, and then exits without sending the response.
There are three additional configuration directives that determine how the thread pool operates (a combined sketch follows this list):
•	svr max threads = [# of threads] - If the WLP server is to run, at least two threads are used; if the SOAP server is to run, at least four threads are used. DIS automatically adjusts this value to the required minimum if the configured value is too low.
•	svr max idle threads = [# of threads] - Will always be at least 1. This directive should be treated as an advanced configuration, and should be used only when needed to troubleshoot performance problems.
•	svr idle thread timeout = [# of microseconds] - Defaults to 5 seconds if not set, or if set to less than 1 microsecond. This directive should be treated as an advanced configuration, and should be used only when needed to troubleshoot performance problems.
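A minimal sketch of these directives in dfexec.cfg (the thread counts are illustrative, not recommendations):

# run the SOAP server only, with a small thread pool
svr run dis = yes
svr run wlp = no
svr max threads = 8
svr max idle threads = 2
# 5000000 microseconds = 5 seconds (the default)
svr idle thread timeout = 5000000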
DataFlux Integration Server Connection Manager
The DataFlux® Connection Manager lets you store the necessary credentials for a data source once, so that you can easily connect to that data source in the future. This provides better protection and management of security and confidentiality, and a more versatile way to handle access to data sources that require authentication.
When DataFlux dfPower® is installed on Microsoft® Windows®, you can access the
Connection Manager using the Start menu. Select Start > Programs > DataFlux
Integration Server [version] > dfConnection Manager. In UNIX® there is a program
called dfdbview, which serves a similar purpose. The purpose of Connection Manager is to
save connection information with encryption so it does not need to be stored inside the job,
or entered at the time the job is run.
Using Connection Manager on Windows
The purpose of the Connection Manager is to save connection information with encryption so
it is not necessary to enter it when the job is run, or store it inside the job. When you run
Connection Manager in Microsoft Windows, you will see a list of all of the available
connections with either a yes or a no next to them. When you select a connection and click
Save, you are prompted for the user name and password. If the connection is successful,
this information is saved in a file in the \dac directory or the \dfdac subdirectory, depending
on whether or not a registry key was found. If one or both of the following registry entries
exist, the information is saved in the following directories:
•	HKEY_CURRENT_USER\Software\DataFlux Corporation\dac\[version]\savedconnectiondir
•	HKEY_LOCAL_MACHINE\Software\DataFlux Corporation\dac\[version]\savedconnectiondir
where [version] indicates the version of DIS that you have installed.
If the above does not exist, it is saved in the user's home directory in a subdirectory called
\dfdac. This file is a plain text file containing all the information that was used to connect.
The user name and password are encrypted.
Using Connection Manager on UNIX
In UNIX, the Connection Manager is a program called dfdbview. When you run dfdbview -t -s [connection name], you are prompted for a user name and a password. If the connection is successful, the connection information is saved to a file in the $HOME/.dfpower/dsn directory. It is saved as a plain text file with all of the information used to connect, with the user name and password encrypted.
Sharing Connection Information
When you save a connection, you can use the saved information with any of the DataFlux applications. In Profile, if you create a job that connects to one or more data sources, the application recognizes that your connection information is saved. It saves only the name of the connection in the job, not the connection information itself. When you run the job in UNIX, the system recognizes that a connection name is present and looks for a saved connection in the location described above.
Note: The connection names in Windows and UNIX must correspond. Use the connection names in the Architect and Profile jobs to describe your data sources. The odbc.ini file and the saved connection information on the target system determine how to connect.
Connection Manager User Interface
DataFlux® Integration Server (DIS) Connection Manager allows you to save encrypted
connection information, so that it does not need to be stored inside the job or entered at the
time the job is run.
DIS Connection Manager
Select Make saved connections available to all users of this machine to allow all
users on the machine to access the saved connections.
The following table describes the options available in the DataFlux Saved Connection Manager dialog:
Save - Click to connect to the selected data source and save the connection information. Enter the necessary user information and logon credentials. Your entry is stored and used internally the next time you log in to the desired data source.
Clear - Select the desired database and click Clear to delete the existing authentication credentials and set new credentials.
Open ODBC Administrator - Click to open the ODBC Data Sources Administrator dialog.
Open Connection Administrator - Click to open the DataFlux Connection Administrator dialog.
Help - Click Help to open the Help for the Connection Manager.
DataFlux Integration Server Manager
The DataFlux® Integration Server (DIS) allows users to run real-time dfPower® Architect
services as well as batch Profile and Architect jobs on a server from a remote client. DIS
runs on Linux®, Solaris™, HP-UX, AIX®, and Microsoft® Windows®. DataFlux provides a
Windows client application called DIS Manager that can be launched from the Windows
Start menu to manage the jobs and services.
Real-time Architect jobs are called Architect real-time services, while batch jobs are called
Architect jobs and Profile jobs. The common term for services and jobs is objects.
When a client application such as DIS Manager runs a service, it connects to DIS and tells it
which job to run. The client stays connected until the service finishes running and receives
status and result data. When a client application runs a job, it connects to DIS, tells it which
job to run, then disconnects. Determining status and job termination and accessing the log
files require a new connection to DIS.
Each time DIS is started, it generates a unique log file in the directory set in the dfexec.cfg
file. The log file contains detailed information on what commands DIS received and the
responses sent, including errors. Log file names start with a time-stamp.
DataFlux Integration Server Manager User Interface
The DataFlux® Integration Server (DIS) Manager is a Microsoft® Windows® client
application used to manage Architect real-time services and batch Profile and Architect jobs
on a server from a remote client.
DataFlux Integration Server (DIS) Manager
This section describes the options available from the drop-down menus.
File
Change User - If DIS security is enabled, this option lets you change the user name
and password for the Integration Server.
Exit - This option closes DIS Manager.
View
Toolbar - Toggles the toolbar on and off.
Status Bar - Toggles the Status Bar on and off.
Refresh - Refreshes the window by querying the server and returning the most
current information for the job status.
Actions
Upload
Architect Jobs - Displays the Upload Architect Job dialog which allows you to
choose the Architect job you want to post to the server.
Profile Jobs - Displays the Upload Profile Job dialog which allows you to
choose the Profile job you want to post to the server.
Real-Time Services - Displays the Upload Real-Time Service dialog which
allows you to choose the service you want to post to the server.
For all of the above options, if the Open jobs and services from the
Management Resources Directory option is checked under Tools >
Options, you can select only jobs stored in the dfPower® Studio vault. If that
option is not selected, you will see a standard dialog that lets you choose a job
from any location.
Note: DIS may appear unavailable to other clients trying to connect while
large files are being uploaded. This should not cause any issues with
normal job files, but it may cause problems if a user attempts to upload a
file type that is not a DataFlux job file. If this is a concern, the
DIS administrator (admin) should restrict access to the post/delete
command in the dfexec.cfg file.
Run Job - Submits a job for execution. The Run Job option is only available for
Profile and Architect jobs; you cannot run real-time services using this option. To
execute a real-time service, use the Test Real-time Service option available under
the Actions menu.
Stop Job - Terminates a job that is currently running.
Test Real-time Service - Displays the Real-Time Services Testing dialog, which
allows you to manually enter data to test real-time services.
Delete - Removes a job or service from the list of available jobs.
View Log - Displays the log file for the selected job ID. In order to enable this option,
you must first select a job in the list of available jobs, then select a Job ID from the
job status area at the bottom of the dialog.
Clear Log - Clears the log file for the selected job ID and removes the log file from
the job status area.
Help Topics - Opens the Help system in a Web browser.
Tools
Options - Select Options to display the Options dialog, which allows you to
configure DIS Manager to connect to any active Integration Server. In
addition to the server configuration parameters, you can also choose to load
your jobs and services from the Management Resources Directory.
DIS Manager Options Dialog
Server Name - The name of the machine where the server is installed (use
localhost if it is running on your own machine).
Server Port - The port that was designated when the server was installed and
configured.
Refresh Interval - The amount of time (in seconds) between automatic
refreshes. Set this value to 0 to turn off the automatic refresh option.
Open jobs and services from the Management Resources Directory - When selected, jobs and services can be loaded only from the Management
Resources Directory. When unchecked, jobs and services can be loaded from
any available directory. Additional fields appear on the upload dialog enabling
you to select the location of the file to be uploaded.
Help
Help Topics - Opens the DataFlux dfPower Studio online Help. Note that you might
receive a warning from Microsoft Internet Explorer® that it has blocked the help
content. Click the warning bar, then click Allow Blocked Content in order to view
the Help.
DataFlux Integration Server Version - Displays the DIS version you are currently
running.
About DataFlux Integration Server Manager - Displays a dialog containing the
version number, contact information, and copyright notices. It also displays a link that
opens another dialog that allows you to check the library and database versions.
DIS Manager Window - Other Elements
Real-Time Services - Select this tab to work with real-time services.
Architect Jobs - Select this tab to work with Architect jobs.
Profile Jobs - Select this tab to work with Profile jobs.
Item Name - The Item Name is the name of the job or service.
Status of All Jobs - Select this tab to view the status of all jobs by all users.
Status of My Jobs - Select this tab to view the status of the jobs associated with the
current user. To change users, select File > Change User.
Status of Architect Jobs - This status tab is available when the Architect Jobs tab is
selected. It allows you to view the status of Architect jobs.
Status of Profile Jobs - This status tab is available when the Profile Jobs tab is selected. It
allows you to view the status of Profile jobs.
Job Name - This is the Remote Name assigned to the job when it was uploaded. Click the
column name to sort the jobs by name.
Request ID - A unique identifier for each run of a job. The request ID links the job to the
log file. Double-click the Request ID to launch the log file viewer.
Job Owner - The user ID associated with each job. Click the column name to sort the jobs
by user ID.
Status - Displays the current status of the job. Double-click Status to launch the log file
viewer.
Toolbar

The DIS Manager toolbar provides buttons to quickly access several of the commonly used
main menu options.

Upload - Opens the upload dialog for Real-Time Services, Architect Jobs, or Profile Jobs, according to the tab selected before clicking the button.

Download - Opens the download dialog for Real-Time Services, Architect Jobs, or Profile Jobs, according to the tab selected before clicking the button.

Delete - To delete jobs or services, click the appropriate tab and Item Name, then click Delete.

Run - To run a job, click the appropriate tab and Item Name, then click Run.

Stop - To stop jobs or services, click the Job Name in the Status panel, then click Stop.

Test Service - To test a service, click the Real-Time Services tab and the Item Name, then click Test Service.

Refresh - Click Refresh to refresh the screen.

View Log - Click on a job or service, then click View Log to bring up the log for that job or service in the Log Viewer.

Clear Log - Click on a job or service, then click Clear Log to clear the log for that job or service. The Job Name will be removed from the status panel.

Help Topics - Click Help Topics to open the Help for dfPower Studio.
Using DataFlux Integration Server Manager
Once DataFlux® Integration Server (DIS) is installed and running, you can use DIS
Manager to test the connection to the server. DIS Manager comes installed in both the
DataFlux dfPower® Studio and the DIS file groups on Microsoft® Windows® machines. The
DIS Manager allows users to perform the following actions:
• Upload and download batch jobs and real-time services
• Run jobs and stop jobs
• Test real-time services
• Delete jobs and services
• Monitor job status
• Use log files
Uploading Batch Jobs and Real-Time Services
Once an object (a batch job or real-time service) has been created in dfPower Studio, you
can upload it to the Integration Server for use by DIS.
If you checked Open jobs and services from the Management Resources Directory on
the Tools > Options dialog, then all object files are uploaded to the default directory
(folder) specified in the dfexec.cfg file. You can, however, create new subdirectory folders
under the default directories and place files within them. This can make it easier to group
objects by related tasks or other criteria.
Note: You cannot use backward or forward slashes (\ or /) in job file
names, as these characters are used for designating components within path
descriptions.
If you unchecked Open jobs and services from the Management Resources Directory
on the Tools > Options dialog, then you can upload the object files to any accessible
directory. This makes it easy to group jobs and services by any criteria you wish and place
them in a convenient location, including on other servers.
Uploading Objects to the Default Directory
To upload one or more objects to the default location:

1. Make sure that Open jobs and services from the Management Resources Directory on the Tools > Options dialog is checked.

2. On the DIS Manager main window, click the tab corresponding to the type of upload you want (Architect Jobs, Profile Jobs, or Real-time Services).

3. Click Actions > Upload (or just click the Upload button). An upload dialog appears with the available objects listed in the Available pane on the left.

4. Select one or more objects to upload, and then click Add single (or Add multiple if adding multiple objects at one time). The selected objects are moved to the Selected pane.

5. Click OK.

If you wish to prevent an object from being added, select one or more object files in the Selected pane and click Delete.

To upload one or more files to a subdirectory under the default directory:

1. Move the files to the Selected pane using the preceding procedure. Do not click OK.

2. Click Remote Folder in the row containing the desired file. A Browse dialog appears that lists available subdirectory folders.

3. Select the desired subdirectory (or click New to create a new subdirectory folder).

4. Click OK to close the browse dialog.

5. Click OK to close the upload dialog and upload the files.
Uploading Objects to a Specified Directory
To upload one or more objects to your chosen location:

1. Make sure that Open jobs and services from the Management Resources Directory on the Tools > Options dialog is unchecked.

2. On the DIS Manager main window, click the tab corresponding to the type of upload you want (Architect Jobs, Profile Jobs, or Real-time Services).

3. Click Actions > Upload (or click Upload). An Upload dialog appears with the available objects listed in the Available pane on the left.

4. Select the objects you want to upload, and then click Add single (or Add All if adding all objects at one time). The selected objects are moved to the Selected pane.

5. An additional Directory field at the top of the dialog allows you to choose the directory where the files are to be uploaded. Manually enter the path for the directory, or click Folder and navigate to the desired folder. Note that you cannot create a new folder from this dialog; the folder must already exist.

6. In the File field at the top of the dialog, select either the default file format or All Files for the object type you chose in Step 2. This determines which object files appear in the Available pane on the left of the dialog.

7. Click OK to close the browse dialog.

8. Click OK to close the upload dialog and upload the files.

If you wish to prevent an object from being added, select one or more object files in the Selected pane and click Delete.
To upload one or more files to a subdirectory under the selected upload directory:
1. Move the files to the Selected pane using the preceding procedure. Do not click OK.

2. Click Remote Folder in the row containing the desired file. A Browse dialog appears that lists the available subdirectory folders.

3. Select the desired subdirectory (or click New to create a new subdirectory folder).

4. Click OK to close the browse dialog.

5. Click OK on the upload dialog and upload the files.
Downloading Batch Jobs and Real-Time Services
Objects residing on the Integration Server can be downloaded to your local dfPower Studio
installation using the following procedure:
1. On the DIS Manager main window, click the tab corresponding to the type of download you want (Architect Jobs, Profile Jobs, or Real-time Services).

2. Click Actions > Download (or click Download). A download dialog appears with the available objects listed in the Available pane on the left.

3. Select one or more objects to download, and then click Add (or Add All if adding all files at one time). The selected objects are moved to the Selected pane.

4. Click Local Folder in the row containing the desired files. A Browse dialog appears that lists available subdirectory folders. Note that you cannot create a new folder from this dialog; the folder must already exist.

5. Select the local folder where you want the downloaded file.

6. Click OK to close the Local Folder selection dialog.

7. Click OK on the download dialog and download the files.

If you wish to remove a file from the download, select one or more files in the Selected pane and click Delete.
Running and Stopping Jobs
From the DIS Manager main menu, click the tab corresponding to the type of object you
want to run (Architect jobs, Profile jobs, or Real-time Services). To run a job from
DIS Manager, right-click on the job name and select Run Job. The job appears in the job
status pane at the bottom of the screen. From there, you can right-click on the job name
and select Stop Job. If you are running a Profile job, you must specify either File output or
Repository output.
Note: The name of the output report will have a .pfo extension.
Testing Real-Time Services
To test real-time services, right-click on the name of the service in DIS Manager. The Real-Time Service Testing dialog opens. Enter your test data here, and click Run Test.
Deleting Jobs and Services
To delete jobs and services, right-click on the name of the job or service, and select Delete.
Monitoring Job Status
In the bottom panel of DIS Manager, you can see the status of all jobs. Double-click on the
Job Name to view the Job Log file.
Using Log Files
Job, Real-Time Service, and Server Logs
Three log files are generated by DIS. By default, they are stored in the following locations:

Windows - The \log directory of the DIS installation.

UNIX®/Linux® - The /etc directory.
The log files are:
Job Log - When running a Profile or Architect batch job, a file is created
named XXXXXXXX_archjob (or profjob)_JOBNAME.log (for example,
1164914416_1164915225_90_archjob_Arch_0.dmc.log). This is the log file
retrieved when you view the job status from the DIS Manager dialog.
DIS Log - Each time the server is restarted, a new log file is created named
XXXXXXXX_DIS.log (for example, 1165337041_0_002F1C_DIS.log). This log
tracks connections and requests to and from DIS, and stores some basic
configuration settings at the beginning of the log.
Real-Time Service Log - This log is named
XXXXXXXX_archsvc_SERVICENAME.log (for example,
1165354706_1165354976_21_archsvc_SVC_address_verification_us.dmc.log).
It is generated every time a service is executed if the dfsvc debug option
has been set to yes in the dfexec.cfg configuration file. If the dfsvc debug
option is not set or is commented out, the files will not be created.
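For example, to enable real-time service logging, the corresponding line in dfexec.cfg would be set as follows (a sketch based on the option name described above):

dfsvc debug = yes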
Using DAC Logging in Windows
You can also enable additional logging, known as Data Access Component (DAC) logging.
A DAC allows software to communicate with databases and manipulate data. The DAC log
provides more information when users experience problems connecting to databases.
1. To open the Windows Registry Editor, click Start > Run.

2. In the Open field, type regedit.

3. In the Registry Editor, create one or both of the following strings:

• HKEY_CURRENT_USER\Software\DataFlux Corporation\dac\[version]\logfile
• HKEY_LOCAL_MACHINE\Software\DataFlux Corporation\dac\[version]\logfile

where [version] indicates the version of DIS that you have installed.

4. Set logfile to the path and filename where logging output is to be sent.

Note: If this entry is empty or does not exist, no logging will occur.
To turn off DAC logging, repeat these steps and clear or delete the logfile entry. DAC logging
can lead to large log files and can decrease performance, so be sure to turn the log off once
the required information is captured.
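As an alternative to regedit, the value can be created from a command prompt. The following sketch assumes DIS version 8.2 and a log destination of C:\temp\dac.log (both hypothetical values):

reg add "HKEY_CURRENT_USER\Software\DataFlux Corporation\dac\8.2" /v logfile /t REG_SZ /d "C:\temp\dac.log"

To disable logging again, delete the logfile value or set it to an empty string.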
Using DAC Logging in UNIX/Linux
DAC logging provides more information than the job log, DIS log, and real-time service log.
This information can aid you in troubleshooting.
1. Add a file named sql_log.txt (all lowercase) to the working directory the dfexec.cfg file uses to run jobs.

Note: The working directory is given by the working path setting in the
dfexec.cfg file, which is the configuration file the Integration Server reads to
obtain its settings. The dfexec.cfg file is located in the $DFEXEC_HOME/etc
directory.

In most cases (using the default paths), the directory where this file should
be placed is $DFEXEC_HOME/var/dis_job_io.
2. After adding the file, stop and restart the server.
Important: DAC Logging can lead to very large log files, and can
decrease performance on the server. Be sure to turn off DAC logging once the
required information is captured.
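Assuming the default paths, the following commands create the trigger file; the exact restart procedure depends on how DIS was installed on your system:

cd $DFEXEC_HOME
touch var/dis_job_io/sql_log.txt
# ...then stop and restart the DIS daemon using your normal procedure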
dfexec Return Codes
When dfexec is called from an external program, such as a scheduler, it produces return
codes to indicate its status. The return codes and their meanings are:
Return Code - Description

0 - Job is still running
1 - Job has finished successfully
2 - Job has finished with errors: Unspecified internal error
3 - Job has finished with errors: Invalid command-line parameters
4 - Job has finished with errors: Invalid configuration
5 - Job has finished with errors: Failed during job execution
6 - Job has finished with errors: Licensing error
7 - Job has finished with errors: Invalid or unsupported locale
8 - Job has crashed
9 - Job was terminated
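For example, a scheduler wrapper script on UNIX might branch on the dfexec return code as in the following sketch (the installation path, log path, and job name are hypothetical):

#!/bin/sh
# Run an uploaded Architect job quietly and report the outcome.
cd $DFEXEC_HOME
bin/dfexec -q -log /tmp/nightly_dedup.log jobs/nightly_dedup.dmc
rc=$?
case $rc in
    0) echo "Job is still running" ;;
    1) echo "Job finished successfully" ;;
    8) echo "Job crashed" ;;
    9) echo "Job was terminated" ;;
    *) echo "Job finished with errors (return code $rc)" ;;
esac
exit $rc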
Command Line Options
DataFlux® dfPower® Studio Architect jobs and Profile jobs can be run from the command
line. Running uploaded jobs from the command line allows users to call these jobs from
their own scheduling software, or write scripts that call these jobs.
Use the following command line options as needed:
Windows
To run jobs from the command line on Microsoft® Windows® computers, use the following
string. Note that the input macros are optional:
set <macro1>=<value1> && set <macro2>=<value2> && <dfexec path>\dfexec
[options] <job path and name>
To run jobs without using the optional input macros, use:
<dfexec path>\dfexec [options] <job path and name>
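For example, to pass two input macros to an Architect job (the macro names, paths, and job name are hypothetical):

set INPATH=C:\data\input.txt && set OUTPATH=C:\data\output.txt && C:\dataflux\bin\dfexec -log C:\temp\dedup.log C:\jobs\dedup.dmc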
UNIX
To run jobs from the command line on UNIX® systems, use the following string. Note that
the input macros are optional:
<macro1>=<value1> <macro2>=<value2> ./<dfexec path>/dfexec [options]
<job path and name>
To run jobs without using the optional input macros, use:
./<dfexec path>/dfexec [options] <job path and name>
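For example, the equivalent invocation on UNIX, run from $DFEXEC_HOME (again, the macro names, paths, and job name are hypothetical):

INPATH=/data/input.txt OUTPATH=/data/output.txt ./bin/dfexec -log /tmp/dedup.log jobs/dedup.dmc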
Windows and UNIX Options
The following options for dfexec are used for both Windows and UNIX/LINUX, unless
otherwise noted:
Option - Description

-i - interactive mode
-q - quiet (no status messages)
-cfg FILE - use alternate configuration file
-env FILE - use file for environment variables (UNIX only)
-log FILE - use FILE for logging output
--version - display version information
--help - display option information
Additional options for Architect jobs:
-w - write default target's output to terminal
-fs SEP - use SEP as field separator for terminal output
-m MODE - execution mode: d(efault); s(erial); p(arallel)
Additional options for Profile jobs:
-o OUTPUT - output file or repository name; file names must end with .pfo
-n NAME - report name
-a - append to existing report
-desc DESC - optional job description
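Putting the Profile options together, a run that appends a named report to a .pfo output file might look like the following sketch (the job file, its extension, and the report name are hypothetical):

./bin/dfexec -q -o /tmp/customers.pfo -n "Customer Report" -a jobs/customers.pfi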
DIS Security Manager Concepts
Windows and UNIX
Users
When DataFlux® Integration Server (DIS) security is enabled, the user must be
authenticated using a user name and password. When jobs are run with DIS security
enabled, note the following:
• From the perspective of the operating system, the process owner is the account under which DIS runs.
• From the perspective of DIS, the process owner is the user who started the job.
In addition, when DIS security is enabled, DIS makes available the name of the user
executing the batch job or real-time service. For any requests received by DIS, the server
logs the name of the user who sent the request to a DIS log file. If the request is to run a
batch job, DIS sets the environment variable DFINTL_DIS_USER for that batch job to the
name of the user who is executing the job. If the request is to execute a real-time service,
DIS sets the macro value DFINTL_DIS_USER for that service to the name of the user who is
executing the service.
By default, DIS security is disabled. While DIS security is disabled, there is no user
authentication process and all jobs show the process owner as Administrator. When a
request is received by DIS, the macro value or environment variable DFINTL_DIS_USER will
show administrators as the owner.
The DIS security administrator (admin) can create users and assign users to groups. All
user accounts must be added to the users file located in the security path specified in the
dfexec.cfg file. A user name is case sensitive, can be up to 20 characters, and can only
include alphanumeric characters and these symbols: . (period), - (hyphen), or _
(underscore). There are no restrictions on which characters or words can be used in
passwords. A password may be set to blank. Passwords do not expire.
Groups
DIS security has two special group accounts: administrators and everyone. The everyone
group includes all users, present and future. If you create an account called everyone, DIS
will log an error and ignore that account. The administrators group has access to all
commands and objects regardless of explicitly set permissions.
The system does not require groups. However, for easier administration of DIS, the
administrator can create groups, assign users to groups, and assign groups to other groups.
All group accounts must be added to the groups file in the security path specified in the
dfexec.cfg file. A group name can be up to 20 characters and is case sensitive.
Command Permissions
DIS security supports per user command permissions. These are initially set when a new
user is created, and may be changed at any time. Changing user permissions does not
require a server restart. Command permissions are defined by setting Boolean flags to
enable (1) or disable (0) permissions for a given command. Permissions may be set for the
following commands:
Bit Position - Command - Description

1 - Execute Real-Time Service - When enabled, the user can view Architect real-time service parameters and execute Architect real-time services.

2 - Execute Architect Job - When enabled, the user can execute, terminate, get status, get log, and delete log for Architect jobs.

3 - Execute Profile Job - When enabled, the user can execute, terminate, get status, get log, and delete log for Profile jobs.

4 - Post Real-Time Service - When enabled, the user can post Architect real-time service files.

5 - Post Architect Job - When enabled, the user can post an Architect job file.

6 - Post Profile Job - When enabled, the user can post a Profile job file.

7 - Delete Real-Time Service - When enabled, the user can delete an Architect real-time service file.

8 - Delete Architect Job - When enabled, the user can delete an Architect job file.

9 - Delete Profile Job - When enabled, the user can delete a Profile job file.

10 - List Real-Time Services - When enabled, the user can see a list of Architect real-time services.

11 - List Architect Jobs - When enabled, the user can view a list of Architect jobs.

12 - List Profile Jobs - When enabled, the user can see a list of Profile jobs.

13 - List All Statuses - When enabled, the user can get the status for all Architect and Profile jobs.
The following are examples of command permission strings:

1111110001111 - This user has privileges for all commands except deleting jobs and services.

0000000001111 - This user can list all jobs and services and view the status of all jobs.

0110110110111 - This user can perform all actions on Architect and Profile jobs but cannot post, execute, delete, or list real-time services.
Here is an example of an ACL with security permissions set for users and groups:
ACL Security Permissions
Access Control Lists
Access control lists (ACLs) are used to secure access to individual DIS objects. If an object
is copied to DIS instead of posted through the client, DIS automatically creates an ACL file
when the object is first accessed. The owner is automatically set to the administrators
group. A DIS administrator can manually create or edit an ACL for any object. Changes to
ACLs do not require a server restart.
If the ACL contains an unrecognized owner, DIS assumes the administrators group is the
owner. For unrecognized and duplicate users and groups, DIS ignores the access control
entry (ACE), an item in an access control list used to administer object and user privileges
such as read, write, and execute. If there are no valid ACEs, DIS uses the default setting
for the everyone group (as defined in the dfexec.cfg file).
When an object is deleted using DIS, the ACL file is deleted. If an object is deleted manually
and an object by the same name is later posted using DIS, a new default ACL file is created.
If an old ACL file exists, it will be overwritten.
Object Ownership
When a user posts an object using DIS, that user is automatically set as the owner of the
job. When a user creates an object by copying the file, ownership is automatically set to the
administrators group. The admin can change ownership to another user or group at any
time in the ACL file.
The owner of an object will always be able to execute and delete that object, regardless of
user or group permissions.
DIS security supports a configuration setting to prevent automatic ownership assignment
for posted objects. This is the enable ownership setting in the dfexec.cfg file. If automatic
ownership assignment is disabled, any previous ownership entries in ACL files are ignored
and all objects are owned by the administrators group.
User, Group, and Command Permission Interactions
When user and group level permissions differ, user level permissions take precedence. For
example, if an object has an ACL with an ACE denying a user access to the object and
another allowing a group (where the user is a member) access to the object, the user is
denied access to the object.
When group level permissions differ, the most restrictive permission takes precedence.
For example, if a user is a member of groups A and B and an object's ACL has an ACE
allowing access to group A but denying access to group B, that user is denied access to the
object.
In order to deny object-level access to all users but a few, set a deny (0) ACE for
everyone and then an allow (1) ACE for the individual users or groups.
Command-level permissions are defined for user accounts only, while groups can only be
used in ACLs. When individual user command permissions differ from permissions granted
in an ACL, the most restrictive permission usually takes precedence (the exception is when
everyone is set to allow permissions). For example, if a user has command-level permission
to execute Architect jobs but a particular job has an ACL denying access to this user, the
user cannot access this particular job.
Security Administration
The admin is responsible for setting up users, groups, passwords, and permissions on
commands and, optionally, objects. The admin can also change ownership of existing objects,
configure default object permissions, or turn off DIS security.
Note: The admin must make sure the default DIS security directory
(etc\dis_security) is set to allow DIS to read, write, delete, and create files in
the security directory. In addition, set the read and modify permissions for
other users to deny.
Setting Up Integration Server Security
Follow these steps to enable security for your Integration Server installation:
1. Set the security options in the dfexec.cfg configuration file.
2. Add users to the users file.
3. Optional: Add user groups.
The admin can either edit the required files manually (recommended) or use the supplied
administration command line utility. The command line utility, dsm_cmdln.exe, can be
found in the bin directory of the DIS installation. This utility presents a menu-driven
interface at the MS-DOS® or UNIX® command line. Most administrators will find that
editing the user and group files is more efficient than using the command line utility.
Once you modify the security settings in the dfexec.cfg configuration file and add user
accounts to the users file, you will need to restart the server. At this point the new security
settings will take effect and users will be prompted for logon credentials when using DIS
Manager.
Configuration Options in the dfexec.cfg File
After planning your DIS security hierarchy, you are ready to set up the dfexec.cfg file.
Below is a list of security-related options that you must set in the dfexec.cfg file:
• enable security — (yes/no) If set to yes, DIS security is enabled: user authentication is required to connect to the server and to perform actions. If set to no, security is disabled.

• enable ownership — (yes/no) If set to yes, a user is assigned as the owner of an object they post to the server. If set to no, ownership defaults to the administrators group. Object ownership confers implicit rights to execute or delete an object, and these rights take precedence over explicitly configured permissions.

• allow everyone — (yes/no) If set to yes, all users (present and future) have access to all objects by default. The everyone group is used to specify an object's permissions (allow or deny) that apply to all users.

• security path — (path to the security subdirectory) The path where all security-related files, including the users and groups files, are stored. If no path is specified, the server looks in etc/dis_security.
In order for any changes made to the dfexec.cfg file to be implemented, you must restart
the server.
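For example, a minimal security configuration in dfexec.cfg might look like the following sketch (the values shown are illustrative, not required):

enable security = yes
enable ownership = yes
allow everyone = no
security path = etc/dis_security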
Creating Users
The first step in adding users is to create the users file. This file must be created in the
directory specified by the security path setting in your dfexec.cfg file. The file should be
named users and have no file extension. You can add users to this file with any text editor.
For more information on user file layout refer to Security Files.
Creating User Passwords
User passwords are not required; however, they are recommended. To generate an
encrypted password for a user on Microsoft® Windows® platforms, you must run the
hashpassword utility or the HashPasswordStrong utility provided in the bin directory of the
Integration Server installation. Hashpassword creates a password hash, encrypting the
user's password. HashPasswordStrong adds the following requirements. Passwords must
contain at least: (1) six characters, (2) one numeric digit, (3) one uppercase letter, and (4)
one lowercase letter. Once the encrypted password is generated, you can copy it from the
utility into the users file for the given user.
HashPasswordStrong.exe
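A typical session might look like the following sketch; the exact invocation is an assumption (consult the utility's own usage message), but the workflow is as described above:

bin\HashPasswordStrong MySecret1

Copy the hash that the utility prints into the password field of the user's entry in the users file.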
To generate an encrypted password for a user on UNIX platforms, refer to Security
Commands for UNIX. For more information on user file layout refer to the Security Files
section below.
Creating Groups
Adding user groups to your security model is optional. In order to create user groups, you
must have a file named groups in the directory specified by the security path setting in
your dfexec.cfg file. This file does not have a file extension. For more information regarding
the layout and contents of a groups file, refer to the Security Files section.
Security Files
DIS security files can reside in any directory accessible to DIS. The full path of the DIS
security directory can be configured using the security path setting in the dfexec.cfg file. If
such a path is not specified, DIS will look for security files in etc/dis_security. If DIS
Security is enabled and DIS cannot find a users file or load any users from the file, DIS
writes an error to the log but continues initializing.
The security files include:
users: This file contains user names, hashed passwords, and DIS command permissions.
Here is an example of the security permissions for users:
admin:d033e22ae348aeb5660fc2140aec35850c4da997:1111111111111
user1:b3daa77b4c04a9551b8781d03191fe098f325e67:1110000001111
user2:a1881c06eec96db9901c7bbfe41c42a3f08e9cb4:0010010010011
user3:0b7f849446d3383546d15a480966084442cd2193:1001001001000
user4::0000000001110
The preceding is a sample users file. There should be one entry per line. The basic file
layout is as follows:

[username]:[password]:[permissions]
In the above example:

admin - The admin is granted all permissions.

user1 - This user is authorized to execute all objects, and retrieve a list of all objects and their statuses, but cannot post or delete objects.

user2 - This user is authorized to list, post, execute, and delete Profile jobs but not Architect jobs or real-time services.

user3 - This user is authorized to list, post, execute, and delete real-time services but not Architect or Profile jobs.

user4 - This user is authorized to list all objects on the server but not to perform any other actions. This user also does not require a password.
groups: This file contains group names and user names. Here is an example of the security
permissions for groups:
# sample groups file
administrators:admin
group1:user1:user2
group2:user4:admin
group3:group1:user3
group4:user4
Above is a sample groups file. A group can contain one or more users and one or more
groups. There should be one entry per line. The basic file layout is as follows:
[groupname]:[group or user]:[group or user]:[group or user], etc.
In this example:

administrators - This group contains one user: admin.

group1 - This group includes user1 and user2.

group2 - This group includes user4 and admin.

group3 - This group includes all members of group1 and user3.

group4 - This group includes only user4.
ACLs: This file contains the owner information and permissions for a job or service and is
named as follows:
[objectname]_[type].acl
The values for [type] are:
archjob - Architect batch job
archsvc - Architect real-time service
profjob - Profile job
Here is an example of the security permissions set in the ACL file:
user1
everyone:1
user3:0
Above is a sample ACL file. ACLs should have one entry per line. The first line always
contains the owner; each subsequent line has the following layout:

[group or user]:[allow (1) or deny (0)]
In this example:

user1 - This is the owner of the object. The first line of the file always denotes the object owner.

everyone - Everyone in the users file has permission to execute or delete the object.

user3 - This user is explicitly denied permission to execute or delete the object. This explicit permission overrides any user or group permission settings.
Once you have finished setting up security, you must restart the DIS service in Windows.
Security Commands for UNIX
Use the following disadmin commands to manage users and groups if security is enabled.
Arguments listed in brackets are optional. If an optional argument is not provided, disadmin
prompts the user for the value.
moduser [USERID [PASSWORD [BITS]]] - Modify information for an existing user. Example:
./bin/disadmin moduser fred secret_password 1110000001111

adduser [USERID [PASSWORD [BITS]]] - Add a new user. Example:
./bin/disadmin adduser claudio

deluser [USERID] - Delete a user. Example:
./bin/disadmin deluser claudio

passwd [USERID [PASSWORD]] - Set the password for a user. Example:
./bin/disadmin passwd fred

chperm [USERID [BITS]] - Change the permissions of a user. Example:
./bin/disadmin chperm fred 1111110001111

modgroup [GROUPID [MEMBER [MEMBER...]]] - Modify information for an existing group. Example:
./bin/disadmin modgroup development fred claudio

addgroup [GROUPID [MEMBER [MEMBER...]]] - Add a new group. Example:
./bin/disadmin addgroup QA

delgroup [GROUPID] - Delete an existing group. Example:
./bin/disadmin delgroup QA
Using Strong Passwords in UNIX
The strong passwords setting in the dfexec.cfg configuration file is used by the disadmin
application in UNIX to enforce the following rules for passwords:
• minimum length of six characters
• at least one number
• at least one uppercase letter
• at least one lowercase letter
This setting affects the following disadmin commands: adduser, moduser, and passwd.
You must restart the DIS daemon in UNIX if you have made changes to the dfexec.cfg
configuration file. You do not need to restart the daemon if you have made changes only to
the users or groups files.
For an overview of the four types of security available in DIS, see DIS Security Tools.
Security Policy Planning
A well-planned security model allows the DataFlux® Integration Server (DIS) security
administrator (admin) to control access to the application. DIS offers several security tools,
allowing the administrator to work with your existing security policy. As a resource on your
network, DIS usage can be defined based on your security model, which in turn is based on
usage policy, risk assessment, and response. Determining user and group usage policies
prior to implementation helps you minimize risk and expedite deployment.
Risk Assessment - Security policies are inevitably a compromise between risk and
necessary access. Users must access the application and data in order to perform necessary
tasks, but there is associated risk when working with information, particularly confidential
data. Consider the risks of compromised (unauthorized views or lost) data. The greater the
business, legal, financial, or personal safety ramifications of compromised data, the greater
the risk.
Usage Policy - Determine usage policy based on risk assessment. Take into account
individual and group roles within your organization. What policies are already in place? Do
these users or groups already have access to the data used by DIS? Are they dfPower®
Studio users? Generally, users will fall into one of the following categories: administrators,
power or privileged users, general users, partners, and guests or external users. A "deny
all, allow as needed" approach will help you to implement security from the top down. New
users should have restricted access. Access for administrators and power users can then be
conferred manually or through explicit group permissions.
Security Response - Consider establishing a security response policy. If you have a
security response team, specify how they are to respond to and report violations of security
policy. Consider training all users on acceptable use prior to deployment of DIS.
For more information, see DIS Security Examples.
DIS Security Tools
DataFlux® Integration Server (DIS) offers security options that can be used alone or in
combination. Using settings in the dfexec.cfg file, the DIS security administrator (admin)
can restrict access based on IP address. The admin has the ability to control access based
on user, group, or job with the DIS Security Manager or by manually editing security files.
DIS Security Manager
The DIS security subsystem gives administrators the ability, in a very granular way, to limit
the way various users can access or execute Architect jobs and services. DIS Security
Manager enables the admin to secure DIS commands and objects (services and jobs) on a per-user basis, by
explicitly creating user accounts and setting user, group, and job level access. Control can
be administered by named user, by group, or can be explicitly assigned to jobs and services
themselves. The ability for users or groups to get job lists, post new jobs, delete existing
jobs, and query for job status can all be controlled with this subsystem.
For more information, see Using Security Manager.
DIS Security with LDAP
In addition to restricting access by IP address and setting up access rights for individual
users and groups, DIS can be integrated with Lightweight Directory Access Protocol (LDAP).
The password that allows DIS to bind with the LDAP server must be set in the DIS
configuration file in an encrypted format. An encryption utility is included for that purpose,
and is described in DIS with LDAP Integration.
LDAP users who do not exist on DIS are automatically added with default command
permissions on first access. The DIS administrator can change these permissions. When
security is enabled, each user must authenticate through DIS with a user name
and password. DIS then passes user credentials to the LDAP server to be authenticated.
After authentication, DIS authorizes the user request based on permissions set for the
command or resource. The admin also has the option of disabling DIS Security, in which
case no authentication is required and no authorization will be performed.
IP-Based Security
The admin can control access by IP address with configuration settings in the dfexec.cfg file.
The configuration setting restrict general access will default to allow all, which is suitable
for administrators. The configuration setting restrict get_all_stats access allows control
over who can view the status of all jobs, as opposed to viewing the status of one specific
job. Generally, access should be limited to administrators. The restrict post/delete
access setting allows control over who can post and delete jobs. For more information on
IP-Based Security, refer to Configuration Settings.
Remote Administration of Security
Remote administration functionality is available to the admin through SOAP requests used
to administer DIS users and groups.
SSL
DIS now supports SSL for SOAP clients. You can use secure encryption anytime a server's
address is entered as https:// instead of http://. Due to U.S. export restrictions related
to encryption methods, SSL support is shipped as a separate package that is installed
following successful installation of DIS. Servers will need to be configured for SSL and
customers are expected to establish and maintain their own SSL environment and have it in
place prior to using it with DIS.
SOAP server configuration directives are:
soap over ssl = yes
A key file is required. If the key file is not password protected, the second configuration
directive of this pair can be commented out.
soap ssl key file = 'C:\Desktop\Key File\'
soap ssl key passwd = 'encrypted password'
The following directives are used if a Certificate Authority certificate file or path to a
directory with trusted certificates is needed. If they are not needed, comment them out.
soap ssl CA cert file = 'C:\Desktop\Certificate Authority Folder\CAfile'
soap ssl CA cert path = 'C:\Desktop\Certificate File\'
Best Practice: Refer to Appendix A: Best Practices - Plan your security model based on
your organization's business needs.
Using Security Manager
Adding Users
The administrator creates users and can assign users to groups. All user accounts must be
added to the users file in the security path specified in the dfexec.cfg file. A user name is
case sensitive, can be up to 20 characters, and can only include alphanumeric characters
and these symbols: . (period), - (hyphen), or _ (underscore).
DIS Security Manager Add New User Dialog
Complete these steps to add a new user to your system:

1. To add a new user, click Edit > Add. The New User Properties dialog opens.

2. Under the General tab, type the User Name.

3. Type the Password for the new user.

Note: There are no character or word restrictions on passwords. A password may be set to blank. Passwords do not expire.

4. Type the password in the Verify PW field.

5. Select the permissions the user will have based on the information in the Command Permissions section.

6. When you are finished selecting the permissions, click OK. The new user is added to the list.
Now you can create additional users, add the user to a group, or close the DIS Security
Manager.
You can also add users directly to the users file in one of the following formats:
[username]:[hashedPassword]:[permissions]
[username]::[permissions]
For more on hashed passwords and user permissions, see DIS Security Manager Concepts.
Adding Groups
DIS security has two special group accounts, administrators and everyone. The everyone
group includes all users, present and future. If you create an account called everyone, DIS
will log an error and ignore that account. The administrators group has access to all
commands and objects regardless of explicitly set permissions.
The system does not require groups. However, for easier administration of DIS, the admin
can create groups, assign users to groups, and assign groups to other groups. All group
accounts must be added to the groups file in the security path specified in the dfexec.cfg
file. A group name can be up to 20 characters and is case sensitive.
DIS Security Manager Add New Group Dialog
1. To create a new group, click the Groups tab.

2. Click Edit > Add. The New Group Properties dialog opens.

3. Type the Group Name under General.

4. Click the Users tab.

5. Click Add to add users to the new group. The Add Users dialog opens.

Add Users to Group Dialog

6. You can select one or more users for the new group. To select more than one, click the first user name, then press CTRL and click the other user names.

7. Click OK. Your users are listed on the New Group Properties dialog.

New Group Properties Dialog

8. Click the Sub-Groups tab.

9. You can add any of the groups listed to the new group. To add an existing group, click Add. The Add Groups dialog opens.

Add Groups Dialog

10. To add one group, click the group name.

11. Click OK.

12. If you need to create more groups, continue; otherwise, click OK to close the New Group Properties dialog.

13. The new group appears under the Groups tab.
Adding a User to a Group
To add a user to a group, complete the following steps:

1. Click the user name.

2. Click Edit > Item Properties. The User Properties dialog opens.

User Properties Dialog

3. Click the Groups tab.

4. Click Add. The Add Groups dialog opens with a list of existing groups.

5. Select one or more groups.

6. To select more than one group, click the first group, then press CTRL and select additional groups.

7. Click OK. The group you select now appears under Groups on the User Properties dialog.

8. Click OK.
Adding a User to the Administrators Group
The steps for adding a user to the Administrators group are similar to those for adding a
user to any other group.

1. Click the user name.

2. Click Edit > Item Properties. The User Properties dialog opens.

3. Click the Groups tab.

4. Click Add. The Add Groups dialog opens with a list of existing groups.

5. Select administrators.

6. Click OK. The administrators group appears under Groups on the User Properties dialog.

7. Click OK.
Deleting Users or Groups
You can delete one or more users and groups.
1. Select the users or groups you want to delete.

2. Click Delete. A warning message appears.

Delete Users Warning Message

3. If you want to delete the selected users or groups, click OK.
Viewing ACL Properties
To view the properties for an ACL, complete the following steps:

1. Locate the ACL under the appropriate tab.

2. Right-click on the file.

3. Select Properties. The ACL Settings dialog opens. Notice the Owner, the status for the Everyone group, and the ACEs. You can make changes to the ACL Properties, add or delete ACEs, or toggle ACE settings.

ACL Settings Dialog

4. Click OK to close the ACL Settings dialog.
Security Manager User Interface
DataFlux® Integration Server (DIS) Security Manager enables the DIS security
administrator (admin) to secure DIS commands and objects (services and jobs) on a
per-user basis, by explicitly creating user accounts and setting user, group, and job level
access.
DataFlux Integration Server (DIS) Security Manager
File
Click File from the main menu for these options:
Open Configuration File

To open the dfexec.cfg configuration file, click File > Open Configuration File. The Open dialog appears.
DIS Security Manager Open Configuration File Dialog
Open Security Directory

To open a security directory, click File > Open Security Directory. The Select Directories dialog opens.
DIS Security Manager Open Security Directory Dialog
Save
Click File > Save to save the security settings in the configuration file.
Exit
Click File > Exit to close DIS Security Manager.
Edit
Click Edit in the main menu for these options:
Add

Click Add to add new users, groups, or Access Control List (ACL) settings.
DIS Security Manager Add New User Dialog
Delete
Select the user, group, service, or job you want to delete. Click Edit > Delete to
delete the selected items.
Select All
Click Select All to select all items under that tab.
Item Properties
Select a user or group you want to view. Click Edit > Item Properties to view and
edit the properties.
Multiple User Permissions
Select more than one user from the list of Users. Click Edit > Multiple User
Permissions to make the same changes to the selected user names.
Multiple Access Control List Properties
To make changes to multiple ACLs, select the ACLs then click Edit > Multiple ACL
Properties. Here, you can add or remove users and groups from more than one ACL.
Preferences
Click Edit > Preferences to set the options for saving files in DIS Security Manager.
DIS Security Manager Preferences Dialog
Users/Groups Save Preferences
Backup Users File
Select this option to create a backup of the users security file.
DataFlux Integration Server User's Guide
91
Location of Users Security File Backups

If this option is selected, when you click Save, the current users file is named users. Previous backup users files have a stamp following the file name.
Backup Groups File
Select this option to create a backup of the groups security file. The
current groups file will be named groups, and backup groups files will
have a stamp following the file name.
Overwrite groups file if changed
Select this option to overwrite the groups file each time group security is
changed.
ACL Save Preferences
Backup ACL Files
Select this option to create a backup of ACL files.
ACL Backup File
Warnings
Warn on save if group empty
If this option is selected, you will receive a warning message when a
group is empty.
Empty Group Warning Dialog
If you do not want to see this message in the future, you can select Do
not show this dialog again.
View
From the main menu, click View to change the way DIS Security Manager appears:
Toolbar
Select to view the toolbar under the main menu. The toolbar appears by default.
Status Bar
Select Status Bar to show the status bar and view the status of DIS Security
Manager.
Gridlines
Select Gridlines to see the horizontal and vertical lines in DIS Security Manager.
Help
Help Topics

Click Help > Help Topics to access the DIS online Help.
About Security Manager
To view the version of DIS you are running and DataFlux contact
information, click Help > About Security Manager.
Toolbar
The DIS Security Manager toolbar provides buttons to quickly access several of
the commonly used main menu options.
Open Configuration File - Click Open Configuration File to display the Open dialog.

Open Security Directory - Click Open Security Directory to display the Select Directories dialog.

Save - Click Save to save the files that have been changed in DIS Security Manager.

Add - To add users or groups, click the appropriate tab and then click Add.

Delete - To remove users or groups, click the user or group you want to delete, and then click Delete.

Note: To delete multiple users or groups, select the users or groups (CTRL + click each name), then click Delete.

Properties - Click Item Properties to view details for the item you are viewing. For example, the User Properties dialog opens when you are on the Users tab, the Group Properties dialog opens when you are on the Groups tab, and the ACL Settings dialog opens when you are on the Services or Jobs tabs.

Multiple User Permissions - The Multiple User Permissions option is used to make changes to multiple users at one time. Select the users, then click Multiple User Permissions. The Permissions - Multiple Users dialog opens. Select the Command Permissions the users should have.

Multiple ACL Permissions - Multiple ACL Permissions allows you to make user and group permission changes to multiple ACLs. Select the ACLs you want to change, then click Multiple ACL Permissions; the ACL Settings dialog opens.
IP-based Security
Through settings in the dfexec.cfg file, the DataFlux® Integration Server (DIS) security
administrator (admin) can control access by IP address. These settings, combined with DIS
Security Manager settings, control user access.
You can restrict access to DIS by specifying IP addresses of clients that are allowed or
denied access. There are two supported restriction groups, general access and access to
post and delete commands.
When configuring each restriction group you must specify either "allow" or "deny" (but not
both). This directive can be followed by lists of specific IP addresses and ranges. You can
also use "all" or "none" keywords, but in this case any explicitly defined IP addresses or
ranges are ignored. An IP address that is denied general access is implicitly denied access to
post and delete commands.
Configuration for each restriction group must be entered on a single line, using the space
character as a separator between entries. IP ranges must be specified using the '-'
character with no spaces.
restrict general access = (allow/deny) - Use this setting to restrict access to the server by IP address. If this is not set, the default is "allow all". For example:

restrict general access = allow 127.0.0.1 192.168.1.1-192.168.1.255

Another example:

restrict general access = allow 127.0.0.1 192.168.1.190

restrict get_all_stats access = (allow/deny) - When the statuses of all jobs are requested, the client receives all job IDs. If this is not set, the default is "allow all". For example:

restrict get_all_stats access = deny all

Note: Only administrators should be allowed to request the status of all jobs.

restrict post/delete access = (allow/deny) - This option restricts access to the server for posting and deleting jobs. If this option is not set, the default is "allow all". For example:

restrict post/delete access = 127.0.0.1
DIS with LDAP Integration
DataFlux® Integration Server (DIS) can use the Lightweight Directory Access Protocol
(LDAP) to authenticate users on all platforms if desired. When using LDAP, no DIS users
need to be defined or managed on DIS. DIS can create users automatically by
authenticating each unique logon using LDAP and setting default permissions for that user.
When using Microsoft® Windows®, the client is based on Microsoft Active Directory®,
and SSL support is based on the Windows Crypt library. When using UNIX®, the client
is based on OpenLDAP, with SSL support based on OpenSSL. The client also supports
communication with LDAP servers in clear text.
By default, LDAP is disabled. LDAP options can be configured in the dfexec.cfg file. To
enable LDAP, add enable ldap = yes to the dfexec.cfg file.
This section applies to both Windows and UNIX/Linux®, except where noted:
• For UNIX/Linux, refer to LDAP Requirements for UNIX/Linux Platforms
• For Windows, refer to the LDAP Domain section under LDAP Directives
Note: For information about LDAP server setup and configuration refer to
your LDAP server documentation.
LDAP Requirements for UNIX/Linux Platforms
AIX® - Requires the ldap.client.rte package to be installed. Run lslpp -l ldap.client.rte to
check for a previous installation. This package is located on the AIX installation media.
HP-UX® - Requires the LDAP-UX client to be installed. Run /usr/sbin/swlist -l product
LdapUxClient to check for a previous installation. If it is not installed, download it by going
to the Hewlett Packard® Web site at
http://software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=J4269AA.
Linux - Requires the OpenLDAP client to be installed. On an RPM-based system such as
RedHat® or SuSE™, run rpm -q openldap to check for a previous installation. For other
Linux systems, consult the system documentation to test the availability of software
packages. RedHat Enterprise Linux 4 or later requires the compat-openldap package. Run
rpm -q compat-openldap to check for a previous installation. This package can be found
on the installation media or RHN.
Solaris® - No additional requirements are needed. The LDAP client library is part of the
Solaris core libraries.
Operation
Note: The LDAP implementation in DIS does not currently support LDAP
group membership resolution; however, group command permissions are
supported. If groups are needed, DIS administrators can define them on DIS
in terms of LDAP users, DIS users, or both.
When security is enabled and LDAP is being used, DIS then authenticates user credentials
with the LDAP server. Once a user is authenticated, DIS authorizes the user's request based
on configured permissions for the requested command or resource.
When a request comes from a user DIS does not recognize, credentials are passed to the
LDAP server for authentication. In the case of success, DIS appends a new LDAP user
account to the users file and sets the command permissions to the configured default value.
When a request comes from an LDAP user DIS recognizes, the user's credentials are passed
to the LDAP server for authentication. If the user already exists in the users file with a
password set to x, this indicates an LDAP user as opposed to a DIS local user. Credentials are
then authenticated, and the user's existing command permissions, which are set in the
users file, are used.
In order to authenticate LDAP users, DIS must bind with the LDAP server using an
encrypted password. The encrypted password is then entered into the dfexec.cfg file using
the directive:
ldap bind pwd = [encrypted password]
For Windows, an encryption utility, EncryptPassword.exe, located in
C:\Program Files\DataFlux\DIS\[version]\bin, is available to generate an encrypted
password. To generate an encrypted password:
1. Launch the application EncryptPassword.exe.
2. Enter and confirm the password.
3. Generate the encrypted password.
4. Copy and paste the encrypted password into the directive, ldap bind pwd = [new encrypted password].
For UNIX, a command is available to encrypt passwords:
disadmin crypt
After running the command, the user is prompted to enter and confirm the password.
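For example, a session might look like the following sketch (the prompts and the encrypted value are illustrative):

disadmin crypt
Enter password:
Confirm password:
a1BCDEF/UVwxYZ==

Copy the resulting value into the dfexec.cfg directive, for example:

ldap bind pwd = 'a1BCDEF/UVwxYZ=='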
Permissions
When an LDAP user accesses DIS for the first time, DIS automatically creates an account for
the user in the users file and sets the default command permissions. The DIS administrator
can change the user's permissions in the users file.
The configuration file permissions are set in the dfexec.cfg file. In this configuration file,
permissions are automatically set for new user entries. The default value is specified as:
default commands permissions = [permissions bits]
For example,
default commands permissions = 1111111111111
In the preceding example, the default grants all permissions using all 1s. Configure the
permissions as desired for your installation.
When an LDAP user already exists on DIS, the command permissions configured for that
user in the users file are applied. If the user is also a member of one or more DIS groups,
the group's permissions are a factor in the access level for the user.
The DIS administrator can allow some LDAP users to have access to DIS while restricting
others. To do so, they must set all default command permissions to deny, which is 0. Then,
they have to add the necessary LDAP users to the DIS users file and set the appropriate
command permissions. If the password field is set to x, the user is an LDAP user.
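As a sketch, assuming the same thirteen permission bits shown in the earlier example, a deny-by-default configuration would be:

default commands permissions = 0000000000000

New LDAP users are then created with no command permissions, and the administrator grants access per user in the users file.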
Alternatively, if the DIS administrator wants to deny just a few LDAP users access to DIS,
the administrator can:
1. Add the users to a DIS group with all command permissions set to deny, or
2. Add the users to the DIS users file with all their command permissions set to deny, while setting less restrictive default command permissions for the remaining users.
If LDAP users do not need to be restricted from accessing DIS, no special steps are
necessary.
DIS does not maintain a history of connections and must authenticate every request
received. To reduce the number of LDAP queries, a setting in the dfexec.cfg file allows
the administrator to specify how long DIS may cache user authentication information. If
set to 0, DIS communicates with LDAP for every request. Otherwise, when a user first
attempts to log on, DIS calls LDAP to authenticate the user, caches the user name and
password (stored in the form of a SHA-1 hash), and caches the LDAP authentication
result for the configured amount of time. If the user sends more requests during that period,
DIS does not go to LDAP to re-authenticate that user. This is a useful option in
environments where changes to LDAP users are infrequent. The administrator also has the
option of disabling DIS security completely, in which case no authentication is required and
no authorization is performed.
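For example, the following dfexec.cfg lines illustrate the two behaviors described above (the second value is illustrative; see the ldap cache timeout entries later in this chapter):

# Authenticate against the LDAP server on every request:
ldap cache timeout = 0
# Or cache successful authentications for a configured period:
# ldap cache timeout = 30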
If DIS is deployed in an environment where it needs to authenticate LDAP users from
multiple LDAP servers or Active Directory domains, it is the responsibility of LDAP/Active
Directory administrators to ensure there are no duplicate user accounts between LDAP
servers or Active Directory domains.
To configure DIS for Active Directory, set the following directives in the dfexec.cfg file, then
restart DIS:
# Enable LDAP/AD for authentication
enable ldap = yes
# Define LDAP/AD server and port
ldap host = yourhost:portnumber123
# Define AD domain
ldap domain = yourdomain
Note: This is the minimal configuration required for Active Directory
installations and may work for most cases.
To configure DIS for SSL, use the following directives in the dfexec.cfg file:
# If set to 0 (default), the client communicates with servers in clear text.
# If set to 1, all communication is over SSL. This setting is optional:
ldap use ssl = 1
# Used only if SSL is enabled. Do not set to 0 with a self-signed
# certificate, because the client will not recognize it:
ldap ignore svr cert = 0
# Used to set host address and port:
ldap host = XXX.XX.XXX.XXX:636
Configuration File
DataFlux Integration Server with LDAP and Active Directory sample implementation:
enable ldap = (yes/no)
Enables LDAP and Active Directory in DIS. For example:
enable ldap = yes

ldap base dn = 'CN=Users,DC=[domain name],DC=COM'
This setting is based on the implementation of the LDAP schema. For example:
ldap base dn = 'CN=Users,DC=domainname,DC=COM'

ldap bind dn = 'CN=Domain User,CN=Users,DC=[domainname],DC=COM'
This setting represents the bind for an individual user, based on your implementation of the LDAP schema. Here, substitute what appears in the schema for Domain User. For example:
ldap bind dn = 'CN=Domain User,CN=Users,DC=domainname,DC=COM'

ldap bind pwd = 'password'
Enter your encrypted password for the preceding user. For example:
ldap bind pwd = 'a1BCDEF/UVwxYZ=='

ldap cache timeout = [min]
Optional setting that places a time limit, in minutes, on how long DIS caches LDAP authentication. For example:
ldap cache timeout = 30

ldap debug file = [path]
Specifies a location for the debug log. For example:
ldap debug file = C:\Program Files\DataFlux\DIS\[version]\log\LDAP_Debug.txt

ldap domain = [domain name]
Required for Active Directory authentication. For example:
ldap domain = domainname

ldap host = [IP address or name]:[port]
LDAP or Active Directory host IP address or name, and port number. For example:
ldap host = 127.0.0.1:389
Note: Port 389 is the typical Active Directory port assignment.

ldap ignore svr cert = (0/1)
If a client machine is not configured to recognize the specific certificate authority behind the SSL certificate on the server, or the server does not have an officially issued SSL certificate, this option can be set to 1 so the client will not reject the server and SSL communication may continue. SSL must be enabled for the setting to be active. The default value is 1. This setting is optional. For example:
ldap ignore svr cert = 1

ldap search attribute = [attribute]
This setting is based on your LDAP schema. For example:
ldap search attribute = CN

ldap use ssl = (0/1)
If set to 0, which is the default, the client communicates with servers using clear text. If set to 1, all communications are over SSL. This setting is optional. For example:
ldap use ssl = 1
LDAP Directives
Following is a list of supported configuration file directives for the LDAP client:
LDAP base dn - The distinguished name (DN) of the level/object of the LDAP directory tree
from which to start searching for a user account. An example of a value is
ou=People,dc=dataflux,dc=com. This value must be set if the client is to authenticate users
against LDAP servers. If Microsoft® Active Directory® is used, this value is ignored.
LDAP bind dn - Some LDAP servers may not allow anonymous binds, in which case this
value must be configured. It defines the DN of a user who is allowed to bind to the LDAP
server in order to do searches. An example of a value is
uid=RAM,ou=People,dc=dataflux,dc=com. This setting is optional.
LDAP bind pwd - Password for the user who is allowed to bind to the LDAP server to do
searches. If the password is not set, the bind operation is treated as anonymous (the LDAP
bind dn setting is ignored). Note that an anonymous bind in LDAP implies that no
authentication of the user is done.
LDAP cache timeout - The LDAP client can cache successfully authenticated users in
memory. This setting specifies the number of seconds to keep a cached user account in
memory. When a user name and password are given to the client for authentication, LDAP
first tries to find that user in the cache. If found, LDAP checks whether the cached account
has expired and whether the cached and given passwords match. If either check fails, the account is removed
from the cache and the received credentials are passed to the LDAP server for
authentication. The default value is 0, which means no caching is done and every
authentication request is always passed to the LDAP server. This setting is optional.
LDAP debug file - File name where the LDAP client will log its configuration and user
authentication activities, including any errors. The path must be valid. The file is opened (or
created, if needed) in append mode. It is your responsibility to delete the debug file. This
setting is optional.
LDAP domain - Default domain name for authenticating users. Setting this value indicates
to the LDAP client that it will be communicating to an Active Directory. If this value is not
set, the client assumes it is talking to a regular LDAP server. When users from this domain
enter their credentials, they do not need to fully qualify account names. Users from other
domains are also supported, but are required to enter fully-qualified account names, such
as user@domain or domain\user (the style depends on how you have Active Directory
configured). This setting applies only to the Active Directory environment and must be set
only in that case; otherwise, do not set this option.
LDAP host - LDAP server used to authenticate users. The format is [hostname]:port. An IP
address can be used instead of the hostname. Multiple servers can be specified, in which
case each host:port entry must be separated by a space. Entries are attempted in the same
order as configured. If one server cannot be contacted, the next one in the list is contacted.
This configuration option is required.
LDAP ignore svr cert - A client machine may not be configured to recognize the specific
certificate authority behind the SSL certificate on the server, or the server may not have an
officially issued SSL certificate. In these cases, this option can be set to 1 so the client
machine does not reject the server, whose certificate it may not recognize, and SSL
communication may continue. SSL must be enabled for the setting to be active. The default
value is 1. This setting is optional.
LDAP search attribute - An attribute in the LDAP schema to search for when looking for a
user entry. The default value is uid, which is used most commonly. However, an
organization might have users logging in using email addresses (for example), instead of
user account names. This configuration parameter allows this kind of flexibility. This setting
is optional.
LDAP search scope - The scope of the search of the LDAP server. A value of zero means
base search (to search the object itself). A value of one means one level search (to search
the object's immediate children). A value of two, which is the default, means subtree search
(to search the object and all its descendants). This setting is optional.
LDAP use ssl - If set to 0 (default), the client communicates with servers using clear text.
If set to 1, all communications are over SSL. This setting is optional.
Note: For information about LDAP server setup and configuration refer to
your LDAP server documentation.
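Putting these directives together, a minimal dfexec.cfg sketch for a plain (non-Active Directory) LDAP server might look like the following; the host name is illustrative, and the DN values reuse the examples given above:

enable ldap = yes
ldap host = ldap.example.com:389
ldap base dn = 'ou=People,dc=dataflux,dc=com'
ldap bind dn = 'uid=RAM,ou=People,dc=dataflux,dc=com'
ldap bind pwd = '[encrypted password]'
ldap search attribute = uid

Note that ldap domain is deliberately omitted, because setting it would cause the client to treat the server as an Active Directory.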
DIS Security Examples
There are two types of security available with DataFlux® Integration Server (DIS): IP-based
security and DIS Security Manager. IP-based security, configured in the dfexec.cfg file,
controls user access by IP address. The DIS Security Manager application is part of the
Integration Server. Through the Security Manager interface, user access can be controlled
based on user, group, and job level permissions. These security tools can be used
separately or together. Following are some scenarios employing different types of security:
Scenario 1: Users in a small, local group use a specific range of IP
addresses.
Scenario: Users have static IP addresses or draw dynamic addresses from a known range.
If the group is small, or licenses are restricted to only a few machines, this may be the
highest level of security needed by your organization.
Security plan: You can restrict access to DIS by specifying IP addresses of clients that are
allowed or denied access. Access can be restricted by general access, post/delete access,
and restrictions on requests for statuses of jobs.
Scenario 2: Your organization requires control over user and group
level access.
Scenario: Different users or groups require different levels of access, or certain files may
require different permissions.
Security plan: The DIS security subsystem provides this degree of control. User name and
password are passed using basic HTTP authentication to DIS. Information on that user's
user permissions, group permissions, and file permissions are kept in DIS security files. The
DIS security subsystem can be used alone or with IP-based security. The following is an
example of basic HTTP authentication:
Client request:
GET /private/index.html HTTP/1.0
Host: localhost
Server response:
HTTP/1.0 401 UNAUTHORIZED
Server: HTTPd/1.0
Date: Sat, 27 Nov 2004 10:18:15 GMT
WWW-Authenticate: Basic realm="Secure Area"
Content-Type: text/html
Content-Length: 311
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML>
<HEAD>
<TITLE>Error</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
</HEAD>
<BODY><H1>401 Unauthorised.</H1></BODY>
</HTML>
Client request:
GET /private/index.html HTTP/1.0
Host: localhost
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Server response:
HTTP/1.0 200 OK
Server: HTTPd/1.0
Date: Sat, 27 Nov 2004 10:19:07 GMT
Content-Type: text/html
Content-Length: 10476
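The Authorization header value is simply user:password encoded in Base64 (the sample value above decodes to Aladdin:open sesame). As an illustration, the following minimal Java sketch builds such a request; it assumes Java 8 or later for java.util.Base64, and the host, port, and credentials are placeholders:

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class BasicAuthSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder DIS host and credentials; substitute your own.
        URL url = new URL("http://localhost:21036/");
        String credentials = "myuser:mypassword";
        // Basic HTTP authentication: Base64-encode "user:password".
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes("UTF-8"));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", "Basic " + encoded);
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}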
Scenario 3: User authentication through LDAP adds an additional
layer of security.
Scenario: Your organization uses LDAP for user authentication.
Security plan: In this case, user name and password are still passed to DIS through basic
HTTP authorization, but DIS passes the information on to LDAP to authenticate the user. If
the user is not authenticated, the LDAP server returns an error.
Scenario 4: The DIS Security Administrator wants to remotely
administer a large number of users.
Scenario: The administrator wants to perform administrative tasks from the command
line.
Security plan: DIS security remote administration consists of SOAP commands to
administer DIS users and groups. This remote functionality allows the administrator to:
change passwords; list all users; list all groups; list a user's groups; list a group's members;
add a user; set a user's permissions; add a group; delete an account; add an account to a
group; and delete an account from a group. DIS must be running and security enabled.
Remote administration can be used with or without LDAP. Note that error messages may
change if LDAP is integrated with DIS.
Frequently Asked Questions
General
What is an Integration Server?
An Integration Server is a service-oriented architecture (SOA 26) application server that
allows you to execute Architect or Profile jobs created using the DataFlux® dfPower®
Studio design environment on a server-based platform. This could be Microsoft®
Windows®, Linux®, or nearly any other UNIX® option.
By processing these jobs in Windows or UNIX, where the data resides, you can avoid
network bottlenecks and can take advantage of performance features available with
higher-performance computers.
In addition, existing batch jobs may be converted to real-time services that can be invoked
by any application that is Web service enabled (for example: SAP®, Siebel®, Tibco®,
Oracle®, and more). This provides users with the ability to reuse the business logic
developed when building batch jobs for data migration or loading a data warehouse, and
apply it at the point of data entry to ensure consistent, accurate, and reliable data across
the enterprise.
What is the difference between DataFlux Standard Integration Server and
DataFlux Enterprise Integration Server?
The DataFlux Standard Integration Server supports the ability to run batch dfPower Studio
jobs in a client/server environment, as well as the ability to call discrete DataFlux data
quality algorithms from numerous native programmatic interfaces (including C, COM,
Java™, Perl, and more). The Standard Integration Server allows any dfPower Studio client
to offload batch dfPower Profile and Architect jobs into more powerful server environments.
This capability frees up the user's local desktop, while enabling higher performance
processing on larger, more scalable servers.
The DataFlux Integration Server (DIS) Enterprise edition adds the capability to call
business services designed in the dfPower Studio client environment or to invoke
batch jobs using Service-Oriented Architecture (SOA).
How can I run multiple versions of DIS in Windows or UNIX?
The following procedure shows how to run multiple versions of DIS on a UNIX machine. This
procedure can also be applied to versions of DIS older than 8.0.
26 Service Oriented Architecture (SOA) enables systems to communicate with the master customer reference database to request or update information.
UNIX
Multiple versions, or multiple instances of the same version, of DIS can be installed on a
UNIX server. Different versions or instances must be installed in separate directories.
Instead of creating a single directory for the software (for example, /opt/dataflux), each
version or instance must have a separate directory, for example, /opt/dataflux/serv1disv8.0,
/opt/dataflux/serv2disv8.1, /opt/dataflux/serv2disv8.1a. The installer must be run in each
directory, and a different port must be designated for each server; otherwise, the
installations can be configured identically.
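As a sketch, two side-by-side instances could be distinguished by the server listen port setting in each installation's dfexec.cfg file, assuming the default etc location and illustrative port values:

# /opt/dataflux/serv1disv8.0/etc/dfexec.cfg
server listen port = 21036

# /opt/dataflux/serv2disv8.1/etc/dfexec.cfg
server listen port = 21037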
WINDOWS
Multiple versions of DIS services can be installed on Windows systems with some
modifications. When installing any version of a DataFlux Windows service, the settings of
the currently installed version are overwritten, because the same names are used,
preventing multiple versions from existing concurrently. The solution is to rename older
versions of the services, so that installing or reinstalling the latest version does not affect
the older versions.
Note: Once an older version of a service is renamed, reinstalling that
version or applying an update to the version will require some user
intervention, such as stopping and starting the service manually, and editing
the registry so the current version of the service is pointing to the correct
executable file.
The following procedure shows how to run both the 8.0 and 8.1 versions of the DIS service
on a Windows machine. This procedure can also be applied to versions of DIS older than
8.0.
Make sure the older (8.0) DIS service is the active service
1. To open the Services management console, click Start > Settings > Control Panel.
2. Double-click Administrative Tools > Services.
3. Double-click the DataFlux Integration Server service entry to display the Properties dialog.
4. Notice the Path to executable property. If that property uses the 8.0 bin directory, skip to Rename the older (8.0) Batch Scheduler service; otherwise, continue with this procedure. The remainder of this procedure assumes the Path to executable property uses the 8.1 bin directory. If not, substitute the appropriate directory name.
5. Stop the DIS service by clicking Stop, or select Action > Stop from the menu.
6. Close the Properties page.
7. Exit the Services management console.
8. Open a command prompt window and change the directory to the DIS 8.1 bin directory.
9. Type dfintgsvr.exe -u and press Enter. A message will appear, confirming the removal of the service.
10. Change to the DIS 8.0 bin directory.
11. Type dfintgsvr.exe -i and press Enter. A message will appear, confirming the installation of the service.
12. Close the command prompt window.
Rename the older (8.0) Batch Scheduler service
1. To open the Services management console, click Start > Settings > Control Panel.
2. Double-click Administrative Tools > Services.
3. If the DataFlux Integration Server service, DFIntegrationService, is started, right-click it and select Stop. Services can also be started and stopped by opening a command prompt window and typing NET START [service name] or NET STOP [service name].
4. Exit the Services management console.
5. Open the registry editor by clicking Start > Run, typing regedit, and pressing Enter.
6. Right-click HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFIntegrationService and select Rename.
7. Change the key name from DFIntegrationService to DFIntegrationService80.
8. Double-click the DisplayName property, listed on the right.
9. Change the name from DataFlux Integration Server to DataFlux Integration Server 8.0.
10. Close the registry editor.
11. Reboot the computer.
Note: Windows XP and Windows Vista® have a Services Control Manager
(SCM) that manages all services running on the system. The SCM cannot be
easily reset or restarted while Windows is running, making a system reboot
necessary.
Reinstall the latest (8.1) Batch Scheduler service
1. Open a command prompt window and change the directory to the DIS bin directory.
2. Type dfintgsvr.exe -i and press Enter. A message confirming installation of the service will appear.
3. Close the command prompt window.
4. Open the Services management console.
5. Right-click the DataFlux Integration Server service and select Start. Both DIS services, DataFlux Integration Server and DataFlux Integration Server 8.0, should now be in the Services management console with a status of Started.
6. Exit the Services management console.
How do I move an Architect or Profile job to UNIX so it can be processed by an
Integration Server?
With DIS Manager installed as part of the base dfPower Studio, connect to the desired
server and select the job or service to be uploaded. You can also use DIS Manager to test
real-time services on your server.
How do I save Profile reports to a repository using DIS?
In order for DIS to store Profile reports in a repository, the profrepos.cfg configuration file
must exist in the \etc directory of the DIS installation. The profrepos.cfg file contains the list
of available repositories and specifies one of them to be the default. The format for the
profrepos.cfg file is:
$defaultrepos='Repos1 Name'
Repos1 Name='ODBC DSN' 'Table Prefix'
Repos2 Name='ODBC DSN' 'Table Prefix'
Repos3 Name='ODBC DSN' 'Table Prefix'
where:
$defaultrepos: Indicates the default repository.
Repos1 Name: User-defined name for the repository.
ODBC DSN: The Data Source Name defined for the ODBC connection.
Table Prefix: Prefix that was given for the repository when it was created.
In the following example there are three repositories configured for Profile reports: TEST,
dfPower Sample Prod, and dfPower Sample Cust. The two dfPower Sample examples are
stored in the same database but use table prefixes (Prod_ and Cust_) to create a unique set
of tables for each repository. The default repository is dfPower Sample Prod.
$defaultrepos='dfPower Sample Prod'
TEST='TEST DATABASE' ''
dfPower Sample Prod='DataFlux Sample' 'Prod_'
dfPower Sample Cust='DataFlux Sample' 'Cust_'
Profile repositories on any supported platform can be managed by using the Profile
Repository Administrator that can be accessed from the Start menu.
For more information on creating and maintaining Profile repositories see the dfPower
Studio Online Help topic, "dfPower Profile - Profiling Repositories."
How do you enable/disable DAC logging?
To enable/disable DAC 27 logging in a Windows environment:
From the Windows Registry, create one or both of the following string values:
• HKEY_CURRENT_USER\Software\DataFlux Corporation\dac\[version]\logfile
• HKEY_LOCAL_MACHINE\Software\DataFlux Corporation\dac\[version]\logfile
where [version] indicates the version of DIS that you have installed.
Set logfile to the filename where logging output is sent. If this entry is empty or does not
exist, no logging occurs. This is also how to turn off logging.
Note: Make sure you turn off logging once required information is
captured.
To enable/disable DAC logging in a UNIX/Linux environment:
Add a file named sql_log.txt (all lowercase) to the var/dis_job/io directory. The dfexec.cfg
file is located in the dfpower /etc directory.
To enable/disable DAC logging from the command line:
When running dfexec from the command line using [your_jobname.dmc] as input, the DAC
log will be created inside the current working directory of the dfexec executable. You must
create the file sql_log.txt inside that working directory to enable DAC logging.
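For example, a UNIX session might look like the following sketch (the job name and directory are placeholders, and the exact dfexec invocation may differ in your installation):

cd /path/to/working/directory
touch sql_log.txt
dfexec your_jobname.dmc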
What SOAP commands are recognized by DIS?
For a complete list of SOAP 28 commands recognized by DIS, refer to the SOAP Commands
topic.
How do I add an additional driver for the data sources?
DIS is compatible with most ODBC-compliant data sources. DataFlux recommends using the
supplied ODBC drivers instead of client-specific drivers provided by the manufacturer.
Limited support will be available for implementation and problem resolution when a client
implements a driver not supplied by DataFlux.
For a complete list of supported drivers, see Supported Databases.
I can't see my saved job even though it's saved in a directory I have access to.
Where is it?
In Windows, a job that will be run through DIS must be saved in a location that does not
use mapped drives. A Win32 service cannot access mapped drives, even if the service is
started under a user account that has those drives mapped.
27 A data access component (DAC) allows software to communicate with databases and manipulate data.
28 Simple Object Access Protocol (SOAP) is a Web service protocol used to encode requests and responses to be sent over a network. This XML-based protocol is platform independent and can be used with a variety of Internet protocols.
Are there restrictions on which characters are allowed in job names?
Job names can include alphanumeric characters only. If a job name includes any of the
following characters, DIS will not list that job name and will not allow any operations on
that file: comma (,), period (.), apostrophe ('), brackets ([ ]), braces ({ }), parentheses,
plus (+), equal sign (=), underscore (_), hyphen (-), caret (^), percent (%), dollar sign ($),
at sign (@), or exclamation point (!).
How can I be automatically notified of new releases of DataFlux products?
To arrange to receive update notification, visit the DataFlux Customer Care Portal at:
http://www.dataflux.com/Customer-Care/index.asp. From there you can select User
Profile. Then select to receive both data update notification and the DataFlux newsletter.
How do I know which log file is mine?
All DIS log file names start with the date that corresponds to when the current instance of
DIS was started. The beginning of the file name is in the format YYYYMMDD-HH.MM_. The
date portion of the file name is followed by "00_" so that, when the directory is sorted, all
log files related to a particular instance of DIS are grouped together. The DIS log itself will
be either the first or the last log file. After that, there are some random characters the
operating system creates to guarantee unique file names. The name ends with "_DIS.log".
Service and job log files begin with: MMDD-HH.MM.SS_ representing the date when the
request to load this job or service was received by DIS. This is followed by a request
number (since multiple requests can be received in the same second), followed by the job
or service name.
For example:
20070215-12.13_00_0210B0_DIS.log
20070215-12.13_0215-12.13.44_4_archsvc_ram.dmc.log
20070215-12.13_0215-12.14.04_10_archjob_100Kclust.dmc.log
Can I run a UNIX shell command from an Integration Server Architect job?
Yes, the execute() function allows you to run a program or file command from the shell. For
example, the following code allows you to modify the default permissions of a text file
created by Architect.
To execute the command directly, type:
execute("/bin/chmod", "777", "file.txt")
or to execute from the UNIX/Linux shell, type:
execute("/bin/sh", "-c", "chmod 777 file.txt")
Why is my job failing to read SAS Data Sets on AIX?
In order to access SAS Data Sets on AIX, you must have AIX 5.3 with patch level 6 installed
on your system.
Installation
What is the default temp directory, and how do I change it?
DIS uses the system temporary directory which is determined by the TMP environment
variable in Windows and the TMPDIR environment variable in UNIX.
In UNIX/Linux, you can redirect temporary files by doing the following:
1. Export TMPDIR=[your new temp path].
2. Edit the .profile file to include this new path.
3. Restart the DIS daemon.
In Windows, the default setting for TEMP is C:\Windows\Temp. You may set the value of
the TEMP environment variable for that user, or set the TEMP system variable to a different
location. To change environment variables in Windows, do the following:
1. Right-click My Computer, and select Properties.
2. Click the Advanced tab.
3. Click Environment variables.
4. Click one of the following options, for either a user or a system variable:
• Click New to add a new variable name and value.
• Click an existing variable, and then click Edit to change its name or value.
• Click an existing variable, and then click Delete to remove it.
5. Restart the DIS service.
How do I connect to a database?
DIS connects to databases through ODBC. To add a data source, use the ODBC Data Source
Administrator provided with Windows, or use the dfdbconf command in UNIX. In Windows,
click Start > Programs > DataFlux dfPower Studio [version] > DataDirect Connect
ODBC Help. You may require assistance from your network administrator to install this
ODBC Connection, as it requires site-specific information. For more information, see
Configuring a Data Source.
Security
What are the requirements for user names?
A user name is case sensitive, can be up to 20 characters long, and can include only
alphanumeric characters and these symbols: period (.), hyphen (-), or underscore (_).
Are there any restrictions for passwords?
There are no restrictions on the characters or words that can be used for passwords. A
password may be set to blank. Passwords do not expire. This is also true for passwords for
which you create password hashes using HashPassword.exe. If you are using
HashPasswordStrong.exe to generate password hashes, your password will need to
contain a minimum of: six characters, one number, one uppercase letter, and one lowercase
letter.
We do not need additional security. Is it required?
No, you are not required to use the DIS security subsystem. It is available to fit your
business needs. Security is disabled by default.
Is there a limit to the number of users and groups I can add using DIS security?
There is no limit when using the DIS security command line option to add users and groups;
however, if you are using DIS Security Manager to create users and groups, there is a limit
of 10,000 users and 10,000 groups.
Can we use OpenSSL to restrict the users in one or more domains?
Enabling LDAP access over SSL only encrypts LDAP network traffic, including all DIS LDAP
traffic to all configured servers. SSL has no impact on the number of LDAP servers or
domains a DIS instance can use.
How can I configure DIS to use one or more LDAP servers?
You can configure DIS to use any number of LDAP servers by entering them on the same
line in the dfexec.cfg file, separated by white space. The LDAP servers are queried in the
order in which they are specified in the dfexec.cfg file.
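For example, a two-server configuration might look like the following (the host names are illustrative):

ldap host = ldap1.example.com:389 ldap2.example.com:389

If ldap1.example.com cannot be contacted, DIS tries ldap2.example.com next.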
How do I restrict the LDAP servers that are used by DIS?
If you want to restrict DIS to one or more specific LDAP servers, configure only those
servers in the dfexec.cfg file.
Troubleshooting
I saved a job in dfPower® Studio, but now I don't see it.
Check the job name. Job names can include alphanumeric characters only. If a job name
includes any of the following characters, DataFlux® Integration Server (DIS) will not list
that job name and will not allow any operations on that file: comma (,), period (.),
apostrophe ('), brackets ([ ]), braces ({ }), parentheses, plus (+), equal sign (=),
underscore (_), hyphen (-), caret (^), percent (%), dollar sign ($), at sign (@), or
exclamation point (!).
The DIS service failed to start.
When attempting to start DIS from the Microsoft® Windows® services dialog, the following
error appears:
DIS Windows Service Error
Alternatively, if you try to start DataFlux Integration Server Manager without starting the
Windows service, you will get the following error message:
DIS Timeout Error
Check to see if DIS created a log file. If so, you would expect to see the following two lines
in the log:
Syntax error at line 34.
Initialization failed.
If not, then the error is likely to be in the dfexec.cfg file itself. Look at the Windows
application log for an error message containing the reason the DIS service failed to start.
Some of the settings should be enclosed by single quotes. In this example, the license dir
setting in the dfexec.cfg file was changed so that it was no longer in the proper format.
The service is running, but I'm still getting the SOAP-ENV:Client: Timeout connect failed in tcp_connect() error.
Check the bottom edge of the Integration Server Manager main screen for the server and
port.
Verify that you are able to connect to the server listed. Make sure the Port value matches
the value of server listen port in the dfexec.cfg file. The default value is 21036. If these
values do not match, you can change server listen port in dfexec.cfg, or change server
port under Tools > Options from the Integration Server Manager main menu.
I can connect to the machine running the license server, but I cannot get a
license.
Typically, no license server port number needs to be explicitly specified. The license server
automatically selects an available port between 27000 and 27009. If a client machine can
connect to the license server, but the license client process cannot connect to the license
server process, an explicit port number must be specified.
In the license file on the server, a port number can be explicitly specified as the last item in
the SERVER line as shown below. The port number can be added or changed in the existing
license file without the need to regenerate it. The license server has to be restarted once a
port number is added or changed. For example:
SERVER [servername].com 001125c43cba 27000
On the client side, a port number can be explicitly specified by prepending it to the server
name, for example:
27000@[servername].com
I get one or both of the following error messages:
[time of error] (DATAFLUX) UNSUPPORTED: "DFLLDIAG2" (PORT_AT_HOST_PLUS )
phamj4@UTIL0H4GHXD1 (License server system does not support this feature. (18,327))
[time of error] (DATAFLUX) UNSUPPORTED: "DFLLDIAG1" (PORT_AT_HOST_PLUS )
phamj4@UTIL0H4GHXD1 (License server system does not support this feature. (18,327))
These error messages refer to the licenses for two built-in features that are used internally
by DataFlux for debugging. These licenses are not distributed, so the license checkout
request process for these two licenses fails and produces the errors noted. This is normal
and should occur only once, when the license checkout request is made for the first time.
When I try opening a job log from DIS Manager, I get the following error:
Error occurred while attempting to retrieve job log: SOAP-ENV:Client:UNKNOWN
error [or Timeout]
This occurs on some configurations of Microsoft Windows Server® 2003 when the log file is
greater than 32KB. A workaround for this problem is to set the following configuration value
in the dfexec.cfg file. This should only be necessary for DIS running on Windows Server
2003, and only if you experience this problem.
server send log chunk size = 32KB
Error Messages
Installation and Configuration
No Valid Base License Found
If you do not have a valid license file on your machine, you will get an error when
attempting to run a job with nodes that require that license. Following is an example of an
error you might expect in this case:
dtengine :: warning :: No valid base license found
dtengine :: error :: Job contains node with unknown id
'SAMPLEFIELDNAME'. The node may not be licensed, or the plugin for the
node may be unavailable.
dfexec :: error :: unable to load job
dfexec :: error :: aborted due to errors
See instructions for obtaining a license file for Windows or UNIX.
Data Source Connection Errors
When configuring a new data source, it is critical that parameters (such as DSN 29, host,
port, and sid) match exactly those used to create the job on the client machine. If the
connection fails, DataFlux® Integration Server (DIS) displays error messages describing the
reasons for the failure.
Common World Address Verification Error Codes
-100: Time for testing is expired.
156: Too many results after validation. Only the first 20 results will be presented.
157: No certification for this country.
204: Country not recognized.
205: Country database not found.
206: Country database in the wrong format or data corrupt.
207: Country database access denied. License may be missing.
300: No country rule could be loaded.
-1004: Country is locked.
-9999: Call encountered an error. The reason for the error is unknown.
29 A data source name (DSN) contains connection information, such as user name and password, to connect to a database through an ODBC driver.
Security
401 Unauthorized
If the user is not authenticated, there will be an HTTP error, 401 Unauthorized. This could
mean that you have entered invalid user name and password credentials, or your user
account has not been set up. Contact your DIS Security Administrator for assistance.
403 Forbidden
If you receive the HTTP error, 403 Forbidden, you have entered the system but do
not have permission to execute a particular DIS command. Contact your administrator for
assistance.
Logging
An owner of an object may view the log and status for that object, but users must have
permissions to view other users' logs.
Job with Custom Scheme Fails to Run
A job with a custom scheme that fails to run will produce an error similar to the following:
dtengine :: error :: Blue Fusion load scheme 'CUSTOM_SCHEME_1.sch'
failed: Blue Fusion error -801: file i/o error
dfexec :: error :: unable to initialise step: Standardization 2
dfexec :: error :: aborted due to errors
You must ensure that: (1) the Quality Knowledge Base (QKB) you are using on the
Integration Server is an exact copy of the QKB used on dfPower, and (2) the name of the
scheme is typed correctly, as it is case sensitive. To copy the QKB from Microsoft®
Windows® to UNIX®, use FTP or Samba mappings. You must restart the DIS service, and
retry the job. On some UNIX systems, there is a case sensitivity issue with the schemes.
Once you copy the QKB over to the UNIX server, make sure that the name of the scheme is
modified to all lowercase. It is located in the qkb directory, under /scheme.
Active X Control Required to View Help Files
In Internet Explorer® 6.0 and later, your network administrator can block ActiveX®
controls from being downloaded. Security for ActiveX content from CDs and local files can
be changed under Internet Options.
In Internet Explorer, click Tools > Internet Options. On the Advanced tab, under
Security, select Allow active content from CDs to run on My Computer, and Allow
active content to run in files on My Computer.
Locale Not Licensed
If your job has a locale selected that you do not have listed in your license, you will get an
error message similar to the following:
DT engine: 2::ERROR::-105:Locale English [US] not licensed
You must contact DataFlux Customer Support to update your license with the new locale.
Also verify that the data file for that locale is located in the /locale folder of your dfPower
installation.
Node Not Licensed
An error message similar to the following can occur when the user has more than one copy
of the license file:
Failed to create step: Couldn't instantiate step 'SOURCE_ODBC'. It is
not an available step. The node may not be licensed, or the plugin for
the node may be unavailable.
Check the \license directory of the dfPower Studio installation. Remove any extra copies of
the studio.lic file from that folder.
Running Jobs and Real-Time Services
Error When Connecting to Excel Database
Because Microsoft Excel® is not a true database, you may occasionally experience problems
making the ODBC connection. This will produce an error similar to the following:
Failure while connecting to data source (Excel files) :
[HY000][Microsoft][ODBC Excel Driver] Cannot open database '(unknown)'.
It may not be a database that your application recognizes, or the file
may be corrupt. (=1028)
Naming the spreadsheet can fix this issue. To eliminate the error, highlight all of the cells in
the spreadsheet. Select Insert > Name and enter the name of the spreadsheet.
Error While Connecting to Source Repository
This repository error takes the following form:
Error occurred while connecting to source repository: Repository does
not exist, is corrupted, or not unified
There are two possible causes for this error. First, you may need to update your unified
repository. Second, when attempting to connect to a repository on Sybase® or Oracle®,
you may see this error because the driver reports an incorrect column size. To implement
the workaround, add the string ColumnsAsChar to the data source in the registry and set
the value to 1. On UNIX machines, modify the data source in the odbc.ini file by adding the
line ColumnsAsChar=1.
For example:
On Microsoft Windows, HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI, edit the data
source in question and add the following string value and give it a value of 1:
ColumnSizeAsCharacter
On UNIX, edit the ODBC.ini file, find the entry for the data source in question and add this
line:
ColumnsAsChar=1
Functionality Disabled Error When Creating a CASS PS Form 3553
Why did I get a "functionality disabled" error message when creating a CASS PS
Form 3553?
The PS Form 3553 for CASS certification can be created only on Microsoft® Windows®,
Solaris™, and AIX™ systems.
Low Disk Space on UNIX
The following error indicates that temporary disk space on the machine running DIS is low:
dfexec :: error :: unable to read row 0
dtengine :: error :: File I/O operation failed. Possibly out of disk
space.
dfexec :: error :: unable to execute step: Data Joining 1
dfexec :: error :: aborted due to errors
Data source 1 contains 583,166 rows
Data source 2 contains 10,125,806 rows
You may be able to free up resources by increasing the efficiency of the job. Check the
conditions of the Join and see if you are running a many-to-many Join. Also, check to see if
there is a high number of NULL values in the two tables you are joining, which can also
cause a huge join. You also need to check the memory limitations on the UNIX system
where the DIS process is running to ensure that there is enough room for DIS to work
properly.
By default, DIS uses the /tmp directory for temporary files. You need to redirect the TMPDIR
environment variable by issuing the export TMPDIR=/[newTempPath] command. You can
check this environment variable and change the settings with the following steps:
1. Stop the DIS service.
2. From the UNIX shell, type env | grep TMPDIR. If the result comes back blank, you do not have TMPDIR redirected as an environment variable.
3. Type export TMPDIR=/[newTempPath].
4. Start the DIS service.
5. Edit the .profile file and add export TMPDIR=/[newTempPath] for the user who starts DIS.
Errors Using the Java Plugin for UNIX
The architect.cfg file contains a setting called java vm that references the location of the
Java™ Virtual Machine (JVM) DLL. If this setting is not configured properly, you will receive
an error that the JVM could not be loaded.
Make sure you compile your Java code using a Java Development Kit of the same version
as, or earlier than, the JRE version you are running on your DIS machine.
If the java classpath setting is incomplete, Architect or DIS will report an error because the
code could not be loaded. Check to make sure your code and any dependencies are
accessible and specified in the classpath setting.
If the java classpath setting is empty, the only Java code accessible to the Java Plugin is
the set of examples that ship with Architect and DIS. Refer to the DataFlux dfPower Online
Help topic, "Architect - Java Plugin - Examples," for information.
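As an illustration, the two settings might look like the following in architect.cfg (both paths are placeholders; use the actual locations of your JVM library and compiled classes):

java vm = /usr/java/jre/lib/server/libjvm.so
java classpath = /opt/dataflux/classes:/opt/dataflux/lib/myplugin.jar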
Appendix A: Best Practices
Use a System Data Source Rather Than a User Data Source
Add the new data source as a System data source name (DSN 30), rather than a User DSN,
so it will be available to all users of the machine, including Microsoft® Windows NT®
services.
Use Connection Manager to Configure Data Sources
When developing DataFlux® dfPower® Architect jobs or services, use the Connection
Manager to set up and store connection information to any Open Database Connectivity
(ODBC 31) data source.
Use global variables within Architect jobs and services to accept or retrieve data. Using
global variables increases the flexibility and portability of dfPower Architect jobs and
services between data sources.
In this way, the connection information to the master reference database is made
permanent and independent of the Architect job or service. The saved information does not
have to be entered each time the job is run, and that information can be used by any
DataFlux application.
Windows - This connection information is stored in the directory specified in one or both of
the following registry entries:
• HKEY_CURRENT_USER\Software\DataFlux Corporation\dac\[version]\savedconnectiondir
• HKEY_LOCAL_MACHINE\Software\DataFlux Corporation\dac\[version]\savedconnectiondir
where [version] indicates the version of DIS that you have installed.
If neither of these entries exist, connection strings are saved in the \dac folder on the job's
run-time machine.
UNIX - The connection information is saved to a file in the $HOME/.dfpower/dsn directory.
Plan Your Security Model Based on Business Needs
The DataFlux Integration Server (DIS) application is a network resource that is used to
access and modify your data. A well-planned security model is based on usage policy, risk
assessment, and response. Determining user and group usage policies prior to
implementation helps you to minimize risk, maximize utilization of the technology, and
expedite deployment.
30 A data source name (DSN) contains connection information, such as user name and password, to connect to a database through an ODBC driver.
31 Open Database Connectivity (ODBC) is an open standard application programming interface (API) for accessing databases.
For more information, see Security Policy Planning.
Consider DIS Performance when Modifying Configuration
Settings
Changes to several of the configuration settings in DIS can affect performance. For
example, large temporary files, log files, and memory clusters can slow down the server.
Several settings that are in the dfexec.cfg file or can be added to dfexec.cfg can alter
memory allocation or processing power.
For more information, see Configuration Settings.
Appendix B: Code Examples
The following content instructs you on how to create and connect to the DataFlux®
Integration Server (DIS). Zip files are available with files for the examples. The
integrationserversamples.zip file is located in the DIS\[version] directory for Microsoft®
Windows® operating system installations.
The DataFlux Web Service Definition Language (WSDL) file contains the set of definitions to
describe the Web service. You can point directly to this file using either the directory path,
such as C:\Program Files\DataFlux\DIS\[version]\share\arch.wsdl, or the URL, using the
following syntax:
http://[servername]:port/?wsdl
Using an XML editor, you can edit and view the arch.wsdl file that is installed on your DIS.
Update the SOAP:address location to reflect the hostname and port number of the DIS.
For example:
<SOAP:address location="http://localhost:21036"/>
Additionally, you can view the WSDL file via a web browser. From this view, the value of
SOAP:address location will reflect your actual hostname and port number.
<SOAP:address location="http://[hostname]:21036"/>
There are coding examples of these operations in each language listed below: Get Object
List, Post Object, Delete Object, Get Architect Service Params, Execute Architect Service,
Run Architect Job, Run Profile Job, Get Job Status, Get Job Log, Terminate Job, Clear Log.
Java
Use wscompile, supplied with the Java™ Web Services Developer Pack, to build Java classes
that wrap the DIS interface. This creates all of the classes required to interface with DIS for
any application that has the ability to use these classes.
Examples
Following are examples using the Java classes constructed from the WSDL.
////////////////////////////////////////////////////////
// Imports
////////////////////////////////////////////////////////
import arch.*;
////////////////////////////////////////////////////////
// INITIALIZATION
////////////////////////////////////////////////////////
ArchitectServicePortType_Stub stub;
// get the stub
stub = (ArchitectServicePortType_Stub)(new
DQISService_Impl()).getDQISService();
// optionally set to point to a different end point
stub._setProperty(javax.xml.rpc.Stub.ENDPOINT_ADDRESS_PROPERTY,
"http://MY_SERVER:PORT");
////////////////////////////////////////////////////////
// 1) Get Object List example
////////////////////////////////////////////////////////
String[] res;
res=stub.getObjectList(ObjectType.ARCHSERVICE);
////////////////////////////////////////////////////////
// 2) Post Object example
////////////////////////////////////////////////////////
byte[] myData;
ObjectDefinition obj = new ObjectDefinition();
obj.setObjectName("NAME");
obj.setObjectType(ObjectType.fromString("ARCHSERVICE"));
// read the job file in from the h/d
myData = getBytesFromFile(new File(filename));
// post the job to the server
String res=stub.postObject(obj, myData);
////////////////////////////////////////////////////////
// 3) Delete Object
////////////////////////////////////////////////////////
ObjectDefinition obj = new ObjectDefinition();
obj.setObjectName("MYJOB.dmc");
obj.setObjectType(ObjectType.fromString("ARCHSERVICE"));
String res = stub.deleteObject(obj);
////////////////////////////////////////////////////////
// 4) Get Architect Service Params
////////////////////////////////////////////////////////
GetArchitectServiceParamResponse resp;
FieldDefinition[] defs;
resp=stub.getArchitectServiceParams("MYJOB.dmc","");
// Get Definitions for Either Input or Output
defs=resp.getInFldDefs();
defs=resp.getOutFldDefs();
//Loop through Defs
defs[i].getFieldName();
defs[i].getFieldType();
defs[i].getFieldLength();
////////////////////////////////////////////////////////
// 5) Execute Architect Service
////////////////////////////////////////////////////////
FieldDefinition[] defs;
DataRow[] rows;
String[] row;
GetArchitectServiceResponse resp;
// Fill up the Field Definitions
defs=new FieldDefinition[1];
defs[0] = new FieldDefinition();
defs[0].setFieldName("NAME");
defs[0].setFieldType(FieldType.STRING);
defs[0].setFieldLength(15);
// Fill up Data matching the definition
rows = new DataRow[3];
// fill in each element i of the rows array:
row=new String[1];
row[0] ="Test Data";
rows[i] = new DataRow();
rows[i].setValue(row[0]);
resp=stub.executeArchitectService("MYJOB.dmc", defs, rows, "");
// Get the Status, Output Fields and Data returned from the Execute Call
String res = resp.getStatus();
defs=resp.getFieldDefinitions();
rows=resp.getDataRows();
// Output field definitions
for (int i = 0; i < defs.length; i++) {
    defs[i].getFieldName();
    defs[i].getFieldType();
    defs[i].getFieldLength();
}
// Output data
for (int i = 0; i < rows.length; i++) {
    row = rows[i].getValue();
    for (int j = 0; j < row.length; j++)
        res = row[j];
}
////////////////////////////////////////////////////////
// 6) Run Architect Job
////////////////////////////////////////////////////////
ArchitectVarValueType[] vals;
vals=new ArchitectVarValueType[1];
vals[0]=new ArchitectVarValueType();
vals[0].setVarName("TESTVAR");
vals[0].setVarValue("TESTVAL");
// Returns JOBID
String res=stub.runArchitectJob("MYJOB.dmc", vals, "");
////////////////////////////////////////////////////////
// 7) Run Profile Job
////////////////////////////////////////////////////////
String res=stub.runProfileJob(
"MYJOB.pfi",
/* Job Name */
"",
/* Output file to create (not used in this case) */
"repos",
/* Repository name to write results to */
"New Report",
/* Report name to create */
"Description",
/* Description of run */
0,
/* Append to existing (false) */
vals,
/* var/values */
""
/* reserved */
);
////////////////////////////////////////////////////////
// 8) Get Job Status
////////////////////////////////////////////////////////
JobStatusDefinition[] defs;
// to get the status of a single job, pass the jobid
// returned from runArchitectJob or runProfileJob
defs = stub.getJobStatus("");
ObjectDefinition obj;
for (int i = 0; i < defs.length; i++) {
    obj = defs[i].getJob();
    defs[i].getJobid();
    defs[i].getStatus();
    obj.getObjectName();
    obj.getObjectType();
}
////////////////////////////////////////////////////////
// 9) Get Job Log
////////////////////////////////////////////////////////
GetJobLogResponseType resp;
FileOutputStream fo;
resp=stub.getJobLog(jobId,0);
// write it to a file
fo = new FileOutputStream(resp.getFileName());
fo.write(resp.getData());
fo.close();
////////////////////////////////////////////////////////
// 10) Terminate Job
////////////////////////////////////////////////////////
String res=stub.terminateJob(jobId);
////////////////////////////////////////////////////////
// 11) Clear Log
////////////////////////////////////////////////////////
String res=stub.deleteJobLog(jobId);
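The fragments above can be assembled into a complete client. The following is a minimal
sketch, assuming the wscompile-generated classes reside in package arch (as in the imports
above); the class name, endpoint, and the "running" status check are illustrative
assumptions rather than part of the product API, so adjust them to match your generated
stubs and the status strings your server reports.
////////////////////////////////////////////////////////
// Minimal end-to-end client (illustrative sketch)
////////////////////////////////////////////////////////
import arch.*;

public class DisClientSketch {
    public static void main(String[] args) throws Exception {
        // get the stub and point it at the server
        ArchitectServicePortType_Stub stub =
            (ArchitectServicePortType_Stub)(new DQISService_Impl()).getDQISService();
        stub._setProperty(javax.xml.rpc.Stub.ENDPOINT_ADDRESS_PROPERTY,
                          "http://MY_SERVER:21036"); // 21036 is the default DIS port

        // list the Architect jobs posted to the server
        String[] jobs = stub.getObjectList(ObjectType.fromString("ARCHITECT"));

        // start the first job with a single macro variable
        ArchitectVarValueType[] vals = new ArchitectVarValueType[1];
        vals[0] = new ArchitectVarValueType();
        vals[0].setVarName("TESTVAR");
        vals[0].setVarValue("TESTVAL");
        String jobId = stub.runArchitectJob(jobs[0], vals, "");

        // poll until the job stops reporting a running status;
        // the exact status strings are server-defined (assumed here)
        String status;
        do {
            Thread.sleep(1000);
            JobStatusDefinition[] defs = stub.getJobStatus(jobId);
            status = defs[0].getStatus();
        } while (status.equalsIgnoreCase("running"));

        // fetch the log, save it locally, then clear it on the server
        GetJobLogResponseType log = stub.getJobLog(jobId, 0);
        java.io.FileOutputStream fo = new java.io.FileOutputStream(log.getFileName());
        fo.write(log.getData());
        fo.close();
        stub.deleteJobLog(jobId);
    }
}
Because Run Architect Job returns as soon as the job is started (see Appendix D: SOAP
Commands), the client must poll Get Job Status to learn when the job finishes; Execute
Architect Service, by contrast, keeps the client connected until result data is returned.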
C++
The client API consists of three header files and one .lib file. The headers include all
necessary type enumerations. All required .dlls are provided within the dfPower® Studio
installation. A connection handle should be initialized before use and freed by the terminate
function when no longer needed.
////////////////////////////////////////////////////////
// Imports
////////////////////////////////////////////////////////
#include "arscli.h"
#include "acjob.h"
#include "acrta.h"
The arscli11.lib library is also required.
////////////////////////////////////////////////////////
// INITIALIZATION
////////////////////////////////////////////////////////
acj_handle_t *pHandle = acj_initialize(sServer, nPort);
////////////////////////////////////////////////////////
// DESTRUCTION OF HANDLE at end of use
////////////////////////////////////////////////////////
acj_terminate(pHandle);
////////////////////////////////////////////////////////
// ERROR MESSAGES
////////////////////////////////////////////////////////
const char *err_code, *err_text, *err_detail;
err_code = acj_get_error(pHandle, &err_text, &err_detail);
////////////////////////////////////////////////////////
// 1) Get Object List example
////////////////////////////////////////////////////////
int nNumJobs;
char **job_list;
job_list = acj_joblist(pHandle, RTARCHITECT /*ARCHITECT or PROFILE*/,
&nNumJobs);
////////////////////////////////////////////////////////
// 2) Post Object example
////////////////////////////////////////////////////////
rc = acj_post_job(pHandle, "JOB_NAME", "FILE", RTARCHITECT/*ARCHITECT,
PROFILE*/);
////////////////////////////////////////////////////////
// 3) Delete Object
////////////////////////////////////////////////////////
rc = acj_delete_job(pHandle, "JOB_NAME",RTARCHITECT/*ARCHITECT, PROFILE*/);
////////////////////////////////////////////////////////
// 4) Get Architect Service Params
////////////////////////////////////////////////////////
int nNumInputs, nNumOutputs;
const char *err_code, *err_text, *err_detail;
rc = acj_rt_io_info(pHandle, mJobName, &nNumInputs, &nNumOutputs);
int i, nColSize;
rta_data_type nColType;
const char *sColName;
for (i = 0; i < nNumInputs; i++)
    rc = acj_rt_input_fieldinfo(pHandle, i, &sColName, &nColType, &nColSize);
for (i = 0; i < nNumOutputs; i++)
    rc = acj_rt_output_fieldinfo(pHandle, i, &sColName, &nColType, &nColSize);
////////////////////////////////////////////////////////
// 5) Execute Architect Service
////////////////////////////////////////////////////////
// Set up the input columns
int i, rc = 0;
CString sColName;
// This data is set when getting the parameter info
int *mColSizes = new int[nNumInputs];
rta_data_type *mColTypes = new rta_data_type[nNumInputs];
// Loop and load the input field info
for (i = 0; i < nNumInputs; i++)
    rc = rta_set_infield_info(pHandle, i, sColName, mColTypes[i], mColSizes[i]);
// Add a row, then set the data for each column/field in that row
rc = rta_add_row(pHandle);
for (int j = 0; j < nNumInputs; j++)
    rc = rta_set_data_value(pHandle, j, "VALUE");
// Run the test
rc = rta_run(pHandle);
// Get the number of output columns
int nNumCols;
nNumCols = rta_output_numfields(pHandle);
// Get the output column information
int nOutSize;
rta_data_type nOutType;
const char *sOutName;
for (i = 0; i < nNumCols; i++)
rc = rta_output_fieldinfo(pHandle, i, &sOutName, &nOutType, &nOutSize);
// Get the number of output rows
int nNumRows;
nNumRows = rta_output_numrows(pHandle);
// Get The output
const char *sOutVal;
for (i = 0; i < nNumRows; i++)
    for (int j = 0; j < nNumCols; j++)
        sOutVal = rta_output_data(pHandle, j, i);
acj_terminate(pHandle); // free the handle when finished with the server
////////////////////////////////////////////////////////
// 6) Run Architect Job
////////////////////////////////////////////////////////
int rc;
int mVarCount = 1;
acj_arch_var_value *mVarArray;
mVarArray = new acj_arch_var_value[mVarCount];
//LOAD ARRAY
CString sTemp = "Test Data";
mVarArray[0].var_name = new char[sTemp.GetLength() + 1];
strcpy(mVarArray[0].var_name, sTemp);
sTemp = "Test Value";
mVarArray[0].var_value = new char[sTemp.GetLength() + 1];
strcpy(mVarArray[0].var_value, sTemp);
char sJobID[ACJ_JOBID_SIZE];
CString sJobName = "JOB_NAME";
acj_job_type nType = ARCHITECT;
rc = acj_run_arch_job(pHandle, sJobName, mVarArray, mVarCount, sJobID);
////////////////////////////////////////////////////////
// 7) Run Profile Job
////////////////////////////////////////////////////////
int rc;
int mVarCount = 1;
acj_arch_var_value *mVarArray;
mVarArray = new acj_arch_var_value[mVarCount];
//LOAD ARRAY
CString sTemp = "Test Data";
mVarArray[0].var_name = new char[sTemp.GetLength() + 1];
strcpy(mVarArray[0].var_name, sTemp);
sTemp = "Test Value";
mVarArray[0].var_value = new char[sTemp.GetLength() + 1];
strcpy(mVarArray[0].var_value, sTemp);
char sJobID[ACJ_JOBID_SIZE];
CString sJobName = "JOB_NAME";
// REPORT FILE
rc = acj_run_prof_job(pHandle, sJobName, "FileName", 0,
1/*Append - 1, Truncate - 0*/, 0,
"Description", mVarArray, 1, sJobID);
// Repository
rc = acj_run_prof_job(pHandle, sJobName, 0, "ReposName",
1/*Append - 1, Truncate - 0*/, "ReportName",
"Description", mVarArray, 1, sJobID);
////////////////////////////////////////////////////////
// 8) Get Job Status
////////////////////////////////////////////////////////
int nNumStats;
int rc = acj_get_job_status(pHandle, ""/*or "JobID"*/, &nNumStats);
acj_job_type nType;
char *sName, *sJobID, *sStatus;
for (int i = 0; i < nNumStats; i++)
rc = acj_get_job_status_item(pHandle, i, &sName, &sJobID, &nType, &sStatus);
////////////////////////////////////////////////////////
// 9) Get Job Log
////////////////////////////////////////////////////////
char sLogFile[MAX_PATH];
GetTempFileName(dfReadIniFile("Environment", "WorkingPath"), "ISM", 0,
sLogFile);
int rc = acj_get_job_log(pHandle, "JOBID", sLogFile);
////////////////////////////////////////////////////////
// 10) Terminate Job
////////////////////////////////////////////////////////
int rc = acj_terminate_job(pHandle, "JOBID");
////////////////////////////////////////////////////////
// 11) Clear Log
////////////////////////////////////////////////////////
int rc = acj_delete_job_log(pHandle, "JOBID");
C#
Using the DataFlux WSDL file, import a web reference into your project. This builds the
objects required to interface with DIS.
////////////////////////////////////////////////////////
// Imports
////////////////////////////////////////////////////////
// Add Web reference using the DataFlux supplied WSDL
////////////////////////////////////////////////////////
// INITIALIZATION
////////////////////////////////////////////////////////
DQISServer.DQISService mService= new DQISServer.DQISService();
mService.Url = "http://MYDISSERVER" + ":" + "PORT";
////////////////////////////////////////////////////////
// 1) Get Object List example
////////////////////////////////////////////////////////
string[] jobs;
jobs=mService.GetObjectList(DQISServer.ObjectType.ARCHSERVICE);
////////////////////////////////////////////////////////
// 2) Post Object example
////////////////////////////////////////////////////////
DQISServer.ObjectDefinition def = new DQISServer.ObjectDefinition();
def.objectName = "MYJOB";
def.objectType = DQISServer.ObjectType.ARCHSERVICE;
// Grab bytes from a job file
FileStream fs = File.Open(@"c:\Develop\SoapUser\DISTESTRT.DMC",
FileMode.Open, FileAccess.Read, FileShare.None);
byte[] data = new byte[(int)fs.Length];
fs.Read(data, 0, data.Length);
fs.Close();
DQISServer.SendPostObjectRequestType req= new
DQISServer.SendPostObjectRequestType();
req.@object = def;
req.data = data;
mService.PostObject(req);
////////////////////////////////////////////////////////
// 3) Delete Object
////////////////////////////////////////////////////////
DQISServer.SendDeleteObjectRequestType req = new
DQISServer.SendDeleteObjectRequestType();
DQISServer.ObjectDefinition def = new DQISServer.ObjectDefinition();
def.objectName = "MYJOB";
def.objectType = DQISServer.ObjectType.ARCHSERVICE;
req.job = def;
mService.DeleteObject(req);
////////////////////////////////////////////////////////
// 4) Get Architect Service Params
////////////////////////////////////////////////////////
DQISServer.GetArchitectServiceParamResponseType resp;
DQISServer.SendArchitectServiceParamRequestType req;
req=new DQISServer.SendArchitectServiceParamRequestType();
req.serviceName="MYJOB";
resp=mService.GetArchitectServiceParams(req);
string val;
int i;
DQISServer.FieldType field;
// loop through the returned definitions; the first element is shown here
val = resp.inFldDefs[0].fieldName;
i = resp.inFldDefs[0].fieldLength;
field = resp.inFldDefs[0].fieldType;
val = resp.outFldDefs[0].fieldName;
i = resp.outFldDefs[0].fieldLength;
field = resp.outFldDefs[0].fieldType;
////////////////////////////////////////////////////////
// 5) Execute Architect Service
////////////////////////////////////////////////////////
DQISServer.SendArchitectServiceRequestType req = new
DQISServer.SendArchitectServiceRequestType();
DQISServer.GetArchitectServiceResponseType resp;
////////////////////////////////////////////////////////
DQISServer.GetArchitectServiceParamResponseType respParam;
DQISServer.SendArchitectServiceParamRequestType reqParam;
reqParam=new DQISServer.SendArchitectServiceParamRequestType();
reqParam.serviceName="ServiceName";
respParam=mService.GetArchitectServiceParams(reqParam);
////////////////////////////////////////////////////////
DQISServer.FieldDefinition[] defs;
DQISServer.DataRow[] data_rows;
string[] row;
defs=new DQISServer.FieldDefinition[respParam.inFldDefs.Length];
for(int i=0; i < respParam.inFldDefs.Length; i++)
{
// Fill up the Field Definitions
defs[i] = new DQISServer.FieldDefinition();
defs[i].fieldName = respParam.inFldDefs[i].fieldName;
defs[i].fieldType = respParam.inFldDefs[i].fieldType;
defs[i].fieldLength = respParam.inFldDefs[i].fieldLength;
}
DataTable table = m_InputDataSet.Tables["Data"]; // externally provided data
// Fill up Data matching the definition
data_rows = new DQISServer.DataRow[table.Rows.Count];
for(int i=0;i < table.Rows.Count;i++)
{
System.Data.DataRow myRow = table.Rows[i];
row=new String[table.Columns.Count];
for(int c=0;c < table.Columns.Count;c++)
{
row[c] = myRow[c].ToString();
}
// Loop and create rows of data to send to the service
data_rows[i] = new DQISServer.DataRow();
data_rows[i].value = row;
}
req.serviceName = "ServiceName";
req.fieldDefinitions = defs;
req.dataRows = data_rows;
resp=mService.ExecuteArchitectService(req);
////////////////////////////////////////////////////////
// 6) Run Architect Job
////////////////////////////////////////////////////////
DQISServer.SendRunArchitectJobRequest req = new
DQISServer.SendRunArchitectJobRequest();
DQISServer.GetRunArchitectJobResponse resp;
DQISServer.ArchitectVarValueType[] varVal = new
DQISServer.ArchitectVarValueType[1];
varVal[0] = new DQISServer.ArchitectVarValueType();
varVal[0].varName = "TESTVAR";
varVal[0].varValue = "TESTVAL";
req.job = "JOB_NAME";
req.varValue = varVal;
resp = mService.RunArchitectJob(req);
string jobid = resp.jobId;
////////////////////////////////////////////////////////
// 7) Run Profile Job
////////////////////////////////////////////////////////
DQISServer.SendRunProfileJobRequestType req = new
DQISServer.SendRunProfileJobRequestType();
DQISServer.GetRunProfileJobResponseType resp;
req.jobName = "JOB_NAME";
req.reportName = "REPORT_NAME";
// use this: req.repositoryName = "REPOSNAME";
// or this:
req.fileName = "FILE_NAME";
req.description = "DESCRIPTION";
req.append = 0; // No = 0; Yes = 1
resp = mService.RunProfileJob(req);
string jobid = resp.jobId;
////////////////////////////////////////////////////////
// 8) Get Job Status
////////////////////////////////////////////////////////
DQISServer.SendJobStatusRequestType req = new
DQISServer.SendJobStatusRequestType();
DQISServer.JobStatusDefinition[] resp;
req.jobId = "";
resp = mService.GetJobStatus(req);
DQISServer.ObjectDefinition def = resp[0].job;
string jobid = resp[0].jobid;
string jobstatus = resp[0].status;
////////////////////////////////////////////////////////
// 9) Get Job Log
////////////////////////////////////////////////////////
DQISServer.SendJobLogRequestType req = new
DQISServer.SendJobLogRequestType();
DQISServer.GetJobLogResponseType resp;
req.jobId = "SOMEJOBID";
resp = mService.GetJobLog(req);
string fileName = resp.fileName;
byte []data = resp.data;
////////////////////////////////////////////////////////
// 10) Terminate Job
////////////////////////////////////////////////////////
DQISServer.SendTerminateJobRequestType req = new
DQISServer.SendTerminateJobRequestType();
DQISServer.GetTerminateJobResponseType resp;
req.jobId = "SOMEJOBID";
resp = mService.TerminateJob(req);
string status = resp.status;
////////////////////////////////////////////////////////
// 11) Clear Log
////////////////////////////////////////////////////////
DQISServer.SendDeleteJobLogRequestType req = new
DQISServer.SendDeleteJobLogRequestType();
DQISServer.GetDeleteJobLogResponseType resp;
req.jobId = "SOMEJOBID";
resp = mService.DeleteJobLog(req);
string status = resp.status;
Appendix C: Saving Profile Reports to a Repository
In order for DataFlux® Integration Server (DIS) to store Profile reports in a repository, the
profrepos.cfg configuration file must exist in the \etc directory of the DIS installation. If you
have already configured a Profile repository on a Microsoft® Windows® machine running
dfPower® Studio, you can copy the profrepos.cfg file from the \etc directory of dfPower®
Studio to the \etc directory of your DIS installation.
The profrepos.cfg file contains the list of available repositories and specifies one of them to
be the default. The format for the profrepos.cfg file is:
$defaultrepos='Repos1 Name'
Repos1 Name='ODBC DSN' 'Table Prefix'
Repos2 Name='ODBC DSN' 'Table Prefix'
Repos3 Name='ODBC DSN' 'Table Prefix'
where:
$defaultrepos = the default repository.
Repos1 Name = the user-defined name for the repository.
ODBC DSN = the data source name (DSN 32) defined for the ODBC connection.
Table Prefix = the prefix that was given for the repository when it was created.
The following example shows three repositories configured for Profile reports: TEST,
dfPower Sample Prod, and dfPower Sample Cust. The two dfPower Sample examples are
stored in the same database and use table prefixes, Prod_ and Cust_, to create a unique set
of tables for each repository. The default repository is dfPower Sample Prod.
The following is an example of the profrepos.cfg file:
$defaultrepos='dfPower Sample Prod'
TEST='TEST DATABASE' ''
dfPower Sample Prod='DataFlux Sample' 'Prod_'
dfPower Sample Cust='DataFlux Sample' 'Cust_'
Manage Profile repositories on any supported platform using the Profile Repository
Administrator, which can be accessed from the Start menu of any DataFlux dfPower Studio
for Windows installation.
For more information on creating and maintaining Profile repositories, see the dfPower
Studio Online Help topic, "dfPower Profile - Profiling Repositories."
32
A data source name (DSN) contains connection information, such as user name and password, used
to connect to a database through an ODBC driver.
Appendix D: SOAP Commands
DataFlux® Integration Server (DIS) supports commands using SOAP 33. Many of these
commands take pre-defined enumeration values. The enumeration values are listed below,
and the commands that use them are described in the commands table.
Enumeration Values
These are the options that are pre-defined for many of the SOAP commands listed in the
SOAP Commands Table.
ObjectType
• ARCHSERVICE
• ARCHITECT
• PROFILE
FieldType
• UNKNOWN
• BOOLEAN
• INTEGER
• REAL
• STRING
• DATE
AccountType
• USER
• GROUP
Other commands require string input, and are so identified.
33
Simple Object Access Protocol (SOAP) is a Web service protocol used to encode requests and
responses to be sent over a network. This XML-based protocol is platform independent and can be
used with a variety of internet protocols.
Following are the commands recognized by DIS:
AddToGroup - Add an account (user or group) to a group.
DeleteFromGroup - Remove an account (user or group) from a group.
DeleteJobLog - Deletes a job log and status. This command essentially removes any
reference DIS has for this job, so the client cannot make any queries for this job
after making this call. This command works only for jobs that have completed.
DeleteObject - Deletes an existing object of a particular type from the server.
ExecuteArchitectService - Runs an Architect service. The client stays connected
until DIS sends back a status response and result data.
GetArchitectServiceParams - Gets the required input fields and produced output
fields of an Architect service.
GetArchitectServicePreload - Preload a service or list of services.
GetArchitectServiceUnload - Unload a service or list of services.
GetJobLog - Returns the log file for a Profile or Architect job that was previously
started. The job does not have to be finished for this command to work.
GetJobStatus - Returns the status of a Profile or Architect job that was previously
started, or the status of all jobs if a Job ID is not passed in.
GetLoadedObjectList - Retrieve a list of currently loaded services.
GetObjectList - Returns a list of objects of a particular object type. The types
are: Architect real-time services, Architect jobs, and Profile jobs.
GetObjFile - Get an object file (a job or a service).
ListAccounts - Retrieve a list of user or group accounts.
ListGroupMembers - Retrieve a list of accounts (user or group) in a group.
ListUserGroups - Retrieve a list of groups that a user belongs to.
MaxNumJobs - Dynamically set the maximum number of services allowed to run
concurrently.
PostObject - Posts a new object of a particular type (service, Architect job, or
Profile job) to the server. If such a service or job already exists, the client
will get an error.
RepositoryList - Get a list of Profile repositories.
RunArchitectJob - Starts running an Architect job. The client has to make new
connections to check on its status.
RunProfileJob - Starts running a Profile job. The client has to make new
connections to check on its status.
ServerVersion - Get the server version.
TerminateJob - Terminates a running job. The client can still get the status and
log after the job has been terminated.
Appendix E: DIS Service
DataFlux® Integration Server (DIS) runs as a Microsoft® Windows® service (called
DataFlux Integration Server). You can start and stop the service using the Microsoft
Management Console (MMC 34). DIS uses the DataFlux dfPower® Studio execution
environment for Windows to execute real-time services, as well as Architect and Profile
jobs.
In UNIX, DIS runs as a daemon administered from a command line. The disadmin
application is used to start and stop the daemon. Real-time services, Architect, and Profile
jobs associated with DIS are administered through the dfPower environment.
Windows
Starting and Stopping DIS in Windows
When installed in a Microsoft Windows environment, DIS runs as a Windows service (named
DataFlux Integration Server).
Start and stop the service using the MMC. The MMC hosts administrative tools that you can
use to administer networks, computers, services, and other system components.
1. Click Start > Settings > Control Panel.
2. Double-click Administrative Tools > Computer Management. This brings up the MMC.
3. Expand the Services and Applications folder.
4. Click Services.
5. Click DataFlux Integration Server.
6. Click either Stop the service or Restart the service.
Modifying DIS Windows Service Log On
When DIS is installed, it creates a service named DataFlux Integration Server. By default,
this service is started using the local system account.
Note: Because this account may have some restrictions (such as
accessing network drives) we suggest that you modify the service
properties to have the service log on using a user account with the
appropriate privileges, such as access to required network drives and
files. For security reasons, you should assign administrative privileges
only if necessary.
34
The Microsoft Management Console (MMC) is an interface new to the Microsoft Windows 2000
platform which combines several administrative tools into one configurable interface.
To modify the DIS log on:
1. Select Control Panel > Administrative Tools.
2. Double-click Services, and select the DataFlux Integration Server service.
3. Select the Log On tab, select This account, and enter Account and Password
credentials for a user with administrative privileges.
UNIX
Starting and Stopping DIS Daemon in UNIX/Linux
Start and stop the daemon using the disadmin application included in the installation. This
application can be run using the command-line command: ./bin/disadmin [yourcommand]
from the installation root directory.
[yourcommand] should be one of the following:
start - Starts the Integration Server.
stop - Stops the Integration Server.
status - Checks that the Integration Server is running.
help - Displays this message.
version - Displays the version information.
For example:
./bin/disadmin start — Starts the server
./bin/disadmin stop — Stops the server
Appendix F: Configuration Settings
The following is a list of DataFlux® Integration Server (DIS) and dfArchitect configuration
settings or directives. These may need to be modified prior to running the server. Some
examples of the settings can be found in the dfexec.cfg configuration file.
• General DIS Configuration Directives
• DIS Security Related Configuration Directives
• Architect Configuration Directives
• Data Access Component Directives
Best Practice: Refer to Appendix A: Best Practices - Consider DIS Performance when
Modifying Configuration Settings for additional information about Configuration Settings.
General DIS Configuration Directives
The following table lists the configuration settings for DIS:
arch job path
Refers to the location of the dfPower Architect batch job files. If a value is
not set here, it will default to a new directory, arch_job, created under the
working directory (Integration Server). All values containing special
characters or spaces must be enclosed in single quotes.
# Windows Example
arch job path = 'C:\Program Files\DataFlux\DIS\[version]\arch_job'
# UNIX Example
arch job path = '/opt/dataflux/aix/[version]/dfpower/etc/arch_job'
arch svc path
Sets the path to Architect real-time services. If not configured, it will
default to a new directory, svc_job, created under the working directory
(Integration Server). All values containing special characters or spaces
must be enclosed in single quotes.
# Windows Example
arch svc path = 'C:\Program Files\DataFlux\DIS\[version]\svc_job'
# UNIX Example
arch svc path = '/opt/dataflux/aix/DIS/[version]/svc_job'
dfexec max num
Maximum number of dfexec processes that can run simultaneously. The
default is 10 (Integration Server).
# Windows or UNIX Example
dfexec max num = 5
dfexec exe path
Path to the dfexec executable. It defaults to the bin directory
(Integration Server).
# Windows Example
dfexec exe path = C:\Program Files\DataFlux\dis\[version]\bin
# UNIX Example
dfexec exe path = /opt/dataflux/aix/dis/[version]/bin
dfsvc max errs
Sets the maximum number of errors that a dfwsvc process is allowed to
encounter before it is terminated by DIS. The default is -1, which disables
the function and sets no limit on the number of errors. Any number less
than one will default to -1.
# Windows or UNIX Example
dfsvc max errs = 10
dfsvc max requests
Sets the number of requests a dfwsvc process can handle before being
terminated by DIS. The default is -1, meaning no limit is set. If a number
less than one is entered, the value defaults to -1.
# Windows or UNIX Example
dfsvc max requests = 5
dfsvc preload
Designates specific services and the count for each service that DIS
preloads during startup. This can be used in conjunction with dfsvc preload
all. For more information and formatting guidelines, see Pre-loading
Services.
# Windows or UNIX Example
dfsvc preload = 2:svc_1.dmc -1:subdir1\svc_2.dmc 3:svc_3.dmc
dfsvc preload all
Causes DIS to find and preload all services a specified number of times.
This includes services found in subdirectories. The number of instances
specified must be an integer greater than zero, or the directive is ignored.
This can be used in conjunction with dfsvc preload. For more information,
see Pre-loading Services.
# Windows or UNIX Example
dfsvc preload all = 2
dfwsvc debug
Specifies whether dfwsvc should run in debug mode. If set to yes, dfwsvc
always creates a log file regardless of the dfwsvc log setting. The default is
no.
# Windows or UNIX Example
dfwsvc debug = yes
dfwsvc exe path
Specifies the path to the dfwsvc executable. If not set, it defaults to the
directory containing the DIS executable. The dfwsvc executable is used
when WLP DIS client access is required. The server child listen and server
wlp listen options must be specified as well.
# Windows Example
dfwsvc exe path = C:\Program
Files\DataFlux\DIS\[version]\bin
# UNIX Example
dfwsvc exe path = /opt/dataflux/aix/dis/[version]/bin
dfwsvc log
Specifies whether dfwsvc should create a log file. The default is no.
# Windows or UNIX Example
dfwsvc log = yes
dfwsvc max num
Specifies the maximum number of dfwsvc instances, or Architect services,
that may be running at any given time.
# Windows or UNIX Example:
dfwsvc max num = 3
job io path
Directory containing Architect and Profile job input and output files. If a job
does not specify a path to input or output files, DIS will use the directory
specified with this configuration. If not set, the default location will be a
new directory created under the working directory. All values containing
special characters or spaces must be enclosed in single quotes.
# Windows Example
job io path = 'C:\Program
Files\DataFlux\DIS\[version]\temp'
# UNIX Example
job io path = '/opt/dataflux/linux/dis/[version]/temp'
log dir
This is the log path for the DIS, dfsvc, and dfexec log files. If not set, it
defaults to the directory containing the DIS executable in Microsoft®
Windows® or to $DFEXEC_HOME/etc/ in UNIX® (Integration Server). All
values containing special characters or spaces must be enclosed in single
quotes.
# Windows Example
log dir = 'C:\Program Files\DataFlux\DIS\[version]\log'
# UNIX Example
log dir = '/opt/dataflux/dis/[version]/log'
log get_objects_list requests
Controls whether or not to log object list requests (Integration Server).
The default is 0. Some clients may automatically send certain types of
requests which result in an unnecessarily long and difficult to read log file.
The following directive can instruct a server not to log particular requests.
Note: Setting this option to yes may cause large log files
to be generated.
# Windows or UNIX Example
log get_objects_list requests = 1 [1=yes/0=no]
log get_status
requests
Controls whether or not to log status requests (Integration Server). The
default is zero. Some clients may automatically send certain types of
requests which result in an unnecessarily long and difficult to read log file.
The following can instruct a server not to log particular requests.
Note: Setting this option to yes may cause large log files
to be generated.
# Windows or UNIX Example
log get_status requests = 1 [1=yes/0=no]
odbc ini
Where the odbc.ini file is stored (Architect batch jobs, Profile jobs,
Integration Server).
# Windows Example
odbc ini = C:\Windows
# UNIX Example
odbc ini = /opt/dataflux/solaris
priv log packets
Controls whether DIS generates a log file of all SOAP packets that are sent
and received. The generated file will have the word packets in the name
and reside in the same directory as the DIS log file. Due to a decrease in
system performance while enabled, this setting should be disabled unless
necessary while debugging. It is disabled by default.
# Windows or UNIX Example
priv log packets = no
prof job path
Path for Profile batch jobs (Integration Server). If not specified, a new
directory will be created under the working directory by default. All values
containing special characters or spaces must be enclosed in single quotes.
# Windows Example
prof job path = 'C:\Program
Files\DataFlux\DIS\[version]\prof_job'
# UNIX Example
prof job path =
'/opt/dataflux/solaris/dis/[version]/profjob'
server child listen
Defines the communication method between dfwsvc and DIS. This option
takes precedence over the server child shm dir and server child tcp port
options. If you do not configure this option, it will default to the shared
memory provider and the shared memory files will be created in the work
directory.
# Windows Example
server child listen = 'type=shm;path=C:\Program
Files\DataFlux\DIS\[version]\var'
server child shm dir
Directory in which to create SHM files. This is used when server child listen
is not set and the default method is SHM. If the default method is not
SHM, do not specify this option, as it will select the SHM method. This
option takes precedence over the server child tcp port option. The default
directory is the DIS work directory.
# Windows Example
server child shm dir = 'C:\Program
Files\DataFlux\DIS\[version]\var'
server child tcp port
Port number for TCP on localhost. This is used when server child listen is
not set and the default method is TCP. If the default method is not TCP, do
not specify this option, as it will select the TCP method. The default port
number is 21035.
# Windows or UNIX Example
server child tcp port = 21035
server listen port
Selects the port that the server is to use. The default is port 21036
(Integration Server).
# Windows or UNIX Example
server listen port = 20125
server log file max size
Maximum size of a log file at which it will be rotated out and saved with an
index included in its name. An empty log file with the original name will be
created and used for further logging. The default is 0, meaning no log file
rotation.
# Windows or UNIX Example
server log file max size = 512MB
server read timeout
Indicates the time to wait for server-client read/write operations to
complete. A positive value indicates seconds. A negative value indicates
microseconds. The default value is 0.5 seconds (Integration Server).
Note: If errors are encountered while uploading jobs or
services, try increasing this value.
# Windows or UNIX Example
server read timeout = -300000
The server read timeout example above sets a limit of 0.3 seconds.
server send log chunk size
Indicates the size of log file chunks to send when a client requests a log file
from DIS. The default is 512KB (Integration Server).
Note: By breaking up the log file into chunks of a
specified size, DIS is able to respond quicker to other
requests while sending the log file.
# Windows or UNIX Example
server send log chunk size = 256KB
server wlp listen
Identifies the connection information that clients will use to contact the
WLP listen server when WLP DIS client or SAS WLP Batch client access is
required. The svr run wlp parameter must be set to yes for this option to
take effect.
# Windows or UNIX Example
server wlp listen =
'type=tcp;host=myhost.mycompany.com;port=21037'
server wlp listen port
Port number for TCP on the host machine. This is used when server wlp
listen is not set and svr run wlp is enabled. The default is 21037.
# Windows or UNIX Example
server wlp listen port = 21037
soap return nulls
Determines whether DIS will return an empty string instead of null when
the real-time service output field has a null value. The default value is no.
If this parameter is set to yes, DIS will return null, instead of an empty
string.
svc requests queue
Enables service requests to be queued in the order they were received and
to be processed as soon as a dfwsvc becomes available. If not set, when
the maximum number of dfwsvc (processes handling real-time service
requests) are reached, an error will occur, stating the request cannot be
handled.
# Windows or UNIX Example
svc requests queue = yes
Note: This directive should be treated as an advanced
configuration and should only be used when troubleshooting
performance problems.
svr idle thread timeout
Determines the length of time an existing thread remains idle before it is
terminated. Defaults to five seconds if not set, or if set to less than one
microsecond.
# Windows or UNIX Example
svr idle thread timeout = 5
Note: This directive should be treated as an advanced
configuration, used only when troubleshooting performance
problems.
svr max idle threads
Determines the maximum number of idle threads that are allowed to exist.
It will always be at least one.
# Windows or UNIX Example
svr max idle threads = 1
Note: This directive should be treated as an advanced
configuration, used only when troubleshooting performance
problems.
svr max path len
Sets the maximum number of bytes that are allowed when entering a path
as part of a configuration directive. If a longer path is entered, DIS will not
initialize. The default is 8KB.
# Windows or UNIX Example
svr max path len = 8KB
svr max threads
Determines the maximum number of threads that exist. If a WLP server is
to run, at least two threads are used. If a SOAP server is to run, at least
four threads are used. DIS automatically adjusts the value to the required
minimum if the configured value is too low.
# Windows or UNIX Example
svr max threads = 2
svr run dis
Determines if the SOAP (DIS) server is running. The default is one.
# Windows or UNIX Example
svr run dis = 1 [1=yes/0=no]
svr run wlp
Determines whether the WLP server is running. The default is zero. When
set to one, DIS starts the WLP listen specified in the server wlp listen
parameter.
# Windows or UNIX Example
svr run wlp = 0 [1=yes/0=no]
working path
Path where the server creates temporary input/output files to transfer to
the real-time service. The default path is to the directory containing the
DIS executable in Windows and $DFEXEC_HOME/var/ in UNIX (Integration
Server). The value must be enclosed in single quotes. Location of the
working path can affect system performance.
# Windows Example
working path = 'C:\Program
Files\DataFlux\DIS\[version]\work'
# UNIX Example
working path = '/opt/dataflux/solaris/dis/[version]/work'
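Several of these directives are typically set together in dfexec.cfg. The following
fragment is an illustrative sketch combining directives from the table above; the paths
and values are examples only, not shipped defaults.
# Illustrative dfexec.cfg fragment (example values only)
server listen port = 21036
arch job path = 'C:\Program Files\DataFlux\DIS\[version]\arch_job'
arch svc path = 'C:\Program Files\DataFlux\DIS\[version]\svc_job'
log dir = 'C:\Program Files\DataFlux\DIS\[version]\log'
dfexec max num = 5
dfwsvc max num = 3
svc requests queue = yes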
DIS Security Related Configuration Directives
The following table lists the security settings for DIS:
allow everyone
Allows all users in the present and future to have access to all objects. This
setting is part of the DataFlux Integration Server security subsystem
(Integration Server) and defaults to zero.
# Windows or UNIX Example
allow everyone = 0 [1=yes/0=no]
default commands permissions
Users who do not exist on DIS are automatically added using the default
command permissions the first time accessed. The DIS administrator can
later change these permissions using the following directive in the dfexec.cfg
file.
# Windows or UNIX Example
default commands permissions = 1111111111111
In this example, the default grants all permissions using all 1s. Configure the
permissions as needed.
default umask
Changes the user file-creation mode mask or umask, which is the set of
default permissions for created files. If not specified, the umask defaults to
the shell's umask (output of the umask command). Users that need to
change file permissions from the default can specify a different umask
number.
# Windows or UNIX Example
# remove write bit for others (create 0664 files)
default umask = 002
# remove all bits for others (create 0660 files)
default umask = 007
# remove write bit for group and others (create 0600 files)
default umask = 022
enable ldap
Configuration setting in the dfexec.cfg file that enables LDAP in DIS.
# Windows or UNIX Example
enable ldap = yes
enable ownership
Enables ownership rights for objects posted by users. The default is yes
(Integration Server). Ownership implicitly allows a user to execute or delete
an object, regardless of explicitly declared permissions.
# Windows or UNIX Example
enable ownership = 1 [1=yes/0=no]
enable security
Enables the DIS security subsystem. The default is zero. Three related
settings are ignored unless this directive is enabled: enable ownership,
allow everyone, and security path (Integration Server).
# Windows or UNIX Example
enable security = 1 [1=yes/0=no]
restrict general access
Restricts general access by IP address. Access to DIS is restricted by
specifying IP addresses of clients that are allowed or denied access. The
default is allow all. Other options include: deny all, allow none, and deny
none (Integration Server).
When configuring each restriction group, either allow or deny must be
specified with the directive. The two cannot be used together. The directive
can then be followed by lists of specific IP addresses and ranges of IP
addresses. IP address ranges must be indicated with a dash and no spaces.
Each individual address or range of addresses must be separated with a
space. Each group must be entered on a single line. If the keywords all or
none are used, explicitly defined IP addresses or ranges are ignored. A client
that is denied general access is implicitly denied access to post and delete
commands. Only IPv4 addresses are supported.
# Windows or UNIX Example
restrict general access = deny all
# Windows or UNIX Example
# The lines below allow access for addresses 127.0.0.1 and
# 192.168.1.1 through 192.168.1.255
restrict general access = allow 127.0.0.1 192.168.1.1-192.168.1.255
# Windows or UNIX Example
# The lines below allow access for addresses 127.0.0.1 and
# 192.168.1.190
restrict general access = allow 127.0.0.1 192.168.1.190
restrict get_all_stats access
This security setting allows control over which clients can request status of all
jobs. The default is allow all (Integration Server). See configuration setting
restrict general access for more information.
Note: When the status of all jobs is requested, the user
will receive job IDs. It is recommended that access be limited
to administrators.
# Windows or UNIX Example
restrict get_all_stats access = deny none
restrict post/delete access
This configuration setting allows control over which clients can post and
delete jobs. The default is allow all (Integration Server). See configuration
setting restrict general access for more information.
# Windows or UNIX Example
restrict post/delete access = allow none
security path
Directory containing DIS security files. If not set, it defaults to the
dis_security directory that must exist under the directory containing the
server's configuration file (Integration Server).
# Windows Example
security path = C:\Program
Files\DataFlux\DIS\[version]\etc\dis_security
# UNIX Example
security path =
/opt/dataflux/aix/dis/[version]/etc/security
soap over ssl
Configuration setting in the dfexec.cfg file that enables SSL to be used with
DIS. Servers will need to be configured for SSL. Clients can then
communicate over SSL when the server's address is entered as https://
instead of http://.
# Windows or UNIX Example
soap over ssl = yes
soap ssl key file
Path to the key file that is required when the SOAP server must authenticate
to clients. If this configuration directive is not used, comment it out.
# Windows or UNIX Example
soap ssl key file = 'path to file'
soap ssl key passwd
Password for the soap ssl key file. If the key file is not password protected,
this configuration directive should be commented out.
# Windows or UNIX Example
soap ssl key passwd = 'encrypted password'
soap ssl CA cert file
File where the Certificate Authority stores trusted certificates. If this
configuration directive is not needed, comment it out.
# Windows or UNIX Example
soap ssl CA cert file = 'path to file'
soap ssl CA cert path
Path to the directory where trusted certificates are stored. If this
configuration directive is not needed, comment it out.
# Windows or UNIX Example
soap ssl CA cert path = 'path to directory'
strong passwords
Setting used by the disadmin application in UNIX to enforce the following
rules for passwords:
• minimum length of six characters
• requires at least one number
• requires at least one uppercase letter
• requires at least one lowercase letter
This setting affects the following disadmin commands: adduser, moduser,
and passwd. See Security Commands for UNIX for more information about
using the disadmin application and use of strong passwords.
# UNIX Example
strong passwords = yes
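As with the general directives, the security directives are combined in dfexec.cfg. The
following fragment is an illustrative sketch that enables the security subsystem and
restricts access, using only directives from the table above; the addresses and values
are examples only.
# Illustrative security fragment (example values only)
enable security = 1
enable ownership = 1
allow everyone = 0
restrict general access = allow 127.0.0.1 192.168.1.1-192.168.1.255
restrict post/delete access = allow 127.0.0.1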
Architect Configuration Directives
The following table lists the Architect configuration settings:
arch config
This path indicates the location of the Architect macro definitions file. If not set,
this value defaults to \etc\architect.cfg (Architect batch jobs and real-time
services).
# Windows Example
arch config = C:\Program Files\DataFlux\dfPower
Studio\[version]\etc\architect.cfg
# UNIX Example
arch config =
/opt/dataflux/aix/[version]/dfpower/etc/architect.cfg
canada post db
This setting indicates the path to the Canada Post database for Canadian
address verification (Architect batch jobs and real-time services).
# Windows Example
canada post db = C:\Program Files\DataFlux\dfPower
Studio\[version]\mgmtrsrc\RefSrc\SERPData
# UNIX Example
canada post db =
/opt/dataflux/aix/dfpower/[version]/mgmtrsrc/refsrc/
serpdata
checkpoint
Sets the minimum time between log checkpoints, allowing control of how often
the log file is updated. Add one of the following to indicate the unit of time: h,
min, s (Architect batch jobs and Profile jobs).
# Windows or UNIX Example
checkpoint = 15min
cluster
memory
Cluster memory is the amount of memory to use per cluster of match-coded
data. Use this setting if you are using clustering nodes in dfPower (Architect
batch jobs and real-time services). This setting may affect memory allocation.
Note: This setting must be entered in megabytes, for
example, 1 GB should be set to 1024 MB.
# Windows or UNIX Example
cluster memory = 64MB
copy qas
files
When set to yes, the QAS config address verification files are copied to the
current directory if they are new. The setting defaults to no (Architect batch
jobs).
# Windows or UNIX Example
copy qas files = yes
datalib
path
This is the path to the verify data libraries (Architect batch jobs and real-time
services), excluding USPS data. All values containing special characters or
spaces must be enclosed in single quotes.
# Windows Example
datalib path = 'C:\Program Files\DataFlux\DIS\[version]\data'
# UNIX Example
datalib path = '/opt/dataflux/hpux/dis/[version]/data'
dfclient
config
Sets the path for the dfIntelliServer® client configuration file, if using
dfIntelliServer software. The client can be local or loaded on another machine
(Integration Server, dfIntelliServer). This setting is necessary if using
distributed nodes in an Architect job.
# Windows Example
dfclient config = C:\Program
Files\DataFlux\dfIntelliServer\etc\dfclient.cfg
# UNIX Example
dfclient config =
/opt/dataflux/solaris/dfintelliserver/etc/dfclient.cfg
enable dpv
To enable Delivery Point Validation (DPV 35) processing for US Address
Verification, set to yes. It is disabled by default (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable dpv = yes
enable elot
To enable USPS eLOT processing for US Address Verification, set to yes. It is
disabled by default (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable elot = yes
enable lacs
To enable Locatable Address Conversion System (LACS 36) processing, set to
yes. It is disabled by default (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable lacs = yes
enable rdi
Enables Residential Delivery Indicator (RDI 37) processing for US Address
Verification. The default is no (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable rdi = yes
fd table memory
Sets the memory size for calculating frequency distribution. If this is not set, a
default value of 262,144 bytes will be used on 32-bit systems and 524,288 on
64-bit systems. This memory refers to the number of bytes used per field while
processing a table. When processing tables with many fields, this number may
be reduced to alleviate memory issues. The larger the value, the more efficient
the calculation will be. A minimum value of 4096 bytes exists (8192 on 64-bit
systems).
Note: This is a separate parameter from the frequency
distribution memory cache size that is specified on a per job
basis.
# Windows or UNIX Example
fd table memory = 65536
35
Delivery Point Validation (DPV) is a USPS database that checks the validity of residential and
commercial addresses.
36
Locatable Address Conversion System (LACS) is used to update mailing addresses when a street is
renamed or an address is updated for 911, usually by changing a rural route format to an urban/city
format.
37
Residential Delivery Indicator (RDI) identifies addresses as residential or commercial.
ftp get command
Used to receive files by FTP. During the DIS installation, the operating system is
scanned for the following FTP utilities: NcFTP, Perl LWP Modules, cURL, and
Wget. If multiple utilities are found, NcFTP and Perl LWP Modules are given
precedence and FTP get/put commands are written to the dfexec.cfg file.
# Windows or UNIX Example
ftp get command = '"C:\Program Files\NcFTP\ncftpget.exe" -d
%L -u %U -p %P %S %T %F'
ftp put command
Used to send files by FTP. During the DIS installation, the operating system is
scanned for the following FTP utilities: NcFTP, Perl LWP Modules, cURL, and
Wget. If multiple utilities are found, NcFTP and Perl LWP Modules are given
precedence and FTP get/put commands are written to the dfexec.cfg file.
# Windows or UNIX Example
ftp put command = '"C:\Program Files\NcFTP\ncftpput.exe" -d
%L -u %U -p %P %S %T %F'
geo db
Sets the path to the database used for geocoding and coding telephone
information (Architect batch jobs and real-time services).
# Windows Example
geo db = C:\Program Files\DataFlux\dfPower
Studio\[version]\mgmtrsrc\RefSrc\GeoPhoneData
# UNIX Example
geo db =
/opt/dataflux/hpux/dfpower/[version]/mgmtrsrc/fresrc/
geophonedata
java classpath
Setting used for the Java™ Plugin that indicates the location of compiled Java
code.
# Windows Example
java classpath = \usr\java14_64\jre\bin
# UNIX Example
java classpath = /usr/java14_64/jre/bin
java debug
Optional Java Plugin setting that enables debugging in the Java Virtual Machine
(JVM™) used by Architect or Integration Server. The default setting is no.
# Windows or UNIX Example
java debug = yes
java debug port
Optional Java Plugin setting that indicates the port number where the JVM
listens for debugger connect requests. This can be any free port on the
machine.
# Windows or UNIX Example
java debug port = 23017
java vm
This Java Plugin setting references the location of the JVM DLL (or shared library
on UNIX variants).
# Windows Example
java vm = [JRE install directory]\bin\server\jvm.dll
# UNIX Example
java vm = /[JRE install directory]/bin/server/jvm.dll
license location
This is the license directory containing the license file (Architect batch jobs,
real-time services, and Profile jobs). It was labeled license dir in previous
versions. All values containing special characters or spaces must be enclosed in
single quotes.
Caution: License location is only valid for UNIX. In
Windows, set or change the license location using the License
Manager. To access the License Manager application click Start >
Programs > DataFlux Integration Server > License
Manager.
# UNIX Example
license location = '/opt/dataflux/dis/[version]/etc'
mail command
This command is used for sending alerts by email (Profile jobs). The command
may contain the substitutions %T (To) and %B (Body). %T will be replaced with
the destination email address and %B with the path of a temporary file
containing the message body. If %T and %B are left blank, these fields default
to what was specified in the job. The -s mail server parameter specifies the mail
server and is not necessary on UNIX systems. All values containing special
characters or spaces must be enclosed in single quotes.
Sendmail is the open source program in UNIX used for sending mail. In
Windows, mail is sent by the vbscript mail.vbs.
# Windows Example (where mail server is named mailhost)
mail command = 'cscript -nologo "%DFEXEC_HOME%\bin\mail.vbs"
-s mailhost "%T" < "%B"'
# UNIX Example
mail command = '/usr/lib/sendmail %T < %B'
odbc ini
Where the odbc.ini file is stored (Architect batch jobs, Profile jobs, Integration
Server).
# Windows Example
odbc ini = C:\Windows
# UNIX Example
odbc ini = /opt/dataflux/solaris
plugin dir
Where Architect plug-ins are located (Architect batch jobs and real-time
services, Profile jobs).
# Windows Example
plugin dir = C:\Program Files\DataFlux\dis\[version]\bin
# UNIX Example
plugin dir = /opt/dataflux/aix/dis/[version]/bin
qkb root
Location of the Quality Knowledge Base (QKB) files. This location must be set if
using steps that depend on algorithms and reference data in the QKB, such as
matching or parsing (Architect batch jobs and real-time services, Profile jobs).
Note: If changes are made to the QKB, make sure the server
copy is updated as well.
# Windows Example
qkb root = C:\Program Files\DataFlux\qkb
# UNIX Example
qkb root = /opt/dataflux/qkb
repository config
Location of the Profile repository config file (Profile jobs and Integration Server).
All values containing special characters or spaces must be enclosed in single
quotes.
# Windows Example
repository config = 'C:\Program
Files\DataFlux\DIS\[version]\etc\profrepos.cfg'
# UNIX Example
repository config =
'/opt/dataflux/linux/dis/[version]/etc/profrepos.cfg'
sort chunk
Allows you to specify the amount of memory to use while performing sorting
operations. The amount may be given in KB or MB, but not GB (Architect batch
jobs and real-time services).
# Windows or UNIX Example
sort chunk = 128MB
usps db
This is the path to the USPS database required for US address verification
(Architect batch jobs and real-time services).
# Windows Example
usps db = C:\Program Files\DataFlux\verify\uspsdata
# UNIX Example
usps db = /opt/dataflux/aix/verify/uspsdata
verify cache
Indicates an approximated percentage (0 - 100) of the USPS reference data set
that will be cached in memory prior to an address verification procedure.
(Architect batch jobs and real-time services). This setting can affect memory
allocation.
# Windows or UNIX Example
verify cache = 30
verify preload
Allows you to specify a list of states whose address data will be preloaded.
Preloading increases memory usage, but significantly decreases the time
required to verify addresses in a state (Architect batch jobs and real-time
services).
# Windows or UNIX Examples
verify preload = NY TX CA FL
verify preload = ALL
world address db
Sets the path where AddressDoctor data is stored.
# Windows Example
world address db= 'C:\world_data\'
# UNIX Example
world address db= '/opt/dataflux/linux/worlddata'
world address license
The license key provided by DataFlux used to unlock AddressDoctor country
data. The value must be enclosed in single quotes (Architect batch jobs and
real-time services).
# Windows or UNIX Example
world address license = 'abcdefghijklmnop123456789'
Data Access Component Directives
The following table lists the settings in the app.cfg file, which are used by the DAC to
determine the operation it will perform:
DAC Logging
Specifies whether or not to create a log file for DAC operations.
The DAC checks the following values and locations, based on your operating
system:
Windows — The USER\logfile configuration value. Next, the DAC checks
SYSTEM\logfile for a string representing a log file name.
UNIX — The sql_log.txt file in the current working directory.
User saved connection
Specifies where to find user saved connections.
The DAC checks the following values and locations, based on your operating
system:
Windows — The USER\savedconnectiondir configuration value. Next, the
DAC checks the application settings directory for the user, which is usually
in the \Documents and Settings directory, in the
DataFlux\dac\[version] subdirectory.
UNIX — The $HOME/.dfpower/dsn directory.
System saved connection
Specifies where to find system saved connections.
The DAC checks the following values and locations, based on your operating
system:
Windows — The DAC/SAVEDCONNSYSTEM configuration value. Next, the
DAC checks the DFEXEC_HOME environment variable, in the
$DFEXEC_HOME\etc\dsn directory.
UNIX — The $DFEXEC_HOME/etc/dsn directory.
Use braces
Specifies whether to enclose DSN items with braces when they contain
reserved characters.
The DAC checks the following values and locations, based on your operating
system:
Windows — The USER\[dsn_name]\usebraces has a double word value of 1,
where [dsn_name] is the name of the DSN. Next, the DAC will check the
SYSTEM\[dsn_name]\usebraces value.
UNIX — The $HOME/.dfpower/dsn.cfg file for [dsn_name] = usebraces.
Oracle NUMBER(38) handling
Specifies whether to treat NUMBER (38) columns as an INTEGER (which is
the default) or a REAL value. This setting applies if Oracle is the only driver
to which you are connecting.
The DAC checks the following values and locations, based on your operating
system:
Windows — The USER\[dsn_name]\oranum38real value must have a DWORD value of 1. Next, the DAC checks whether SYSTEM\[dsn_name]\oranum38real has a DWORD value of 1.
UNIX — The $HOME/.dfpower/dsn.cfg file for [dsn_name] =
oranum38real.
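Similarly, to treat NUMBER(38) columns as REAL for a hypothetical DSN named orcl on UNIX, $HOME/.dfpower/dsn.cfg would contain:
# UNIX Example ($HOME/.dfpower/dsn.cfg, hypothetical DSN "orcl")
orcl = oranum38real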
Suffix for CREATE TABLE statements
Specifies a string that is appended to every CREATE TABLE statement. If you include %t in this string, it is substituted with the table name.
The DAC checks the following values and locations, based on your operating
system:
Windows — The USER\[dsn_name]\postcreate specifies a string. Next, the
DAC checks that SYSTEM\[dsn_name]\postcreate specifies a string.
UNIX — This setting is not supported.
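A hypothetical illustration of the %t substitution: suppose the postcreate string is TABLESPACE dis_%t (the tablespace naming is invented for this sketch):
# Hypothetical suffix string: TABLESPACE dis_%t
# CREATE TABLE customers (id INTEGER)
# is then issued as:
# CREATE TABLE customers (id INTEGER) TABLESPACE dis_customers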
Table type filter
Limits the list of tables to several preset types. The default is 'TABLE','VIEW','ALIAS','SYNONYM'. If you set this value to * (asterisk), the list is not filtered.
The DAC checks the following values and locations, based on your operating
system:
Windows — The USER\[dsn_name]\tablelistfilter value specifies a comma-delimited string of single-quoted values that indicate table types. Next, the DAC checks whether SYSTEM\[dsn_name]\tablelistfilter specifies a comma-delimited string.
UNIX — This setting is not supported.
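For example, to limit the list to tables and views only, the filter value (shown here in this guide's setting = value style; the actual storage location is the tablelistfilter value described above) would be:
# Hypothetical Example
tablelistfilter = 'TABLE','VIEW'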
TKTS DSN directory
Specifies the path where TKTS DSNs are stored in XML files.
$DFTKDSN may specify the path to the TKTS DSN directory. If it does not, the DAC checks the following location, based on your operating system:
Windows — The $DFEXEC_HOME\etc\dftkdsn\ directory.
UNIX — The $DFEXEC_HOME/etc/dftkdsn/ directory.
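A minimal UNIX sketch, assuming a hypothetical directory:
# UNIX Example (hypothetical path)
export DFTKDSN=/opt/dataflux/dis/etc/dftkdsn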
TK Path
Specifies where TK files are located. The dftksrv path and core directory
should be specified.
$DFTKPATH may specify the TK path. If it does not, the DAC checks the following values and locations, based on your operating system:
Windows:
1. The USER\tkpath value.
2. The SYSTEM\tkpath value.
3. The $DFEXEC_HOME\bin;$DFEXEC_HOME\bin\core\sasext location.
UNIX — Check $TKPATH. Next, check $DFEXEC_HOME/lib/tkts.
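A minimal UNIX sketch, assuming a hypothetical installation path:
# UNIX Example (hypothetical path)
export TKPATH=/opt/dataflux/dis/lib/tkts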
DFTK log file
Specifies the log file that records interactions with the DFTKSRV layer; it is only useful for debugging issues specific to dftksrv.
The DAC checks the following values and locations, based on your operating system:
Windows:
1. The USER\dftklogfile value.
2. The SYSTEM\dftklogfile value.
3. The $DFTKLOGFILE value.
UNIX — The $DFTKLOGFILE value.
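For example, to capture dftksrv debugging output on UNIX (the log path is hypothetical):
# UNIX Example (hypothetical log path)
export DFTKLOGFILE=/tmp/dftk_debug.log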
TKTS log file
Specifies the log file that is produced by the TKTS layer and is useful for
debugging tkts issues.
The DAC checks the following values and locations, based on your operating system:
Windows:
1. The USER\tktslogfile configuration value.
2. The SYSTEM\tktslogfile value.
3. The $TKTSLOGFILE value.
UNIX — The $TKTSLOGFILE value.
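Likewise, to capture TKTS debugging output on UNIX (the log path is hypothetical):
# UNIX Example (hypothetical log path)
export TKTSLOGFILE=/tmp/tkts_debug.log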
Disable CEDA
Specifies whether to disable CEDA. This setting is only applicable to tkts
connections.
The DAC checks the following values and locations, based on your operating system:
Windows:
1. The USER\dftkdisableceda configuration value, which should specify any non-null value, for example, yes.
2. The SYSTEM\dftkdisableceda value.
3. The $DFTKDISABLECEDA value.
UNIX — The $DFTKDISABLECEDA value.
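For example, to disable CEDA on UNIX, set the environment variable to any non-null value:
# UNIX Example
export DFTKDISABLECEDA=yes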
TKTS startup sleep
Specifies how much time, in seconds, to delay between the start of the dftksrv program and the booting of TK.
The DAC checks the following values and locations, based on your operating
system:
Windows — The USER\tktssleep configuration value. Next, the DAC checks
the SYSTEM\tktssleep value.
UNIX — This setting is not supported.
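For instance, a tktssleep value of 5 would delay the booting of TK by five seconds after dftksrv starts (shown here in this guide's setting = value style; the value itself is hypothetical):
# Hypothetical Example
tktssleep = 5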
Command file execution
Specifies a text file with SQL commands (one per line). These commands run in turn on any new connection that is made; for example, they can be used to set session settings. This is only implemented for the ODBC driver. An example file is sketched below.
The USER\savedconnectiondir configuration value may specify the path to
the saved connections. The DAC checks for files with the same filename as
the DSN and a .sql extension.
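As a sketch, assume a hypothetical DSN named mydsn; the DAC would look for mydsn.sql in the saved connection directory and run each line as a separate SQL command on every new connection. The session-setting commands shown are hypothetical:
# Example contents of mydsn.sql (hypothetical)
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'
SET ROLE ALL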
Note: Environment variables are specified as $[variable_name]. Typically,
DIS will set environment variables to appropriate locations. For example,
$DFEXEC_HOME is set to the DIS home directory.
Glossary
A
ACE
An access control entry (ACE) is an item in an access control list used to administer
object and user privileges such as read, write, and execute.
ACL
Access control lists (ACLs) are used to secure access to individual DIS objects.
API
An application programming interface (API) is a set of routines, data structures, object
classes and/or protocols provided by libraries and/or operating system services in
order to support the building of applications.
D
DAC
A data access component (DAC) allows software to communicate with databases and
manipulate data.
DPV
Delivery Point Validation (DPV) is a USPS database that checks the validity of
residential and commercial addresses.
DSN
A data source name (DSN) contains connection information, such as user name and password, used to connect to a database through an ODBC driver.
L
LACS
Locatable Address Conversion System (LACS) is used to update mailing addresses when a street is renamed or an address is updated for 911 service, usually by changing a rural route format to an urban/city format.
M
MMC
The Microsoft Management Console (MMC) is an interface introduced with the Microsoft Windows 2000 platform that combines several administrative tools into one configurable interface.
O
ODBC
Open Database Connectivity (ODBC) is an open standard application programming
interface (API) for accessing databases.
Q
QAS
QuickAddress Software (QAS) is used to verify and standardize US addresses at the
point of entry. Verification is based on the latest USPS address data file.
QKB
The Quality Knowledge Base (QKB) is a collection of files and configuration settings
that contain all DataFlux data management algorithms. The QKB is directly editable
using dfPower Studio.
R
RDI
Residential Delivery Indicator (RDI) identifies addresses as residential or commercial.
S
SERP
The Software Evaluation and Recognition Program (SERP) is a program that Canada Post administers to certify address verification software.
SOA
Service Oriented Architecture (SOA) enables systems to communicate with the master
customer reference database to request or update information.
SOAP
Simple Object Access Protocol (SOAP) is a Web service protocol used to encode
requests and responses to be sent over a network. This XML-based protocol is platform
independent and can be used with a variety of internet protocols.
U
USPS
The United States Postal Service (USPS) provides postal services in the United States.
The USPS offers address verification and standardization tools.