Integration Collection for Pipeline Pilot

DATABASE INTEGRATION GUIDE
PIPELINE PILOT INTEGRATION COLLECTION 2016
Copyright Notice
©2015 Dassault Systèmes. All rights reserved. 3DEXPERIENCE, the Compass icon and the 3DS logo,
CATIA, SOLIDWORKS, ENOVIA, DELMIA, SIMULIA, GEOVIA, EXALEAD, 3D VIA, BIOVIA and NETVIBES are
commercial trademarks or registered trademarks of Dassault Systèmes or its subsidiaries in the U.S.
and/or other countries. All other trademarks are owned by their respective owners. Use of any Dassault
Systèmes or its subsidiaries trademarks is subject to their express written approval.
Acknowledgments and References
To print photographs or files of computational results (figures and/or data) obtained using BIOVIA
software, acknowledge the source in an appropriate format. For example:
"Computational results obtained using software programs from Dassault Systèmes BIOVIA. The ab
initio calculations were performed with the DMol3 program, and graphical displays generated with
Pipeline Pilot."
BIOVIA may grant permission to republish or reprint its copyrighted materials. Requests should be
submitted to BIOVIA Support, either through electronic mail to biovia.support@3ds.com, or in writing
to:
BIOVIA Support
5005 Wateridge Vista Drive, San Diego, CA 92121 USA
Contents
Chapter 1: Introduction
Who Should Read this Guide
Requirements
Supplied Database Drivers
Additional Information
Chapter 2: Configuring a Database
Overview
Pipeline Pilot Data Sources
Configuring Data Sources
ODBC (PP) Data Sources
JDBC Data Sources
ODBC (DSN) Data Sources
MongoDB Data Sources
Connection Pooling
Configuring Data Source Access Rights
Authentication
Initializing the Connection
Additional Information
Query Service Settings
Testing Data Source Connections
Configuring an ODBC Data Source on the Server
Support for Molecular Databases
Remote Access Files
Chapter 3: SQL Components Overview
Database Access
SQL Component Parameters
Parameter Mapping
Selecting a Data Source
Sharing SQL Component Connections
Batch Size
Handling of Date Values
Data Source Access Rights
Examples
View Data Source Rights
Use Data Source Rights
None Rights
Chapter 4: Building SQL Statements
Opening the SQL Builder
SQL Editing Modes
Graphically Building SQL Statements
Manually Constructing SQL Statements
SQL Editing Guidelines
Creating a Data Table
Testing SQL Statements
Error Handling
Chapter 5: Customizing SQL Components
Dynamic SQL Using String Replacement
Chapter 6: Building Protocols with Multiple SQL Components
Chapter 7: Calling Database Stored Procedures
Oracle Stored Procedures
Simple Stored Procedure
Simple Function
Stored Procedure with Result Sets
Microsoft SQL Server Stored Procedures
Simple Function
Stored Procedure with Result Sets
MySQL Stored Procedures
Simple Stored Procedure
Simple Function
Stored Procedure with Result Sets
Identifying Result Sets
Appendices
Appendix A: ODBC Server Configuration
Configuring an ODBC Data Source on Linux
Appendix B: Using dates with Oracle-based components
Chapter 1: Introduction
The Pipeline Pilot Integration collection includes a set of components designed for use with Open
Database Connectivity (ODBC) and Java Database Connectivity (JDBC) drivers. These drivers allow
connections to a variety of compliant databases such as Oracle, SQL Server, and MS Access that reside
anywhere on the network.
Network databases can be accessed through the SQL components and configured on the server.
Connections to these data sources are defined through the Pipeline Pilot Administration Portal. All
information for the data source can be specified in the Administration Portal (including the necessary
login and password) or this information can be provided from Data Source Names (DSNs) that are
defined on the server using the ODBC Administrator tool.
Who Should Read this Guide
This guide provides information for integrating the SQL components with relational databases. This first
chapter explains how to configure a Pipeline Pilot data source on your server (for administrators). The
remaining chapters explain how to use SQL components to access data sources on your network (for
client users).
Requirements
To configure a Pipeline Pilot data source, the Administration Portal role must be assigned to you.
When using ODBC DSNs to specify the database information, you also need administrator privileges
on the Pipeline Pilot server.
To use the SQL components, you need some experience writing structured query statements. You
also need a valid login and password to access the databases on your network.
Supplied Database Drivers
ODBC drivers from DataDirect are included with your Pipeline Pilot server installation. These drivers
support connections to Oracle, Microsoft SQL Server, MySQL¹, and DB2. These DataDirect drivers
support faster access using a wire protocol and portability across platforms. No additional drivers or
database software needs to be installed by administrators to let Pipeline Pilot communicate with
databases using these drivers.
In addition, Pipeline Pilot ships with a version of the Oracle JDBC driver for connecting to Oracle
databases. To access other ODBC databases, install the appropriate driver on the server and create a
DSN. To access other JDBC databases, install the appropriate driver jar file through the Administration
Portal and configure the Data Source.
Introduction | Page 1
Notes:
Pipeline Pilot only officially supports the DataDirect drivers included with your product installation,
although other drivers might continue to work. For full compatibility, use the DataDirect drivers for
Pipeline Pilot.
From the Pipeline Pilot Home Page, you can access documentation for the DataDirect drivers that ship
with Pipeline Pilot (select the "ODBC Driver Help" link in the Administrators section).
Pipeline Pilot Clients cannot query a Microsoft Access database if your server runs on Linux.
To use the installed drivers, you need to set up a new Data Source through the Administration
Portal. Each Data Source maps a driver to a particular database on your network. Once a data
source is defined, it is automatically displayed in Pipeline Pilot's Data Source dialog, allowing you to
configure SQL components to read from that particular database.
The Oracle ODBC driver from DataDirect does not support reading LOB data from tables that use
SECUREFILE storage. If you have LOB data stored in tables that use SECUREFILE storage, you can
read it using the supplied Oracle JDBC driver, or you can install a different ODBC driver, such as
Oracle's own ODBC driver.
When an updated ODBC driver is included on the Linux platform, it replaces your existing version.
On Windows, you have the option of continuing to use the previous version. See the release notes
for more details.
In addition, you can use a custom ODBC DSN or a JDBC driver to connect to the database instead
of using the latest DataDirect driver.
¹ The DataDirect drivers only work with the commercial version of MySQL. To connect to the community
edition, use a MySQL-provided ODBC or JDBC driver.
Additional Information
For more information about the Pipeline Pilot Integration collection and other BIOVIA software
products, visit https://community.3dsbiovia.com.
Chapter 2: Configuring a Database
Overview
The Pipeline Pilot server can access databases that are ODBC or JDBC compliant (for example, Oracle and
SQL Server). It can also read molecular data from ISIS and AEI databases. The server accesses these
databases through a standard ODBC or JDBC layer.
To communicate with the database, the ODBC layer uses an ODBC driver specific to the database type
in use (such as Oracle or DB2). Database configuration involves installing and configuring the
appropriate driver on the server machine.
The following diagram illustrates how the server communicates with an Oracle database using a JDBC
driver:
Pipeline Pilot communication with a database
Protocols are edited on the client systems and protocol jobs run on the server. Therefore, the
connection is made to a database source from the server. The server is the only machine that needs to
be configured to communicate with the database. The client machines do not require configuration.
To use the Pipeline Pilot ODBC or JDBC drivers, no additional installation is required. To use another
ODBC driver, install the driver on the Pipeline Pilot server. To use a different JDBC driver, use the
Administration Portal to upload the jar file to the server.
Pipeline Pilot Data Sources
To communicate with the database, a driver needs configuration information, such as server name,
username, etc. This information is provided when the data sources are configured in the Administration
Portal.
If the administrator has configured a data source and granted a user access to it, the user can select
that data source on an SQL component in a protocol. When the
protocol runs, the SQL component accesses the configuration of the data source, passes this
information to the driver software, and makes a connection to the database. The SQL component can
then execute commands and retrieve information from this database.
You can configure the following types of data source connections for use with Pipeline Pilot's SQL
components:
ODBC (DSN): ODBC-compliant databases that you configure to work with your own drivers that you
install on the Pipeline Pilot server. For example, create a DSN to reference a Microsoft Access database.
The data source includes all the information required to connect to the database, or it references the
name of an ODBC DSN separately configured on the server.
ODBC (PP): Data sources that use Pipeline Pilot's DataDirect drivers.
JDBC: JDBC-compliant data sources that use driver JAR files that you install on the Pipeline Pilot server.
MongoDB: Mongo data sources defined in the Admin Portal.
Tip: You can also store the username and password with the data source. This is convenient because
this information does not need to be specified on each SQL component that uses the data source. We
recommend this method because it eliminates the need to specify username and password
information inside protocols. All the required information is stored in an encrypted file on the Pipeline
Pilot server.
Configuring Data Sources
To create a Data Source in the Administration Portal:
1. Log in to the Administration Portal and open the Setup > Database Sources page. The Data
Sources page opens. All currently available data sources for your server are listed on the left. This list
is blank if no data sources are configured.
2. Click Add Data Source. A form is displayed on the right.
3. Enter a name and an optional description for your new data source.
4. Select the data source type: ODBC (PP), JDBC, ODBC (DSN), or MongoDB.
5. Enter the required information for the type of data source (described below).
6. Click Save.
7. To verify data source connectivity, click Test.
Configuring new data sources in the Administration Portal (Setup > Database Sources)
Tips:
To view the setup for an available data source, select its name on the left. Details are displayed in
the form on the right.
To update the setup, modify the form fields and click Save.
To delete the data source, click the delete icon.
ODBC (PP) Data Sources
The ODBC (PP) data sources are intended only for the DataDirect drivers that ship with Pipeline Pilot. If
you need a different type of ODBC driver, configure it as an ODBC (DSN).
To create an ODBC (PP) data source using a Pipeline Pilot supplied database driver:
1. In the Add Data Source form, select the correct Driver.
2. Select "Latest" as the Driver Version. (It is a best practice to always run with the latest driver
installed with Pipeline Pilot).
3. Complete the remaining fields (such as Server, Port, ServiceId/Service Name, or Database).
4. If custom connection string settings are required (for example,
ArraySize=100000;WireProtocolMode=2), specify them as delimited values in the Connection String
field, with a semicolon (;) separator.
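The Connection String field holds semicolon-delimited Key=Value pairs, as in the example above. As an illustration only (Pipeline Pilot parses this field internally; this is not its actual code), such a string can be split into settings like this:

```python
def parse_connection_settings(s):
    """Parse semicolon-delimited Key=Value connection string settings
    (the format used in the Connection String field) into a dict."""
    settings = {}
    for part in s.split(";"):
        part = part.strip()
        if not part:
            continue                    # tolerate empty segments / trailing ';'
        key, _, value = part.partition("=")
        settings[key.strip()] = value.strip()
    return settings

print(parse_connection_settings("ArraySize=100000;WireProtocolMode=2"))
# {'ArraySize': '100000', 'WireProtocolMode': '2'}
```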
JDBC Data Sources
To create a JDBC data source:
1. In the Add Data Source form, select the Driver.
IMPORTANT! To make a driver Jar file available, you need to upload it to the server first. Click Import
JDBC Driver, browse to the driver Jar file, and click Upload.
2. Enter the Connection String.
If the driver is recognized (recognition is based on the SQuirreL SQL drivers file), a template
connection string is provided when the driver is selected.
Tip: Oracle supports database specifiers (such as the Oracle Net connection descriptor) beyond the
standard JDBC thin-driver connection string. Refer to Oracle's documentation for more information.
ODBC (DSN) Data Sources
Pipeline Pilot automatically recognizes all system DSNs defined on the server to create an ODBC (DSN)
Data Source. If, for some reason, you need to manually create the data source, follow the instructions
below.
To create an ODBC (DSN) data source:
1. Log into the server and create a DSN using your ODBC Administrator application. For details, see
Configuring an ODBC Data Source on the Server.
2. In the Add Data Source form, select the DSN from the list of available server DSNs on that machine.
MongoDB Data Sources
To create a MongoDB data source:
Specify the Server and Port.
Connection Pooling
For ODBC (PP) and JDBC data sources it is possible to specify a Connection Timeout. This setting
specifies how long a closed database connection persists in a pooled Pipeline Pilot server process.
To enable connection pooling for a data source:
1. Specify the value of the Connection Timeout property in seconds (there is no pooling if the value is
unset or 0).
2. When a connection is 'closed' in a pooled Pipeline Pilot server process, it will actually remain open
for Connection Timeout seconds.
3. If another protocol running in the same process tries to 'open' the same database connection, it will
reuse the existing open connection eliminating the overhead of making the connection.
4. If the timeout is exceeded before a new request is made to the connection or the Pipeline Pilot
server process terminates, the connection is physically closed.
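The pooling behavior described in the steps above can be sketched as follows. This is a simplified model for illustration only, not Pipeline Pilot's actual implementation; the class name and timestamps are hypothetical:

```python
class PooledConnection:
    """Sketch of pooled-connection behavior: a 'close' keeps the underlying
    connection open for up to `timeout` seconds, so a later 'open' of the
    same data source in the same process can reuse it."""

    def __init__(self, timeout):
        self.timeout = timeout            # Connection Timeout in seconds (0 = no pooling)
        self.physically_open = False
        self.closed_at = None             # time of the last logical close, if any

    def open(self, now):
        # Reuse the pooled connection if it was closed within the timeout window.
        if self.physically_open and self.closed_at is not None:
            if now - self.closed_at <= self.timeout:
                self.closed_at = None
                return "reused"           # no connection overhead
        self.physically_open = True
        self.closed_at = None
        return "opened"                   # a new physical connection was made

    def close(self, now):
        if self.timeout > 0:
            self.closed_at = now          # logical close only; stays physically open
        else:
            self.physically_open = False  # pooling disabled: close immediately

    def expire(self, now):
        # Physically close once the timeout elapses with no new request.
        if self.closed_at is not None and now - self.closed_at > self.timeout:
            self.physically_open = False
            self.closed_at = None

conn = PooledConnection(timeout=30)
print(conn.open(now=0))     # opened
conn.close(now=5)
print(conn.open(now=20))    # reused (reopened within 30 s of the close)
conn.close(now=25)
conn.expire(now=60)         # 35 s after the close: physically closed
print(conn.open(now=61))    # opened (a new physical connection is needed)
```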
Configuring Data Source Access Rights
Data source definitions are saved in an encrypted data file on the Pipeline Pilot server. To enhance
security, you can configure how users access your data sources.
To configure data source access rights:
1. In the Add Data Source form, open Access Privileges and enter the required information for the
fields that are displayed.
2. To restrict data source access to a set of groups or users, select "Use Data Source" as the Access
Level option. This allows users to access the data source, but not view the definitions. Then change
the Access Level for the everybody group to "None".
3. A single username and password can be used for all connections, eliminating the need for users to
manually enter this information. To allow users to bypass these prompts, configure Optional DB
Username and Optional DB Password settings. By storing this information in one central and
encrypted location, you can improve your database access security.
To limit usage of a data source by component:
1. Select the data source that you wish to restrict.
2. Enter the name of the Required Registrant for the component(s) that are allowed to access this data
source. Only components registered by this Registrant will be able to connect to this data source.
Tip: Select a component, right-click and choose Show Versions... to identify its Owner (Registrant).
Authentication
Stricter user identification and authentication for connecting to the data source can be configured to
enhance security. These options are not available for MongoDB data sources.
To configure the data source identification method:
1. Expand the Advanced Settings section of the Add or Edit Data Source form.
2. To force all connections to the current data source to use the username and password of the
current Pipeline Pilot or Pipeline Pilot Client user for identification, check the Require PP Credentials
checkbox.
3. For JDBC data sources, the Optional DB Username and Optional DB Password can be used to
connect to the database. To use these credentials check the Proxy Authentication checkbox.
Note: If Require PP Credentials and Proxy Authentication are both selected for a JDBC data source,
the specified username and password are used for the connection but the Pipeline Pilot or Pipeline
Pilot Client user credentials are used for the proxy authentication.
Initializing the Connection
If a data source requires that SQL statements are executed every time a connection is made, this can be
configured. This option is not available for MongoDB data sources.
Enter the required SQL statements in the Initial SQL text box. Ensure that you use the appropriate
database conventions if you are specifying multiple statements. To specify further SQL for individual
connection instances, use the Initial SQL parameter on an SQL Open Connection component.
Additional Information
If a data source requires additional application information, this can be configured. This option is not
available for MongoDB data sources.
Enter the required information in the Application Info text box.
Query Service Settings
If a JDBC data source is used with a query service, the user account for handling list tables can be
customized and additional relationships between property names and values can be specified.
1. Specify a List DB Username and List DB Password to use when creating temporary tables for list
handling. If no custom username and password are specified, the tables are created under the
connection user.
2. Enter any custom relationships between property names and values in the Query Service Properties
text box. The format should be:
'property_name1 = value1';'property_name2=value2';...
Testing Data Source Connections
After you configure a data source on your server, perform a connection test to ensure that everything is
working properly.
To test an SQL connection for the currently selected Data Source:
1. Click Test. A Test Database Connection prompt is displayed.
2. If you did not specify the optional username and password for the Data Source, enter this
information in the dialog.
3. Click Test.
If the setup is correct, a 'Login Successful' confirmation is displayed.
If there are any login problems, an error message is displayed with details (for example, wrong
server name or incorrect username/password).
Configuring an ODBC Data Source on the Server
Tip: To configure ODBC DSNs on the server (such as using an ODBC driver not supplied with Pipeline
Pilot), see Appendix A: ODBC Server Configuration.
Support for Molecular Databases
Pipeline Pilot can read and write reaction and molecule data from BIOVIA Direct (formerly Symyx Direct).
To provide this functionality, the server might need some configuration.
BIOVIA Direct (formerly Symyx Direct): Support for BIOVIA Direct 6.x and higher is included in the
standard distribution of Pipeline Pilot 8.0 CU3 and higher. Look for components in Database and
Application Integration > Direct Chemistry Cartridge, such as Direct SQL Select or Direct Cartridge
Template Protocol.
MDL Direct: Support for earlier versions of Direct, then known as MDL Direct. In the Pipeline Pilot
Client, search for "MDL Direct" to find components and protocols that support search, database
creation, and molecule registration.
Accord Chemistry Cartridge: Legacy support; new projects should not use.
AEI: Legacy support; new projects should not use.
Pipeline Pilot Chemistry Cartridge: Deprecated.
ISIS Reader: Legacy support; new projects should not use.
Remote Access Files
A remote access file does not contain actual data. Instead, it contains pointer information that allows
ISIS/Base to connect with an ISIS/Host data source.
An ISIS/Base ".db" file that is a remote access file might contain the following pointer information:
ISIS/Host service name
ISIS/Host node name
User name and password for ISIS/Host server
Name of an Hview
Network protocol
Port number assigned to the ISIS/Host Internet daemon
IP address for the ISIS/Host server machine
Tip: If any of the above pointers are not physically stored within the remote access file, they must be
passed as parameters in the MDL ISIS Reader component, which is deprecated.
Chapter 3: SQL Components Overview
Interacting with databases is an integral aspect of most server-based applications. One of the key
strengths of the data pipelining approach is the ability to combine data stored in files with data stored in
a database and treat both sources as equivalent data streams. Analogous to components that interact
with files, the database components allow you to select, insert, and update data in any accessible
database source.
The Integration collection includes the following SQL components for accessing databases using ODBC
and JDBC drivers:
SQL Delete: Deletes records from a data table.
SQL General: Executes SQL against a database, which can be used to perform any SQL operation
(some of the other components, such as SQL Insert, include a convenient set of parameters).
SQL Insert: Inserts new records into a data table.
SQL Select: Selects and retrieves records from a database.
SQL Select for Each Data: Joins data from a database to streaming data.
SQL Update: Updates records in a database.
A set of components is also available for advanced operations, including:
SQL Admin: Performs administrative functions such as creating and dropping tables. Note that the
exact syntax (for example, available data types) differs from one database to the next (for example,
varchar in SQL Server and varchar2 in Oracle).
SQL Open Connection/SQL Close Connection: Extra control over when a database connection is
opened and closed (it is not necessary to use these components to share a connection across
multiple SQL components).
SQL End Transaction: Commits or rolls back the current transaction (set of pending database
operations).
SQL Read Schema Tables: Returns information about the set of tables and views, etc. that are
present in the database.
Notes:
From the Components tab, these SQL components are available in "Database and Application
Integration\Database Access". (Use the Search bar to quickly find components.)
The Protocols tab includes a variety of protocol examples that use database components. A good
place to look is in "Examples\Integration\Database Operations".
Database Access
All rules and restrictions enforced by the database apply when accessed through Pipeline Pilot and the
SQL components. Your login and password must be valid and you must have the appropriate
permissions on database objects (tables, views, and indexes) that you need to access. Table restrictions,
such as uniqueness constraints, are enforced.
SQL Component Parameters
The SQL components take as parameters the Data Source name, a structured query language (SQL)
statement, a user name, and a password. This data is passed to the database, where it authenticates
the user and executes the SQL. For example, if the SQL is a query containing a SELECT statement, the
component generates a set of records and passes them to subsequent nodes in the protocol. If the SQL
fails, an error is passed back to the server and a message is displayed on the client machine.
All SQL components use the following parameters:
Data Source: Defines the data source for this component. There are three options:
Name of the data source defined in the Administration Portal.
The name of a connection defined on a previous SQL Open Connection component, using the form
(Name=Connection Name). Either the Data Source or Connection name can be selected in the
Select Data Source dialog by clicking the ... button.
Alternatively, this field can be an ODBC driver specification that eliminates the need to define a
data source in the Administration Portal. For example:
DRIVER=BIOVIA Oracle 7.1;AUT=0;EDP=1;PRR=1;HOST=dbHost;UID=dbUser;PWD=dbPwd;PORT=1521;SID=dbSid
(The AUT, EDP, and PRR settings reflect the correct defaults of 'Application Using Threads',
'Enable SQLDescribeParam', and 'Procedure Returns Results'.)
Authentication Method: Determines whether to use the specified username/password or your
Pipeline Pilot credentials.
Data Source Username: User name for logging into the database.
Data Source Password: Password for logging into the database.
SQL Statement: SQL statement sent to the database for execution.
Separate Connection: If set to False (default), all SQL components for the same Data
Source/Username use a single connection to access the database.
Data Source and SQL Statements are required on all SQL components; the Username/Password need
not be specified if they are already defined on the Administration Portal Data Source.
Parameter Mapping
If the component depends on using properties from an incoming data record (for example, updating a
database or querying for a record), the parameter SQL Parameter Mapping is also required. This is a list
of properties used in the SQL query statement for specifying one or more properties in the incoming
data stream to use as part of the query.
For example, use the following value for this parameter:
name,cas_rn,smiles,alogp,molecular_weight,ctab
To perform a database INSERT using the following SQL statement:
INSERT INTO molinfo
(name, cas, smiles, alogp, molweight, ctab)
VALUES (?, ?, ?, ?, ?, ?)
In this case, the question marks "?" in the SQL expression represent the values of the corresponding
parameter mapping items, a list of names as they appear on the property list. For example, the
value of property "name" is stored in table column NAME; property "cas_rn" is stored in table column
CAS, and so on.
More precisely, "?" is used as a placeholder and is replaced by values on the data records, according to
the values in the SQL Parameter Mapping parameter. These values are mapped in order of appearance
into the SQL statement and replace the question marks. There must be the same number of question
marks in the update statement as there are values in the parameter mapping or the program generates
a runtime error. If a property specified in parameter mapping does not exist on the incoming record and
the column allows NULL values, then a NULL value is used. If the property is not on the current record
and the column is specified as NOT NULL, an error is generated.
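The placeholder mechanism described above works the same way in any ?-style parameterized SQL API. A minimal sketch using Python's built-in sqlite3 module, for illustration only (Pipeline Pilot performs this mapping internally; the record values here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE molinfo
    (name TEXT, cas TEXT, smiles TEXT, alogp REAL, molweight REAL, ctab TEXT)""")

# The SQL Parameter Mapping list names the record properties, in order:
mapping = ["name", "cas_rn", "smiles", "alogp", "molecular_weight", "ctab"]

# An incoming data record (property name -> value). 'ctab' is absent,
# so a NULL is bound for that column (the column must allow NULLs).
record = {"name": "aspirin", "cas_rn": "50-78-2",
          "smiles": "CC(=O)Oc1ccccc1C(=O)O",
          "alogp": 1.31, "molecular_weight": 180.16}

# One '?' per mapped property: values are bound in mapping order,
# so property "name" lands in column NAME, "cas_rn" in CAS, and so on.
values = [record.get(prop) for prop in mapping]
conn.execute("INSERT INTO molinfo (name, cas, smiles, alogp, molweight, ctab) "
             "VALUES (?, ?, ?, ?, ?, ?)", values)

print(conn.execute("SELECT name, cas, ctab FROM molinfo").fetchone())
# ('aspirin', '50-78-2', None)
```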
You can use the same mechanism to perform a "join" operation; to extract data from the database and
add it to the property list of your data record. For example, if the value of SQL Parameter Mapping is
"canonical_smiles", use the following SQL to create a JOIN command using the "SMILES" field in the
database:
SELECT alogp, molweight FROM molinfo
WHERE smiles = ?
When executed, the values of ALOGP and MOLWEIGHT are added to each data record from the
matching database record.
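The "join" pattern above can be sketched the same way: for each streaming record, the mapped property is bound to the WHERE clause and any returned columns are added to the record. Again a sqlite3 illustration with invented values, not Pipeline Pilot's own code:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE molinfo (smiles TEXT, alogp REAL, molweight REAL)")
db.execute("INSERT INTO molinfo VALUES ('CCO', -0.14, 46.07)")

# Streaming data records; 'canonical_smiles' is the mapped property.
records = [{"id": 1, "canonical_smiles": "CCO"},
           {"id": 2, "canonical_smiles": "CCN"}]

for rec in records:
    row = db.execute("SELECT alogp, molweight FROM molinfo WHERE smiles = ?",
                     (rec["canonical_smiles"],)).fetchone()
    if row is not None:                       # matching database record found
        rec["ALOGP"], rec["MOLWEIGHT"] = row  # add the columns as new properties

print(records[0])
print(records[1])   # no match, record passes through unchanged
```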
Location of mapping properties
It is possible to map both data record properties and global properties, and to handle properties with
special characters in their names, by using the % and @ prefixes and quoting the names. For example:
%'Mol Weight', @'Smiles String'
This maps the data record property 'Mol Weight' and the global property 'Smiles String' onto the
SQL statement.
Mapping Parameters for Stored Procedures
When calling stored procedures that return values into mapped parameters, modify the SQL Parameter
Types parameter to describe the desired behavior. Parameters to stored procedures can be in, inout, or
out.
The value for SQL Parameter Types must be a comma separated list of values (one of “in”, “inout”,
“out”) equal to the number of values in SQL Parameter Mapping. By default all parameters are assumed
to be "in".
For example, assume a stored procedure called "calc_alogp" that calculates alogp given a smiles string as
its first argument and returning alogp as its second argument:
{call calc_alogp(?, ?)}
The value for SQL Parameter Mapping is:
Smiles, alogp
and the value for SQL Parameter Types is:
in, out
This specifies that the value "smiles" is going "in" to the procedure call, and the value "alogp" is coming
"out" from the procedure call.
Selecting a Data Source
To work with the SQL components, select the Pipeline Pilot data source you want to use. If they are not
stored with the data source, a username and password are also required for database access. Contact
your system administrator for this information.
To select a Data Source:
1. In the Pipeline Pilot Client, open the SQL component and select the Data Source parameter. The
Data Source dialog opens. Use it to select a data source for configuring an SQL component.
The Data Source parameter and the Data Source dialog
Note: Data Sources are tied to Pipeline Pilot servers. A data source must be defined on the server
before you can use it, although you can substitute a connection string. All rules and restrictions
enforced by the database apply when accessed through Pipeline Pilot and SQL. To connect to the
data source, your user name and password must be valid.
2. Select a name from the list of available data sources.
3. Enter your user name and password. (If you configured the server to use a default username and
password, you won’t need to specify this information).
Tip: Instead of selecting a data source, you can also use a data source already opened in a SQL
Open Connection component. Parameter values are then shared between SQL components for
more efficient protocol execution. In Connection Name, enter the name. For details, see Sharing
SQL Component Connections.
4. Click OK.
Sharing SQL Component Connections
Pipeline Pilot's SQL components can each open their own connection to a database. There are
advantages to having multiple SQL components share a single connection, including:
Speed: Opening and closing a database connection takes time. Opening only one connection is much
faster.
Less demand on limited resources: The number of concurrent connections to a database is usually
limited.
Easier to set up: Instead of entering three separate connection parameters (Data Source, username,
and password) for each SQL component, you can refer to the name of the connection set when
opening the connection.
Smoother transaction: A protocol can be configured to cover one database transaction so changes
are only committed to the database if every operation is successful.
By default, the SQL components share a connection with all other components that use the same Data
Source and Username/Password. The Separate Connection parameter is set to "False".
To specify additional options on the connection (for example, an Initial SQL statement for row-level
security not specified on the Data Source), or to specify the Data Source information in a single place,
use the SQL Open Connection component. By giving this connection a name, the name can be specified
in the Data Source dialog of the other SQL components instead of the Data Source name and
Username/Password.
Additionally, other components are useful for managing shared connections, including:
SQL End Transaction: Can be used to commit or rollback a transaction based on appropriate
conditions (for example, a PilotScript expression for which "True" means commit, and "False" means
rollback). You can specify Commit Frequency on the SQL Insert component when using a shared
connection.
SQL Close Connection: Closes a shared connection. If no SQL Close Connection component is
provided, the connection is automatically closed when the entire protocol finishes executing.
Batch Size
For the ODBC database modification components (SQL Insert/Update/Delete), a Batch Size parameter is
available, which controls the size of the arrays of parameter values passed to the server. Using parameter
arrays rather than executing statements for one value at a time can significantly speed up SQL
operations. The default Batch Size is "100". If you encounter any problems with your SQL statement,
unset Batch Size or set it to "1" to eliminate batched processing. If SQL Commit Frequency has been
specified for an Insert/Update/Delete component, a commit is performed after each batch in which at
least SQL Commit Frequency records have been processed. For example, if SQL Commit Frequency is set
to 75 and Batch Size is set to 100, a commit is performed for each batch (i.e., every 100 records).
SQL Components Overview | Page 13
Tip: Batches cannot be used with Stored Procedures. If an error occurs for a record during a batch
(for example, a duplicate Primary Key violation), the offending input record is sent to the Fail port and
the successful records are sent to the Pass port.
If an insert seems to hang or not terminate correctly, an SQL error most likely occurred. Sometimes
the driver does not handle errors well for array inserts. In this case, set the Batch Size to "1" and rerun the protocol. The SQL error message is then displayed.
If an error occurs in a batch (for instance, due to a constraint violation), some database drivers (e.g.,
Oracle) simply ignore the remaining records in the batch. In this case, ErrorText is set to 'Row (#) was
skipped (due to previous error)', where # is the row number, and the records are sent to the Fail port. To
process the records that did not have an error, either change the Batch Size to "1" or pass these 'skipped'
records into another SQL Insert component.
Handling of Date Values
Pipeline Pilot date variables can store date and time values to millisecond granularity. When inserting
date values into a database, the ODBC components only support second granularity. To compare a date
variable generated in Pipeline Pilot with the value stored in the database, you might need to remove the
milliseconds part of the date value. Use the following PilotScript expression to handle this task:
DateSetMilliseconds(Current_Date, 0);
Note: It is recommended that you use mapped parameters for dates in SQL statements instead of
literal strings, or use the TO_DATE() function in SQL to properly interpret the format of the literal
strings. See Appendix B: Using dates with Oracle based components.
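As an illustration, a query that uses TO_DATE() on an Oracle data source to make the format of a literal date explicit might look like the following (a sketch; the molecules table and cdate column are used here only as plausible examples):

```sql
-- TO_DATE makes the literal's format explicit rather than relying
-- on the session's default date format (hypothetical table/column).
SELECT *
FROM molecules
WHERE cdate >= TO_DATE('2009-08-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
```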
Data Source Access Rights
As explained in Configuring Data Source Access Rights, you can provide end users with different access
rights to a data source. The rights are:
Edit Data Source: End users can apply the data source to a component in the Pipeline Pilot Client and
view and edit the connection settings in the Data Source Information component.
View Data Source: End users can apply the data source to a component in the Pipeline Pilot Client
and view the connection settings in the Data Source Information component.
Use Data Source: End users can apply the data source to a component in the Pipeline Pilot Client.
None: End users cannot select a data source for a component in the Pipeline Pilot Client. The Select
Data Source dialog does not display the data source name in the list. If the user tries to run a protocol
with an inaccessible data source, an error occurs.
Examples
Edit Data Source Rights
In the Administration Portal, for the data source (niedersachsen_ddmysql60), set the "everybody" group
(all users) to have the "Edit Data Source" privilege:
Page 14 | Pipeline Pilot • Database Integration Guide
Selected data source in the Administration Portal (Setup > Databases)
Everybody has the "Edit Data Source" privilege
View Data Source Rights
In the Administration Portal, for the data source (niedersachsen_ddmysql60), set the "everybody" group
(all users) to have the "View Data Source" privilege:
Everybody has the ‘View Data Source’ privilege
Use Data Source Rights
In the Administration Portal, for the data source (niedersachsen_ddmysql60), set the "everybody" group
(all users) to have the "Use Data Source" privilege:
Everybody has the “Use Data Source” privilege
For Use Data Source rights, users can select a data source for a component in the Pipeline Pilot Client.
(No View or Edit button is available for accessing the data source definition.)
Select Data Source dialog options for end users with “Use Data Source” privileges.
None Rights
In the Administration Portal, for the data source (niedersachsen_ddmysql60), set a privilege for one or
more specific users, and then set the "everybody" group to have the "None" privilege:
The everybody group has the ‘None’ privilege (i.e. no access)
Users with None rights cannot select the data source in the Select Data Source dialog. If the user tries to
run a protocol with an inaccessible data source, an error occurs.
Select Data Source dialog options for end users with “None” privileges for the data source
(Niedersachsen_ddmysql60).
With these access rights, the end user cannot select the data source from the dialog.
Notes:
If possible, for ODBC data sources, use ODBC (PP) as the data source type, and configure the data
source directly in the Pipeline Pilot Client dialogs or in the Admin Portal. This is quicker than logging
into the Pipeline Pilot server to manually create a DSN. The appropriate settings (for example,
"EDP=1" for EnableDescribeParams and "PRR=1" for ProcedureRetResults) are automatically set,
so you only need to focus on the connection settings.
Data source definitions (DSNs) are stored in an encrypted file called "DataSources.xml" on the
Pipeline Pilot server. To access a data source without providing the user name or password for
database connectivity, you can associate access privileges with your data sources.
For JDBC database connections, or for ODBC connections to databases accessed with Pipeline
Pilot's DataDirect drivers for Oracle, MySQL, SQL Server, and DB2, you can create a data source
without a DSN, directly from the Pipeline Pilot Administration Portal (Setup > Databases).
You can store a DSN directly in a file on your Pipeline Pilot server, bypassing the need to create a
DSN. (You do not need to log into the Administration Portal to perform this task.)
You can also use operating system DSNs. By default, all your existing DSNs are available for use.
Chapter 4:
Building SQL Statements
You can configure the SQL component to work with your network data by building SQL statements. The
SQL Builder is available for this purpose, making it convenient to compose and test SQL queries.
Opening the SQL Builder
To open the SQL builder:
1. Open the SQL component in the workspace.
2. Ensure you have a valid Data Source specified for the Data Source parameter. (For details, see
Selecting a Data Source.)
3. Open the SQL Statement parameter.
SQL Statement parameter
4. The SQL Builder dialog opens and connects you to your database. By default, all tables and views
available in the database are listed on the left. To filter the list, click Options, and select "Only show
tables owned by user". The list is updated to expose tables and views for the logged-in user.
Note: All rules and restrictions enforced by the database apply when accessed through Pipeline Pilot
and the SQL components. You need appropriate permissions on database objects such as tables,
views, and indexes. Table restrictions, such as uniqueness constraints, are enforced. If you do not
have access to a table, it is not displayed in the graphical list.
SQL Editing Modes
The SQL Builder offers two ways to build SQL statements: a graphical mode and a text mode.
Both modes allow you to inspect the tables, views, synonyms, and nicknames configured in the
database.
Graphical mode: For graphically building SQL SELECT queries. This mode is accessible from the Build
SQL Select tab. It supports drag and drop operations and makes it convenient to construct SQL
SELECT queries visually.
Text mode: For manually constructing more advanced SELECT and other types of queries. This mode
is accessible from the Edit SQL tab and offers support for drag and drop of tables and columns.
Graphically Building SQL Statements
Use the Build SQL Select tab to visually compose your SQL statements. It consists of several task panes.
The left pane displays tables and views of your data. The adjacent pane across the top is a representation
of the tables and their columns that are part of the query. The right pane is used to construct the WHERE
clause of the SQL statement. The bottom pane shows the SQL that the graphical interface is building; the
SQL itself is read-only in this view.
Visually build SQL statements in the Build SQL Select tab
To graphically build an SQL statement:
1. From the list of available tables and views, select the one you want to add to the query by dragging
and dropping it into the second pane.
2. Repeat for each table you want to add to the query.
3. The table is displayed showing each of its columns and the column types. To add columns to the
data set, select the check box on the left side. You can also select a column by dragging the column
from the table to the view pane.
4. To create SQL JOIN statements, make a link between the columns to join by using the purple boxes
on either side of a column (similar to the way you connect components). Drag the mouse from the
first purple node to the second one. The color and text format represent different attributes:
Bold text: All primary keys.
Blue line: A join between the two columns (usually in different tables).
IMPORTANT! You can only build SQL SELECT statements in the visual editor. Other SQL statements
need to be constructed manually, as explained in the next section.
Manually Constructing SQL Statements
To manually enter an SQL statement:
1. Select the Edit SQL tab. This tab displays two panes. The left pane displays tables and views. The
right pane is a text editor for entering your SQL syntax.
2. Select the tables and views from the left. You can drag and drop tables and columns directly into
your SQL statement.
3. Enter the SQL text on the right.
Auto-completion speeds up SQL development by reducing the amount of keyboard input on
your part.
For a list of available tables, press CTRL+SPACE or F4.
For a list of columns, type a period (.) after a table name.
SQL keywords are displayed in distinct colors to reveal different parts of an SQL statement.
Manually construct SQL statements at Edit SQL tab
SQL Editing Guidelines
The text SQL edit mode is always available for manually editing your SQL syntax. However, after
manually editing an SQL statement in the text mode, you cannot go back to the graphical mode without
losing the manual changes you made. A prompt is displayed before the changes are discarded.
Prompt to discard manual SQL edits
Tip: You can erase all queries and start over by selecting Reset All.
Creating a Data Table
You can create a database table using the following SQL statement:
CREATE TABLE molinfo (
name VARCHAR(20),
CAS_RN VARCHAR (20),
molweight FLOAT,
molformula VARCHAR (40),
alogp FLOAT,
smiles VARCHAR(255),
ctab CLOB
)
You can then read from the database using the following SQL statement:
SELECT * FROM molinfo
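A parameterized INSERT for the same table, as might be used with an SQL Insert component, could look like this (a sketch; the parameter mapping would list the corresponding data record properties in order):

```sql
-- One ? placeholder per mapped property, in column order.
INSERT INTO molinfo (name, CAS_RN, molweight, molformula, alogp, smiles, ctab)
VALUES (?, ?, ?, ?, ?, ?, ?)
```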
Testing SQL Statements
Test your SQL syntax directly in the SQL Builder. This can help you debug syntax errors as you work,
and determine whether your SQL command is returning the information you require.
Tip: The Results tab only brings back the first n records, where n is set to 10 by default. To change
this, click Options and enter a value in the prompt, up to a maximum of 100 records.
To test your SQL statement:
Select Execute SQL. The results are displayed in the Results tab.
For example, test the following SQL statement:
SELECT * FROM Categories
The results should look something like this:
Results tab
Notes:
You can only test queries that begin with SELECT.
You can enter SQL commands in Edit mode without connecting to a database. However, to see a
list of items in Tables and Views, and to execute test SQL commands, you need to specify a Data
Source.
Error Handling
The Pipeline Pilot Client displays errors generated by the database and returned through the driver. A
protocol's response to the error is determined by the OnGeneralError setting. For example, a general
error occurs when you execute a DROP TABLE command for a non-existent table. If you change
OnGeneralError to Send Data out the Fail Port, the input record is sent to the Fail port with an error
message in the ErrorText property, rather than stopping the protocol.
There is another scenario where an SQL component can send a record to the Fail port without an actual
error. The Select for Each Data component matches data records against a SQL query. For each record
returned from the query, an output record is created on the Pass port, with a copy of the input record
properties and the fields returned from the SQL statement. If there are no matches for an input record
(the SQL query returns no records for a particular set of input parameters), this input record is routed to
the Fail port.
Tip: To determine whether an update or delete was successful (i.e., whether it actually modified any
records as opposed to just an SQL error), set the Additional Options > Row Count parameter on the
component to the name of a property to place in the output record. The value of this property
reflects the number of table rows modified for each input record. If no modifications were performed,
the value is 0.
When an error occurs, the database might be in an incomplete state, with only part of the processing
committed. Commit Frequency determines how often the data is committed to the database. A value of
"1" commits every modification individually, causing a large amount of disk activity and slower
processing. Setting Commit Frequency to a large value to allow for an "all-or-nothing" transaction
requires the database to have enough disk space to store the entire transaction before it is committed.
The default value is "1,000". If an error occurs, a rollback is performed to the last committed state (if an
error occurs after 5,326 inserts, there would be 5,000 records/rows committed). To roll back all of the
rows, set this value to a large enough number, and also ensure that the database can handle the large
transaction.
Note: It is recommended that you use mapped parameters for dates in SQL statements instead of
literal strings, or use the TO_DATE() function in SQL to properly interpret the format of the literal
strings. See Appendix B: Using dates with Oracle based components.
Chapter 5:
Customizing SQL Components
You can create a custom SQL component to perform a specific task. Custom components are versatile
because you can reuse them in other protocols and share them with other users who need to perform
similar tasks. By doing some work upfront (for example, selecting the correct Data Source and specifying
a query), you can reuse a component repeatedly without having to recreate it each time. This is
particularly helpful for sharing with other users who might be less proficient working with SQL.
One task you might frequently perform is retrieving all records from a database that were added since
the previous workday. The following steps explain how to customize an SQL component using the SQL
Select component.
To customize the SQL Select component:
1. Add the SQL Select component to a new blank workspace.
2. Change the component caption to "New Molecules Reader".
3. Open the Data Source parameter and select the data source name on your server.
4. Open the SQL Statement parameter and enter the SQL query statement. For example:
SELECT * FROM molecules WHERE (cdate > sysdate - 1)
When complete, the parameters should look something like this:
Parameters for SQL Select component
5. Right-click the component and select Edit.
6. Click the Help tab and enter tooltip help and a description for this component.
7. Click OK to save the parameter changes.
8. Save the component as "New Molecules Reader".
Dynamic SQL Using String Replacement
You can use SQL statements that are dynamically generated at run time by using string replacement
notation. You can use the values of parameters in addition to global variables; the syntax for
parameters is the same as for global variables. (The parameters and global variables referenced in the
SQL statement need to have a value prior to the execution of the database component.)
The following example illustrates an SQL statement that uses the values of two global properties
(@mySalary and @myName):
SELECT *
FROM employee
WHERE salary > $(mySalary) OR firstName = '$(myName)'
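For instance, if the global variable mySalary had the value 50000 and myName the value 'Alice' (hypothetical values) when the component executes, the statement sent to the database after string replacement would be:

```sql
-- Result of string replacement with mySalary = 50000, myName = 'Alice'
SELECT *
FROM employee
WHERE salary > 50000 OR firstName = 'Alice'
```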
Chapter 6:
Building Protocols with Multiple SQL Components
A protocol can include multiple SQL components that access and update the same database table.
However, you might have performance problems when running this type of protocol if each component
uses a separate database connection.
For example, suppose an SQL component executes a SQL statement such as:
UPDATE mol_a SET smiles = . . . WHERE id = 1
Followed by a second SQL component on a different connection that executes a SQL statement like this:
SELECT * FROM mol_a WHERE id = 1
The SQL is executed within a transaction which can create problems when running the protocol. The first
SQL statement is executed in a transaction and that transaction remains open. The second SQL
statement cannot execute (because the row is locked by the previous transaction), until the first
transaction is completed (commit or rollback). This creates a "deadlock" situation across two database
connections.
To solve this problem, ensure that both components use the same SQL connection (ensure that the
Separate Connection parameter is set to "False").
Chapter 7:
Calling Database Stored Procedures
SQL components provide support for calling database stored procedures. These are small programs
written in a database native SQL language and compiled on the database. Some organizations prefer
applications to call stored procedures rather than use native SQL statements for security and
performance reasons.
The SQL components are able to call the following types of stored procedures:
Stored procedures with input, input/output, and output arguments: These procedures can modify
their arguments, perform database operations, and optionally return modified values to the calling
program.
Functions: Functions are stored procedures that have a return value. The function is executed on the
server and its return value is returned to the SQL component.
Stored procedures that return one or more result sets: Stored procedures can return values by
generating one or more result sets. In MySQL and SQL Server, perform a select statement within the
stored procedure. In Oracle, define one or more arguments as 'REF CURSOR' and create the cursors within
the stored procedure. Returning result sets is an efficient way of retrieving data from a stored
procedure.
The SQL Statement syntax for calling a stored procedure from ODBC or JDBC is:
{ call proc(?, ?) }
Where the { and } characters must wrap the entire SQL statement, proc is the name of the stored
procedure, and ?, ? represents the set of arguments (in this case there are two arguments).
Tip: For Oracle and SQL Server, the maximum string length that can be passed in or out of a stored
procedure argument is 4000 characters; for MySQL the limit is 255.
Oracle Stored Procedures
Simple Stored Procedure
This example modifies some of its arguments and returns their values (it does not interact with any
database tables).
To create the stored procedure in the database:
1. Execute the following SQL (via an SQL Admin component or using a native query tool for the
database):
CREATE OR REPLACE PROCEDURE proc1
(
    i_Arg1  IN NUMBER,
    o_Arg2  OUT VARCHAR,
    io_Arg3 IN OUT NUMBER,
    o_Arg4  OUT NUMBER,
    o_Arg5  IN OUT DATE
)
AS
BEGIN
    o_Arg2 := 'A return string';
    io_Arg3 := io_Arg3 + i_Arg1 * 2;
    o_Arg4 := 5 * i_Arg1;
    o_Arg5 := o_Arg5 + 3;
END proc1;
2. Set up an SQL General component connected to the data source with the following SQL Statement
(all of these examples assume you have a SQL Open Connection component that connects to the
database and specifies a Connection Name of “DB”):
SQL statement for a simple stored procedure that modifies some of its arguments and returns their
values without interacting with any database tables
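The screenshot itself is not reproduced here; based on the ODBC call syntax described earlier, the SQL Statement presumably takes the following form, with the SQL Parameter Mapping listing v1 through v5 in order:

```sql
-- Presumed SQL Statement for proc1 (one ? per argument)
{ call proc1(?, ?, ?, ?, ?) }
```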
In the above example, "proc1" takes five arguments. The first argument (corresponding to property
v1) is an INPUT argument. Arguments 2 and 4 are OUTPUT arguments (no input value sent from
Pipeline Pilot, but an output value retrieved). Arguments 3 and 5 are IN OUT arguments (an input
value is sent from Pipeline Pilot and retrieved back after the stored procedure is executed).
Tip: If a property is not present on the data record for an IN OUT parameter, or if the parameter is
marked as OUT in the SQL component (even though the stored procedure indicates it is an IN
OUT argument), a NULL is sent into the stored procedure. Make sure that the stored procedure
can handle this NULL value.
3. Provide input data for the SQL component (for example, with a Custom Manipulator):
v1 := 5;
v3 := 3;
v5 := '01-AUG-2009';
4. Execute the pipeline and view the output data record. Data properties appear in the order they are
created:
v1: 5
v3: 13 (modified as its original value plus 2 * v1)
v5: 08/04/09 00:00:00 (a date, in the mm/dd/yy hh:mi:ss format, three days later than the input date)
v2: A return string (a static return string; created at the end of the data record as an output parameter)
v4: 25 (v1 * 5; created at the end of the data record as an output parameter)
Simple Function
This example shows you how to invoke a simple function from the SQL components.
To invoke a simple function:
1. Create the function in the database by executing the following SQL (either as an SQL Admin
component or using a native query tool for the database):
CREATE OR REPLACE FUNCTION func1
(
    i_id IN NUMBER
) RETURN VARCHAR2
AS
BEGIN
    return 'This is the value: ' || to_char(i_id);
END func1;
2. Set up an SQL General component connected to the data source with the following SQL statement:
SQL statement to invoke a simple function
In this example, "func1" takes one argument. It also represents the return value in the parameter
mapping; since the parameters are mapped from left to right in the SQL statement, the return value
(output parameter) is the first one in the mapping.
Note: The ? := funcName syntax is used for calling Oracle functions.
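Based on that note, the SQL Statement shown in the screenshot is presumably of the form below, with the parameter mapping listing v2 (the return value) first, then v1:

```sql
-- Presumed SQL Statement for the Oracle function call
{ ? := func1(?) }
```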
3. Provide input data for the SQL component (for example, with a Custom Manipulator):
v1 := 17;
4. Execute the pipeline and view the output data record:
v1: 17
v2: This is the value: 17
Stored Procedure with Result Sets
This example shows how to invoke a stored procedure that returns one (or more) result sets from SQL.
It involves creating a type to represent the REF CURSOR, and embed the type and procedure within an
Oracle package (as an alternative use the system defined type SYS_REFCURSOR).
To invoke the stored procedure with result sets:
1. Create a base table and load it with some data (either as this SQL script or through the SQL Admin
components):
create table T1
(
    myInt   INTEGER,
    myFloat FLOAT,
    myChar  VARCHAR2(50)
);
insert into T1 values (1, 1.5, 'Val 1');
insert into T1 values (2, 2.5, 'Val 2');
insert into T1 values (3, 3.5, 'Val 3');
insert into T1 values (4, 4.5, 'Val 4');
insert into T1 values (5, 5.5, 'Val 5');
commit;
2. Then define the stored procedure (note the curODBC (i.e., REF CURSOR) arguments must be marked
as IN OUT):
CREATE OR REPLACE PACKAGE procTest AS
    type curODBC is ref cursor;
    PROCEDURE procArray1
    (
        i_id      IN NUMBER,
        o_cursor  IN OUT curODBC,
        o_cursor2 IN OUT curODBC,
        o_string  OUT VARCHAR
    );
END procTest;
CREATE OR REPLACE PACKAGE BODY procTest AS
    PROCEDURE procArray1
    (
        i_id      IN NUMBER,
        o_cursor  IN OUT curODBC,
        o_cursor2 IN OUT curODBC,
        o_string  OUT VARCHAR
    )
    AS
    BEGIN
        open o_cursor for
            select myInt, myFloat, myChar from T1
            where myInt >= i_id order by myInt;
        open o_cursor2 for
            select myInt myI from T1
            where myInt <= 2 order by myInt;
        o_string := 'Return value from procArray1';
    END procArray1;
END procTest;
3. Set up an SQL General component connected to the data source with the following SQL statement:
SQL statement to invoke a stored procedure that returns one (or more) result sets
This example does not specify the REF CURSOR arguments in the SQL Statement or in the parameter
mapping. The DataDirect ODBC driver automatically takes care of handling these arguments as
result sets. The Select Behavior parameter is set to "Multiple Data", so each row from the returned
result sets is output as a separate data record.
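Since the REF CURSOR arguments are omitted from the statement and the mapping, the SQL Statement in the (omitted) screenshot presumably passes only the i_id input and the o_string output, along these lines, with v1 and v2 in the parameter mapping:

```sql
-- Presumed SQL Statement; the two REF CURSOR arguments are not listed
{ call procTest.procArray1(?, ?) }
```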
4. Provide input data for the SQL component (for example, with a Custom Manipulator):
v1 := 3;
5. Execute the pipeline and view the output data records.
v1  v2                            myInt  myFloat  myChar  myI
3   Return value from procArray1  3      3.5000   Val 3
3   Return value from procArray1  4      4.5000   Val 4
3   Return value from procArray1  5      5.5000   Val 5
3   Return value from procArray1                          1
3   Return value from procArray1                          2
v2 was updated to reflect the output value from the stored procedure. There are now a total of five
data records, three from the first result set, and two from the second. The first result set has three
properties (myInt, myFloat, and myChar); the second result set has only one (myI).
Tip: If you configure the SQL component’s Select Behavior parameter as "Append Values" instead of
"Multiple Data", the output records from the REF CURSORs are treated as Arrays on a single data
record. Ensure that the names of the columns in the cursors are different or the values from the
second result set will overwrite those from the first.
The ODBC protocol does not support Oracle VARRAY (it is not possible to return a VARRAY from a stored
procedure). Using a REF CURSOR to retrieve a result set and specifying Select Behavior as "Append
Values" provides equivalent behavior:
v1       3
myInt    3, 4, 5
myFloat  3.5000, 4.5000, 5.5000
myChar   Val 3, Val 4, Val 5
myI      1, 2
v2       Return value from procArray1
Tip: If you use JDBC to access Oracle Stored procedures that return ref cursors, you need to indicate
the arguments in the SQL Parameter Mapping parameter (use any property name since it won’t be
used) and describe their type as cursor in the SQL Parameter Types parameter:
SQL statement to invoke a stored procedure that returns one (or more) result sets using JDBC
Microsoft SQL Server Stored Procedures
Simple Function
This example shows you how to invoke a simple function.
To create a simple function:
1. Create the function in the database by executing the following SQL (either as an SQL General
component or using a native query tool for the database):
CREATE FUNCTION [dbo].[func1] (
@i_id INT
) RETURNS VARCHAR(500)
AS
BEGIN
return('This is the value: ' + str(@i_id))
END
2. Set up an SQL General component connected to the data source with the following SQL statement:
SQL statement to invoke a simple function
In this example, "func1" takes one argument. The return value is also represented in the parameter
mapping, and since the parameters are mapped from left to right in the SQL statement, the
return value (output parameter) is the first one in the mapping.
Note: The ? = call funcName syntax is used for calling SQL Server functions.
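Based on that note, the SQL Statement shown in the screenshot is presumably of the form below, with v2 (the return value) mapped first and v1 second:

```sql
-- Presumed SQL Statement for the SQL Server function call
{ ? = call dbo.func1(?) }
```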
3. Provide input data for the SQL component (for example, with a Custom Manipulator):
v1 := 17;
4. Execute the pipeline and view the output data record:
v1: 17
v2: This is the value: 17
Tip: As an alternative syntax, it is possible to call functions in SQL Server using the "select
dbo.funcName(?) propName" syntax (as described for MySQL below).
Stored Procedure with Result Sets
This example invokes a stored procedure that returns one (or more) result sets from the SQL
components. Your stored procedure must have one or more SELECT statements that do not place their
results anywhere (for example, into an insert); each of these SELECT statements (in order) is returned as
a result set from execution of the stored procedure.
To create the example:
1. Create a base table and load it with data (either as this SQL script or through SQL Admin
components):
create table t1
(
    myInt   INTEGER,
    myFloat FLOAT,
    myChar  VARCHAR(50)
)
insert into t1 values (1, 1.5, 'var 1')
insert into t1 values (2, 3.5, 'var 2')
insert into t1 values (3, 4.5, 'var 3')
insert into t1 values (4, 4.5, 'var 4')
insert into t1 values (5, 5.5, 'var 5')
Then define the stored procedure:
CREATE PROCEDURE procArray1 (
@i_id INT,
@o_string VARCHAR(255) OUTPUT
)
AS
select myChar from t1
where myInt <= 2 order by myInt;
select myInt, myFloat, myChar myC from t1
where myInt >= @i_id order by myInt;
set @o_string = 'Return value from procArray1'
2. Set up an SQL General component connected to the data source with the following SQL Statement:
SQL statement for a stored procedure that returns one (or more) result sets from ODBC. Select
Behavior is set to "Multiple Data", so each row from the returned result sets is output as a separate
data record.
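The SQL Statement shown in the (omitted) screenshot presumably follows the standard call syntax, with v1 mapped to @i_id and v2 to @o_string:

```sql
-- Presumed SQL Statement for the SQL Server stored procedure
{ call procArray1(?, ?) }
```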
3. Provide input data for the SQL component (for example, with a Custom Manipulator):
v1 := 3;
4. Execute the pipeline and view the output data records:
v1  v2  myChar  myInt  myFloat  myC
3           	3      4.5000   var 3
3           	4      4.5000   var 4
3           	5      5.5000   var 5
3       var 1
3       var 2
There are now a total of five data records, two from the first result set, and three from the second. The
first result set has only one property (myChar); the second result set has three properties (myInt,
myFloat, and myC).
Note: v2 was not updated to reflect the output value from the stored procedure. Unlike the Oracle
example, output parameters from stored procedures in MySQL and SQL Server that return result sets
can only be retrieved if Select Behavior is set to "Append Values".
If you configure Select Behavior as "Append Values" instead of "Multiple Data", the output records from
the stored procedure result sets are treated as Arrays on a single data record (make sure that the names
of the columns in the selects are different or the values from the second result set will overwrite those
from the first):
v1       3
myChar   var 1, var 2
myInt    3, 4, 5
myFloat  4.5000, 4.5000, 5.5000
myC      var 3, var 4, var 5
v2       Return value from procArray1
Tip: You can use the same SQL statement to execute this stored procedure using the JDBC Microsoft
SQL Server driver (com.microsoft.sqlserver.jdbc.SQLServerDriver). However, this driver does not
support access to OUT parameters at the same time as extra result sets. In the SQL Parameter Types
parameter, set the value to "in". This correctly retrieves the results from the select statements (but v2
is unset).
MySQL Stored Procedures
Simple Stored Procedure
This procedure simply modifies some of its arguments and returns their values (it does not interact with
any database tables).
To create the example:
1. Create the stored procedure in the database by executing the following statement (either as an SQL
General component or using a native query tool for the database):
DELIMITER $$
DROP PROCEDURE IF EXISTS `proc1` $$
CREATE PROCEDURE `proc1`(
IN i_Arg1 INT,
OUT o_Arg2 VARCHAR(255),
INOUT io_Arg3 FLOAT,
OUT o_Arg4 FLOAT,
INOUT io_Arg5 DATETIME
)
BEGIN
select 'A return string' into o_Arg2;
select io_Arg3 + i_Arg1 * 2 into io_Arg3;
select 5.5 * i_Arg1 into o_Arg4;
select ADDDATE(io_Arg5, 3) into io_Arg5;
END $$
DELIMITER ;
2. Set up an SQL General component connected to the data source with the following SQL Statement:
SQL statement that modifies some of its arguments and returns their values without interacting with
any database tables
In this example, proc1 takes five arguments. The first argument (corresponding to property v1) is an
INPUT argument. Arguments 2 and 4 are OUTPUT arguments (no input value sent from Pipeline Pilot
but an output value retrieved). Arguments 3 and 5 are IN OUT arguments (an input value is sent
from Pipeline Pilot and retrieved back after the stored procedure is executed).
Note: If a property is not present on the data record for an inout argument, a NULL is sent into
the stored procedure; make sure that the stored procedure can handle this NULL value.
3. Provide input data for the SQL component (for example, with a Custom Manipulator):
v1 := 5;
v3 := 3.5;
v5 := date('01-AUG-2009 11:05:00');
Note: MySQL string properties are not automatically converted into date values the way they are
for Oracle and SQL Server. It is necessary to provide a date property to avoid SQL format errors.
4. Execute the pipeline and view the output data record. Data properties appear in the order they are
created:
v1: 5
v3: 13.500 (modified as 2 * v1 plus its original value)
v5: 2009-08-04 11:05:00 (a date, in yyyy-mm-dd hh:mi:ss format, that is three days later than the input date)
v2: A return string (a static return string; created at the end of the data record as an output parameter)
v4: 27.5 (v1 * 5.5; created at the end of the data record as an output parameter)
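As a check on the expected outputs, the argument handling of proc1 can be mirrored in plain Python. This is a sketch of the arithmetic only; the real procedure runs inside MySQL:

```python
from datetime import datetime, timedelta

def proc1(i_arg1, io_arg3, io_arg5):
    """Mirror of the proc1 logic, for predicting its output values."""
    o_arg2 = "A return string"
    io_arg3 = io_arg3 + i_arg1 * 2         # INOUT: input plus 2 * v1
    o_arg4 = 5.5 * i_arg1                  # OUT: 5.5 * v1
    io_arg5 = io_arg5 + timedelta(days=3)  # INOUT: ADDDATE(date, 3)
    return o_arg2, io_arg3, o_arg4, io_arg5

# The same inputs the Custom Manipulator provides above.
v2, v3, v4, v5 = proc1(5, 3.5, datetime(2009, 8, 1, 11, 5, 0))
print(v3)  # 13.5
print(v5)  # 2009-08-04 11:05:00
```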
Simple Function
This example shows you how to invoke a function from the SQL components.
To create the example:
1. Create the function in the database by executing the following SQL (either as an SQL General
component or using a native query tool for the database):
DELIMITER $$
CREATE FUNCTION func1(
i_Arg1 INT
) RETURNS varchar(255)
BEGIN
return concat('This is the value: ', i_Arg1);
END $$
DELIMITER ;
2. Set up an SQL General component connected to the data source with the following SQL Statement.
SQL statement that invokes a function
In this case, "func1" takes one argument. In MySQL, you cannot use the call syntax to invoke
functions. Use a SELECT statement and alias the function name to the appropriate property name
(this output property is not specified in the SQL Parameter mapping because it is not a parameter in
the SQL call).
3. Provide input data for the SQL component (for example, with a Custom Manipulator):
v1 := 17;
4. Execute the pipeline and view the output data record:
v1:
17
v2:
This is the value: 17
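The invoke-via-SELECT-and-alias pattern can be tried outside Pipeline Pilot. The sketch below uses Python's sqlite3 module, with a user-defined function standing in for func1 (sqlite3 substitutes for MySQL purely for illustration):

```python
import sqlite3

# Register a stand-in for func1, then invoke it through a SELECT and
# alias the result to the desired property name (v2), as the SQL
# component's statement does.
conn = sqlite3.connect(":memory:")
conn.create_function("func1", 1, lambda i: f"This is the value: {i}")

v1 = 17
(v2,) = conn.execute("SELECT func1(?) AS v2", (v1,)).fetchone()
print(v2)  # This is the value: 17
```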
Stored Procedure with Result Sets
This example shows you how to invoke a stored procedure that returns one (or more) result sets from
ODBC or JDBC. To do this, the stored procedure has one or more SELECT statements that do not place
their results anywhere (for example, into an insert or a variable); each of these select statements (in
order), is returned as a result set from invocation of the stored procedure.
To create the example:
1. Create a base table and load some data (either as this SQL script or through the SQL Admin
components):
create table t1
(
    myInt   INT,
    myFloat FLOAT,
    myChar  VARCHAR(50)
);
insert into t1 values (1, 1.5, 'var 1');
insert into t1 values (2, 3.5, 'var 2');
insert into t1 values (3, 4.5, 'var 3');
insert into t1 values (4, 4.5, 'var 4');
insert into t1 values (5, 5.5, 'var 5');
2. Then define the stored procedure:
DELIMITER $$
CREATE PROCEDURE `procArray1`(
IN i_id INT,
INOUT o_string VARCHAR(255)
)
BEGIN
select 'Return value from procArray1' into o_string;
select myChar myC from t1
where myInt <= 2 order by myInt;
select myInt, myFloat, myChar from t1
where myInt >= i_id order by myInt;
END $$
DELIMITER ;
3. Set up an SQL General component connected to the data source with the following SQL Statement:
SQL statement that invokes a stored procedure that returns one (or more) result sets.
Select Behavior is set to "Multiple Data"; each row in the result sets is output as a separate data
record.
4. Provide input data for the ODBC component (for example, with a Custom Manipulator):
v1 := 3;
5. Execute the pipeline and view the output data records:
v1   v2   myC     myInt   myFloat   myChar
3         var 1
3         var 2
3                 3       4.5000    var 3
3                 4       4.5000    var 4
3                 5       5.5000    var 5
There are now a total of five data records, two from the first result set and three from the second. The first result set has only one property (myC); the second result set has three properties (myInt, myFloat, and myChar).
Note: v2 was not updated to reflect the output value from the stored procedure. Unlike the
Oracle example, output parameters from stored procedures in MySQL and SQL Server that return
result sets can only be retrieved if Select Behavior is set to "Append Values".
Tip: You can use the same SQL statements to access these stored procedures and functions using
the JDBC MySQL driver (com.mysql.jdbc.Driver).
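Because the two SELECTs inside procArray1 are ordinary queries, you can reproduce the rows each result set returns. This sketch uses Python's sqlite3 module as a stand-in for MySQL, with the t1 table and data from step 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (myInt INT, myFloat FLOAT, myChar VARCHAR(50))")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                 [(1, 1.5, "var 1"), (2, 3.5, "var 2"), (3, 4.5, "var 3"),
                  (4, 4.5, "var 4"), (5, 5.5, "var 5")])

i_id = 3  # the v1 input sent to procArray1

# First result set: myChar aliased to myC, rows with myInt <= 2.
rs1 = conn.execute(
    "SELECT myChar AS myC FROM t1 WHERE myInt <= 2 ORDER BY myInt").fetchall()

# Second result set: rows with myInt >= i_id.
rs2 = conn.execute(
    "SELECT myInt, myFloat, myChar FROM t1 WHERE myInt >= ? ORDER BY myInt",
    (i_id,)).fetchall()

print(rs1)                  # [('var 1',), ('var 2',)]
print(len(rs1) + len(rs2))  # 5
```

The five rows printed here are the five data records the SQL General component outputs under "Multiple Data".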
Identifying Result Sets
If you have a stored procedure returning multiple result sets and you want to distinguish which records
came from which result set, you can add a Source Tag parameter to the SQL component.
To identify the result set:
1. Select the Additional Options parameter and click Edit.
2. Select Add to create a new parameter.
3. Assign the name "Source Tag" and set its legal values to:
KeepCurrent
ResultSet
By setting the parameter to ResultSet, a numeric Source Tag property is added to each data record. The property contains the number of the source result set:
v1   v2   myC     myInt   myFloat   myChar   Source Tag
3         var 1                              1
3         var 2                              1
3                 3       4.5000    var 3    2
3                 4       4.5000    var 4    2
3                 5       5.5000    var 5    2
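The effect of the Source Tag parameter can be sketched as a small tagging loop (hypothetical record structures, not the component's implementation):

```python
def tag_result_sets(input_record, result_sets):
    """Tag each output record with the 1-based index of its result set,
    as the component does when Source Tag is set to ResultSet."""
    records = []
    for index, rows in enumerate(result_sets, start=1):
        for row in rows:
            rec = dict(input_record)
            rec.update(row)
            rec["Source Tag"] = index
            records.append(rec)
    return records

rs1 = [{"myC": "var 1"}, {"myC": "var 2"}]
rs2 = [{"myInt": 3, "myFloat": 4.5, "myChar": "var 3"}]
records = tag_result_sets({"v1": 3}, [rs1, rs2])
print([r["Source Tag"] for r in records])  # [1, 1, 2]
```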
If you configure Select Behavior as "Append Values" instead of "Multiple Data", the output records from
the stored procedure result sets are treated as Arrays on a single data record (make sure that the names
of the columns in the selects are different or the values from the second result set will overwrite those
from the first):
v1:      3
myC:     [var 1, var 2]
myInt:   [3, 4, 5]
myFloat: [4.5000, 4.5000, 5.5000]
myChar:  [var 3, var 4, var 5]
v2:      Return value from procArray1
Appendices
Appendix A: ODBC Server Configuration
IMPORTANT! Before configuring an ODBC data source, install Microsoft Data Access Components
(MDAC) version 2.8 or greater on your server. You can download the latest version of the MDAC from
MSDN MDAC Downloads.
To configure an ODBC data source:
1. Log into the server as the system administrator or user who installed Pipeline Pilot.
2. From the Windows Start menu, select Settings > Control Panel > Administrative Tools.
3. Open the tool Data Sources (ODBC) and go to the System DSN tab.
ODBC Data Source Administrator dialog - System DSN tab
4. Click Add. The Create New Data Source dialog opens.
Create New Data Source dialog
5. Select the appropriate driver for the database connection.
IMPORTANT! For Oracle, DB2 or SQLServer databases, use the DataDirect drivers installed with
Pipeline Pilot.
6. Click Finish. A driver configuration dialog for the specific driver you select opens. It should look
something like this:
Driver-specific configuration dialog
7. Fill out the following fields:
Data Source Name: Type a name for this new data source. You need to use this name later to
specify how to access this database.
Identify the service name of the previously configured database. If your driver has an option to include the "SQLGetData" or "SQLDescribeParam" extension, select this option.
Note: Description and SID fields are optional. You can test the driver configuration, if your driver
dialog includes such a facility.
8. Make a note of the following settings for the DataDirect drivers:
BIOVIA Oracle: From the Advanced tab, enable "SQLDescribeParam" and "DescribeAtPrepare".
In some instances, you might need to disable "SQLDescribeParam" (for example, when using
stored procedures). For further information, contact Technical Support.
BIOVIA DB2: On the General tab, enter a package name to use with the "Bind process" on the Bind tab. (This might need to be different from the default setting.)
9. Click OK. The new data source is added to the list of system DSNs.
Newly added data source
Configuring an ODBC Data Source on Linux
The steps required on Linux are similar to Windows, except that the ODBC configuration persists in a file called "odbc.ini".
To configure an ODBC data source:
1. Log into the server as the user who installed Pipeline Pilot.
2. Change into Pipeline Pilot's installation directory.
3. Change into "apps/scitegic/core/packages_linux64/datadirect".
4. If you have not done so before, create the configuration file "odbc.ini", by copying the installed
"odbc.ini.sample" file.
5. Open "odbc.ini" in your editor.
6. Add an entry to the [ODBC Data Sources] section for your data source.
For example, MyOracleDB=DataDirect 7.1 Oracle Wire Protocol.
7. If your data source is an Oracle database, duplicate the complete [Oracle Wire Protocol] section. If you need to connect to a DB2 or SQL Server database, duplicate the corresponding template section instead.
8. Back up "odbc.ini". As of Pipeline Pilot version 7.5, "odbc.ini" is copied to the backups folder during an upgrade or uninstall (and restored during a re-install). Because this folder (like all files in "root/apps/scitegic") might be overwritten or removed by the installer, a separate backup ensures that you can restore the configuration if it gets replaced.
To configure an Oracle database on Linux:
1. Change the new section's header to match MyOracleDB.
2. Locate the EnableDescribeParam setting and set it to "1".
3. Locate the HostName setting and set it to the database server’s name.
4. Locate the PortNumber setting and set it to your database’s port.
5. Locate the SID setting and set it to your Oracle SID.
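Putting the Oracle steps together, the relevant parts of "odbc.ini" might look like the following sketch. The host name, port, and SID shown here are placeholders for your environment, and the [MyOracleDB] section would retain all the other settings copied from the [Oracle Wire Protocol] template:

```ini
[ODBC Data Sources]
MyOracleDB=DataDirect 7.1 Oracle Wire Protocol

[MyOracleDB]
EnableDescribeParam=1
HostName=dbserver.example.com
PortNumber=1521
SID=ORCL
```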
To configure a DB2 database on Linux:
1. Locate the IpAddress setting and set it to the database server's IP address.
2. Locate the Package setting and set it to the right package.
3. Locate the TcpPort setting and set it to the port on which the database is configured to listen.
Appendix B: Using Dates with Oracle-Based Components
Because date formats in Oracle are controlled by the NLS_DATE_FORMAT session parameter, SQL statements that handle dates as literal strings may not be portable. For example, consider the following SQL expression:
select * from myTable where thedate < '01-FEB-2012';
If NLS_DATE_FORMAT is DD-MON-RR, this behaves properly, but on an Oracle instance with a different setting (e.g., DD/MM/RR) the statement fails or selects no records.
A more robust method is to use a DateTime object in your incoming properties and map that parameter
in the query:
select * from myTable where thedate < ?;
You need to set the SQL Parameter Mapping parameter of the SQL Component to the appropriate
property in the input data (e.g., theDate).
You can use a Custom Manipulator (PilotScript) to ensure that the property is set up properly as a
DateTime:
ChangePropertyType(theDate,'DateTime');
This query will behave correctly no matter which format is used in the Oracle instance.
If you must use a literal string for a date, use the TO_DATE() SQL function to get a proper Date object
for the query:
select * from myTable where thedate < TO_DATE('01-FEB-2012', 'DD-MON-RR');
Although this complicates the query, it protects you from differences in the configuration of Oracle
instances.
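The bind-a-real-date approach can be illustrated outside Oracle. The sketch below uses Python's sqlite3 module purely as a stand-in database; the table and column names follow the example above, and the point is the same: a bound date object does not depend on any session date format:

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (thedate TIMESTAMP)")
conn.executemany("INSERT INTO myTable VALUES (?)",
                 [(datetime(2012, 1, 15),), (datetime(2012, 3, 1),)])

# Bind a real date value instead of embedding a formatted literal,
# as mapping theDate through SQL Parameter Mapping does.
cutoff = datetime(2012, 2, 1)
rows = conn.execute(
    "SELECT * FROM myTable WHERE thedate < ?", (cutoff,)).fetchall()
print(len(rows))  # 1
```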