Denodo ITPilot 4.6 Developer Guide

Update Aug 16th, 2011
NOTE
This document is confidential and is the property of Denodo Technologies
(hereinafter Denodo).
No part of the document may be copied, photographed, transmitted
electronically, stored in a document management system or reproduced by any
other means without prior written permission from Denodo.
Copyright © 2011
This document may not be reproduced in total or in part without written permission from Denodo Technologies.
ITPilot 4.6
Developer Guide
INDEX
PREFACE
    SCOPE
    WHO SHOULD USE THIS DOCUMENT
    SUMMARY OF CONTENTS
1 INTRODUCTION
2 DEPLOYING AND INVOKING ITPILOT WRAPPER ACCESS WEB SERVICES
    2.1 WEB SERVICE TYPES
    2.2 INVOKING SOAP WEB SERVICES
    2.3 INVOKING THE EXPORTED REST AND HTML WEB SERVICES
        2.3.1 HTML Output Configuration
    2.4 CONFIGURING CONNECTIONS IN THE PUBLISHED WEB SERVICES
3 ITPILOT DEVELOPMENT API
    3.1 CONNECTING TO THE SERVER
    3.2 OBTAINING WRAPPERS
    3.3 USING WRAPPERS
    3.4 PROCESSING QUERY RESULTS
        3.4.1 Canceling Queries
    3.5 EXAMPLE OF USE
4 CREATING CUSTOM ITPILOT FUNCTIONS
    4.1 NAMING CONVENTIONS AND ANNOTATIONS
    4.2 COMPOUND TYPES
    4.3 PAGE TYPE
    4.4 CUSTOM FUNCTION RETURN TYPE
    4.5 EXAMPLE
5 DEVELOPING ITPILOT WRAPPERS WITH JAVASCRIPT
    5.1 INTRODUCTION
    5.2 REPRESENTATION FORMAT OF A WRAPPER
        5.2.1 Initialization of Searchable Parameters
        5.2.2 Main Function
        5.2.3 Generating the Output Structure
    5.3 PREDEFINED ITPILOT COMPONENT GUIDE
        5.3.1 Introduction
        5.3.2 Data Structures
        5.3.3 Common functions
        5.3.4 Add Record To List
        5.3.5 Condition
        5.3.6 Create List
        5.3.7 Create Persistent Browser
        5.3.8 Diff
        5.3.9 ExecuteJS
        5.3.10 Expression
        5.3.11 Extractor
        5.3.12 Fetch
        5.3.13 Filter
        5.3.14 Form Iterator
        5.3.15 Get Page
        5.3.16 Init
        5.3.17 Iterator
        5.3.18 JDBCExtractor
        5.3.19 Loop
        5.3.20 Next Interval Iterator
        5.3.21 Output
        5.3.22 Record Constructor
        5.3.23 Record Sequence or Extractor Sequence
        5.3.24 Release Persistent Browser
        5.3.25 Repeat
        5.3.26 Script
        5.3.27 Sequence
        5.3.28 Store File
        5.3.29 Thread
    5.4 USE OF CUSTOM COMPONENTS IN JAVASCRIPT WRAPPERS
        5.4.1 Developing Custom Components
        5.4.2 Using Custom Components
    5.5 WRAPPER DEVELOPMENT
REFERENCES
FIGURES
Figure 1 Example of query execution to a wrapper
Figure 2 ITPilot Custom Function Sample
Figure 3 ITPilot Wrapper Skeleton in JavaScript
Figure 4 Using the ExecuteJS NSEQL command
Figure 5 Using threads in the Iterator component
Figure 6 Using the Loop function
Figure 7 Using the Repeat function
Figure 8 Using custom components from JavaScript
PREFACE
SCOPE
Denodo ITPilot enables easy access to and extraction of data from semi-structured Web data sources. This document
is an introduction to application development using wrappers created by Denodo ITPilot.
WHO SHOULD USE THIS DOCUMENT
This document is aimed at developers who want to learn how to build applications that make the best use of the
advanced automation and Web data extraction functionalities provided by Denodo ITPilot. Detailed information on
installing and managing the system is provided in other manuals, which are referenced as the need arises.
SUMMARY OF CONTENTS
More specifically, this document:
• Presents the fundamental steps needed to develop an application that uses the wrappers generated by
Denodo ITPilot.
• Describes the task of exporting and deploying a wrapper as a Web Service.
• Gives a detailed description of how to use the development API offered by Denodo ITPilot.
• Provides an example of how to develop an application that uses a wrapper installed in a Denodo ITPilot
execution server.
• Details how to create custom ITPilot functions.
• Explains how to develop wrappers by using the ITPilot JavaScript components.
1 INTRODUCTION
Denodo ITPilot is a Denodo Technologies solution for extracting and structuring the data present in Web
sources. This is done by building an abstraction of the target Web source called a “wrapper”, which
frees client applications from the difficulties associated with accessing and extracting the required data.
ITPilot provides a distributed and scalable environment for generating, executing and maintaining wrappers. See
[USER] and [GENER] for more information on how to create, install and maintain wrappers using Denodo ITPilot.
This manual describes the Java development API for creating clients that use wrappers that have already
been generated and installed. It gives the basic guidelines for using the API, describes its main components
and provides some examples of use. See the Javadoc documentation [JDOC] for more details on classes, attributes
and operations.
This manual also explains how to access wrappers through Web Services exported in the execution environment.
2 DEPLOYING AND INVOKING ITPILOT WRAPPER ACCESS WEB SERVICES
The wrappers saved in the execution server can be invoked in two different ways. First, the native ITPilot Java API
can be used to access the wrappers, obtain their data structure and run queries on them from a Java application.
This API is described in section 3. The other option is to expose the wrappers through Web Services, as described
in this section.
A Web Service containing the following operations can be generated for a particular wrapper:
• An operation containing all searchable and compulsory parameters.
• Optionally, another operation with all searchable and compulsory parameters plus any searchable and
optional parameters selected in the Web Service generation process (this process is defined in [USER]).
The ITPilot execution server generates a Web Service as a .war file that can be deployed in any J2EE application
server.
2.1 WEB SERVICE TYPES
ITPilot allows a wrapper to be published as a Web Service so that any external application can use it. The
types of Web Services that ITPilot can publish are:
• SOAP [SOAP] Web Services.
• REST-style Web Services that use HTTP directly as the transport protocol and return data encoded in XML.
• HTML Web Services. Similar to the REST-style Web services, but the output consists of an HTML table
containing the response data for the query executed. The table includes JavaScript code to sort the results
by any field and/or paginate the returned results. It is also possible to adjust the size of the table and the
cells and to modify its graphic appearance using a CSS file.
The following sections describe how to query these Web Services.
2.2 INVOKING SOAP WEB SERVICES
The SOAP version of the published Web Services can be accessed by using any Web Service client or client
generator that meets the SOAP/1.2 [SOAP] and WSDL 1.1 [WSDL] standards, such as the Apache Axis wsdl2java [AXIS]
or .NET Framework wsdl [DOTNET] tools. The WSDL from which the clients are generated can be obtained either
from the local file created by ITPilot or through the access URL to the Web Service WSDL,
http://<domain>:<port>/<service_name>/services/<service_name>?wsdl.
The ITPilot distribution contains, in the samples/itpilot/itp-clients directory, a sample client generated
using Apache Axis. The README file residing in this path contains detailed information on how to generate, compile
and run the files comprising the client application.
2.3 INVOKING THE EXPORTED REST AND HTML WEB SERVICES
This section describes how to invoke the REST and HTML versions of the Web Services that have been published by
DataPort, once they have been deployed in the Web Service container.
Once the .war file has been deployed in the J2EE application server, the relative paths /rest and /html of the
web application each show an information screen for the respective Web Service version, listing the available operations.
Example: if the Web service container is running on port 9090 of the acme host, and the name chosen for the
exported web service was testWS, the access URL for the information page in the REST (XML output) and HTML
versions would be:
http://acme:9090/testWS/rest
http://acme:9090/testWS/html
For each operation, the input and output parameters are shown. For the REST version, a link is also shown to the
.xsd file that describes the schema of the XML document returned by a call to each operation.
To access the XML Schema of the data returned by invoking an operation of the REST version of the Web service, the
following URL format should be used:
http://host:port/serviceName/rest/opName/xsd
Example: again, if the Web service container runs on port 9090 of the acme host, and the name chosen for the
exported web service was testWS, the following URL will obtain the XML Schema of the data returned by the
operation getPRODUCTDATA:
http://acme:9090/testWS/rest/getPRODUCTDATA/xsd
The format used to invoke a specific operation in the REST version is the following:
http://host:port/serviceName/rest/opName?paramName1=value1&...&paramNamen=valuen
where n is the number of parameters of the operation. The format for the HTML version is the same, replacing
‘rest’ with ‘html’.
Example: the Web service container runs on port 9090 of the acme host, and the name chosen for the exported
web service was testWS. Let us also suppose that the service has an operation called getPRODUCTDATA that
requires no parameters. The operation can be invoked as follows in, respectively, the REST and HTML Web service
versions:
http://acme:9090/testWS/rest/getPRODUCTDATA
http://acme:9090/testWS/html/getPRODUCTDATA
If the operation to be invoked is getPRODUCTDATABYPRODID, which requires one input parameter called
prod_id, the results for the value 1 of this parameter would be obtained by writing:
http://acme:9090/testWS/rest/getPRODUCTDATABYPRODID?prod_id=1
http://acme:9090/testWS/html/getPRODUCTDATABYPRODID?prod_id=1
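The URL format described above can be assembled programmatically. The following is a minimal sketch, not part of the ITPilot distribution: the class name RestUrlBuilder is hypothetical, and encoding parameter values as UTF-8 is an assumption, since this guide does not state which encoding the services expect.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not an ITPilot class) that builds invocation URLs
// of the form http://host:port/serviceName/<version>/opName?name1=value1&...
public class RestUrlBuilder {

    // version is "rest" or "html", as described in this section.
    public static String build(String host, int port, String serviceName,
                               String version, String opName,
                               Map<String, String> params) {
        StringBuilder url = new StringBuilder("http://")
                .append(host).append(':').append(port)
                .append('/').append(serviceName)
                .append('/').append(version)
                .append('/').append(opName);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    // Assumption: UTF-8 is used to percent-encode names and values.
    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("prod_id", "1");
        System.out.println(build("acme", 9090, "testWS", "rest",
                "getPRODUCTDATABYPRODID", params));
        System.out.println(build("acme", 9090, "testWS", "html",
                "getPRODUCTDATABYPRODID", params));
    }
}
```

Run as is, the sketch prints the two example URLs shown above for the acme host.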
2.3.1 HTML Output Configuration
The HTML version of the Web Services published may be invoked with certain additional parameters to configure the
HTML table used to display the results of the queries. The additional parameters are as follows:
• shownumresults. If this parameter is indicated with the true value, the table will display
information on the number of results obtained by the wrapper.
• intervalsize. If this parameter is indicated, the results obtained by the wrapper will be displayed
paginated. The value of the parameter indicates the number of results to be displayed in each interval.
• maxresults. This indicates a maximum number of results to be displayed. If the wrapper run returns
more results than those indicated, all excess results will be rejected.
• cellwidth. Maximum cell width expressed in number of characters. The width of each cell in the table
will be adapted to the text, except where the size indicated in this parameter is exceeded. In this case,
carriage returns will be added to divide the text into lines.
• cellheight. Maximum number of lines in a cell after having divided the text according to the
cellwidth parameter value. If this is exceeded, all the cells of this column are given a scroll bar.
• width. This specifies the maximum width (in pixels) of the table. If the size is exceeded, a scroll bar is
added.
• height. This specifies the maximum height (in pixels) of the table. If the size is exceeded, a scroll bar is
added.
These parameters must be specified in the part of the URL corresponding to the access path (before the query
parameters) in the following format:
http://host:port/serviceName/html/opName/paramName1/value1/.../paramNamen/valuen
For example, the following URL invokes the getPRODUCTDATA operation, limiting the number of results
displayed to 50 and setting a pagination interval size of 10. Once again, it is assumed that the Web
service container is running on port 9090 of the acme host:
http://acme:9090/testWS/html/getPRODUCTDATA/maxresults/50/intervalsize/10/
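The path-based format for display parameters can be sketched in Java as follows. HtmlTableUrl is a hypothetical helper, not an ITPilot class; it only accepts the seven parameter names listed in this section and appends each name/value pair to the access path. The sketch does not append the trailing slash that appears in the example URL, since this guide does not state whether it is required.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not an ITPilot class) for the HTML version of a
// published Web Service.
public class HtmlTableUrl {

    // Display parameters documented in this section.
    private static final Set<String> DISPLAY_PARAMS = new HashSet<>(Arrays.asList(
            "shownumresults", "intervalsize", "maxresults",
            "cellwidth", "cellheight", "width", "height"));

    // serviceUrl is http://host:port/serviceName without a trailing slash.
    public static String build(String serviceUrl, String opName,
                               Map<String, String> display) {
        StringBuilder url = new StringBuilder(serviceUrl)
                .append("/html/").append(opName);
        for (Map.Entry<String, String> entry : display.entrySet()) {
            if (!DISPLAY_PARAMS.contains(entry.getKey())) {
                throw new IllegalArgumentException(
                        "Unknown display parameter: " + entry.getKey());
            }
            // Each display parameter becomes /name/value in the access path.
            url.append('/').append(entry.getKey())
               .append('/').append(entry.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> display = new LinkedHashMap<>();
        display.put("maxresults", "50");
        display.put("intervalsize", "10");
        System.out.println(build("http://acme:9090/testWS", "getPRODUCTDATA", display));
    }
}
```

Query parameters of the operation itself, if any, would still be appended after the path as ?name=value pairs.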
2.4 CONFIGURING CONNECTIONS IN THE PUBLISHED WEB SERVICES
Once the Web Service has been exported, several parameters can be used to configure the
connection pool that the Web Service uses to connect to the ITPilot server. The web.xml file, located in
the WEB-INF/ path of the exported Web Service (either inside the .war file generated by ITPilot or in the
directory where the Web Service has been deployed), has three parameters for configuring the connection pool:
1. poolEnabled: this parameter enables or disables the connection pool. The possible values
are “true” or “false”.
<env-entry>
<env-entry-name>poolEnabled</env-entry-name>
<env-entry-value>false</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
2. poolInitSize defines the initial size of the connection pool.
<env-entry>
<env-entry-name>poolInitSize</env-entry-name>
<env-entry-value>0</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
3. poolMaxActive defines the maximum number of active connections in the pool; when this limit is
reached, new requests are queued until a connection becomes free.
<env-entry>
<env-entry-name>poolMaxActive</env-entry-name>
<env-entry-value>30</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
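To check the pool settings of an exported Web Service before deploying it, the env-entry elements can be read with the standard Java XML parser. The sketch below is illustrative only: PoolSettingsReader is a hypothetical class, and the sample document in main contains just the three entries shown above, wrapped in a bare web-app element with no DOCTYPE. A real web.xml may declare a DTD or schema, which this sketch does not handle specially.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper (not an ITPilot class) that lists env-entry values.
public class PoolSettingsReader {

    // Returns env-entry names mapped to their values, in document order.
    public static Map<String, String> readEnvEntries(String webXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(webXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> entries = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("env-entry");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element entry = (Element) nodes.item(i);
                entries.put(childText(entry, "env-entry-name"),
                            childText(entry, "env-entry-value"));
            }
            return entries;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse web.xml content", e);
        }
    }

    // Text of the first child element with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0 ? null : children.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String webXml = "<web-app>"
                + entry("poolEnabled", "false")
                + entry("poolInitSize", "0")
                + entry("poolMaxActive", "30")
                + "</web-app>";
        System.out.println(readEnvEntries(webXml));
    }

    // Builds one env-entry element in the layout shown in this section.
    private static String entry(String name, String value) {
        return "<env-entry><env-entry-name>" + name + "</env-entry-name>"
                + "<env-entry-value>" + value + "</env-entry-value>"
                + "<env-entry-type>java.lang.String</env-entry-type></env-entry>";
    }
}
```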
3 ITPILOT DEVELOPMENT API
Denodo ITPilot incorporates a Java API for developing applications that use the wrappers created with it.
Among other functions, this API makes it possible to connect to a Denodo ITPilot execution server, obtain a
reference to a wrapper installed in that server and query it. It also supports additional tasks such as obtaining
the list of wrappers installed in the server or activating automatic maintenance of a specific wrapper.
The first step in using the API is to connect to a Denodo ITPilot execution server. This is done by constructing an
instance of the class com.denodo.itpilot.client.HTMLWrapperServerProxy. Among other
tasks, this instance makes it possible to obtain a list of the wrappers available in the server, as well as a
reference to a specific wrapper, represented by an instance of the class HTMLWrapperProxy.
The HTMLWrapperProxy instance can be used to carry out various tasks on the wrapper, the most important of which
is query execution. When a query is invoked on the wrapper, the results are returned to the application
asynchronously (i.e. the first results of the query are accessible to the application as they are obtained from
the source, without having to wait for all the results to be received).
The following subsections deal in more detail with each of the stages mentioned: connection to the server, obtaining
references to wrappers, executing actions on them and query processing. An exhaustive description of the API on a
programming level can be found in the Javadoc documentation [JDOC].
3.1 CONNECTING TO THE SERVER
There are two ways to connect to the ITPilot execution server, depending on whether
Denodo Virtual DataPort [DPORT] is installed in the same location as ITPilot.
If Denodo ITPilot has been installed separately, the default server connection mode should be used (constructor
HTMLWrapperServerProxy(String host, int port)), indicating the machine and port on which
the server runs.
If Denodo ITPilot is installed jointly with Denodo Virtual DataPort, DataPort is used as the execution server
for ITPilot. In this case, the connection can specify any database created in the Virtual DataPort server
and any user defined in it. The actions allowed for the user are consistent with the permissions
assigned to that user in the DataPort server for the specified database (see [DPORT] for more information on the
structure of databases, permissions and users of Denodo Virtual DataPort).
In this case the constructor HTMLWrapperServerProxy(String host, int port, String
dbName, String login, String password) may be used. In this constructor, in addition to the
machine and port in which the server is executed, the name of the database of the Virtual DataPort server to which
the connection is to be made should be specified as well as the user ID with which access is to be made and the
associated password.
It is important to highlight that, even if Virtual DataPort is installed, it is equally possible to access the server using
the default mode (constructor HTMLWrapperServerProxy(String host, int port)). In this
case, a default database called ‘itpilot’ will be accessed. The predefined user ‘admin’ (with the initial password
‘admin’) will be used to gain access.
3.2 OBTAINING WRAPPERS
As mentioned in the preceding section, connection to the execution server consists of creating an instance of the
class com.denodo.itpilot.client.HTMLWrapperServerProxy. This class incorporates
methods for obtaining data on the execution server and accessing wrappers present in it:
• Collection getHTMLWrapperNames(). Obtains a collection with the names of the wrappers
present in the execution server. Note that if Virtual DataPort is being used as the execution server, the
connection will have been made to a Virtual DataPort database and only the wrappers associated with
that database are obtained.
• HTMLWrapperProxy getHTMLWrapper(String wpName). Obtains a reference to the
wrapper whose name is specified as parameter.
• Collection getDatabaseNames(). This method can only be invoked by users with
administration rights in Virtual DataPort. It returns a collection with the names of the databases that exist in
the server.
• void deleteWrapper(String wpName). Deletes the wrapper whose name is specified as
parameter from the server.
• void loadWrapper(String vql). Takes as input the VQL that defines a collection
of wrappers, which are loaded in the execution server.
• String getVQL(). Returns the VQL description of all wrappers in the ITPilot execution server.
3.3 USING WRAPPERS
Once a reference to a wrapper has been obtained (instance of the class HTMLWrapperProxy) various
operations can be carried out on it, through the methods of said class.
To execute a query on a wrapper, use the method:
HTMLWrapperResultIterator query(Map params).
The query to be executed is represented as a map of attribute name/value pairs. The attribute names must match
the names of the input parameters specified during the creation of the wrapper.
The values must be specified as character strings, even when the input parameters expected by the wrapper are
of another type. For example, if a wrapper expects a float-type parameter and the value 3.25 is to be assigned
when invoking it, the string “3.25” must be passed. For float, double and date data types,
the values must be provided according to the internationalization configuration specified
in the wrapper Init component or, for date data types, the date pattern if one was set.
For the query to execute correctly, a value must be specified for all the
mandatory attributes. See [GENER] for more information on the process of generating wrappers in ITPilot.
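The rule that every value is passed as a string, formatted according to the wrapper's internationalization settings, can be illustrated with standard Java formatting classes. This is a sketch under assumptions: QueryParams is a hypothetical helper, the parameter names PRICE and RELEASE_DATE are invented, and the locale and date pattern used here stand in for whatever is actually configured in the wrapper's Init component.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not an ITPilot class) that prepares string values
// for the map passed to query(Map params).
public class QueryParams {

    // Formats a number without grouping, using the locale's decimal separator.
    public static String formatDecimal(double value, Locale locale) {
        DecimalFormat format = new DecimalFormat("0.##########",
                DecimalFormatSymbols.getInstance(locale));
        return format.format(value);
    }

    // Formats a date with an explicit pattern (assumed to match the wrapper's).
    public static String formatDate(Date value, String pattern, Locale locale) {
        return new SimpleDateFormat(pattern, locale).format(value);
    }

    public static void main(String[] args) {
        // Assumption: the wrapper is configured with a US-style locale and
        // the date pattern dd/MM/yyyy; both parameter names are invented.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("PRICE", formatDecimal(3.25, Locale.US));
        Date date = new GregorianCalendar(2011, Calendar.AUGUST, 16).getTime();
        params.put("RELEASE_DATE", formatDate(date, "dd/MM/yyyy", Locale.US));
        System.out.println(params);
        // The same number under a locale that uses a decimal comma.
        System.out.println(formatDecimal(3.25, Locale.GERMANY));
    }
}
```

A map built this way holds only strings, which is the form the query method expects for every attribute regardless of its declared type.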
Although most of the applications will not require this, a wrapper schema can be obtained using the method:
HTMLWrapperMetaRegisterRawVO getSchema()
This method returns the schema of the results returned by the wrapper and the characteristics of the atomic fields
that form part of said schema. The schema was defined during the generation of the wrapper (see [GENER]).
The results returned by a wrapper follow a hierarchical structure. Each output tuple contains a value for every
attribute contained in the wrapper response. Each attribute may be either atomic or compound. The value of atomic
attributes can be of any of the basic data types available in ITPilot: int, long, float, double, text, date,
boolean or blob. The value of a compound attribute is always an array of registers. In the same form, each
register will be composed of several fields and, again, these fields may be either atomic or compound.
For example, a wrapper that returns data on movies may have a schema in which each result is comprised of the
fields TITLE, DIRECTOR and EDITIONS. TITLE and DIRECTOR are atomic fields and EDITIONS is a compound field
containing data on various editions available of the movie (DVD, VHS, director’s cut, etc.). The value of EDITIONS is
an array of registers, where each register contains the fields FORMAT, PRICE and DESCRIPTION, all of which are
atomic.
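The hierarchical shape of the movie example can be modeled with plain Java collections to make the atomic/compound distinction concrete. This sketch deliberately does not use the ITPilot result classes: MovieRecordSketch is hypothetical, a register is represented as a Map, a compound value as a List of registers, and the field values are invented sample data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in model (not the ITPilot classes) of a hierarchical register.
public class MovieRecordSketch {

    // Builds one register from alternating field names and values.
    public static Map<String, Object> register(Object... namesAndValues) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i < namesAndValues.length; i += 2) {
            fields.put((String) namesAndValues[i], namesAndValues[i + 1]);
        }
        return fields;
    }

    // Lists the path of every atomic field, descending into compound fields.
    public static List<String> atomicPaths(Map<String, Object> reg, String prefix) {
        List<String> paths = new ArrayList<>();
        for (Map.Entry<String, Object> field : reg.entrySet()) {
            String path = prefix + field.getKey();
            if (field.getValue() instanceof List) {
                // Compound field: an array of registers.
                List<?> elements = (List<?>) field.getValue();
                for (int i = 0; i < elements.size(); i++) {
                    @SuppressWarnings("unchecked")
                    Map<String, Object> element = (Map<String, Object>) elements.get(i);
                    paths.addAll(atomicPaths(element, path + "[" + i + "]."));
                }
            } else {
                // Atomic field.
                paths.add(path);
            }
        }
        return paths;
    }

    // Invented sample data following the TITLE/DIRECTOR/EDITIONS schema.
    public static Map<String, Object> sampleMovie() {
        List<Map<String, Object>> editions = new ArrayList<>();
        editions.add(register("FORMAT", "DVD", "PRICE", 9.99, "DESCRIPTION", "Director's cut"));
        editions.add(register("FORMAT", "VHS", "PRICE", 4.50, "DESCRIPTION", "Standard edition"));
        return register("TITLE", "Example Movie", "DIRECTOR", "Jane Doe", "EDITIONS", editions);
    }

    public static void main(String[] args) {
        for (String path : atomicPaths(sampleMovie(), "")) {
            System.out.println(path);
        }
    }
}
```

With the sample data, the walk reaches two top-level atomic fields and three atomic fields in each of the two EDITIONS registers.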
The invocation to getSchema() returns an instance of the class HTMLWrapperMetaRegisterRawVO,
which represents the schema of a “hierarchical” register of the type described above. See the Javadoc
documentation for a detailed description of the methods provided by HTMLWrapperMetaRegisterRawVO.
It is also possible to access the characteristics of the atomic fields that make up the schema. Information
about each atomic field is represented as an instance of the class HTMLWrapperMetaSimpleRawVO.
Specifically, the following information can be obtained for an atomic field:
• Its type (method java.lang.Class getType()).
• Whether its value is obtained from the source or not, that is, whether it is a searchable field that does
not appear in the output schema (method boolean isSearchStatus()).
• In that case, whether it is mandatory or not (method boolean isMandatoryStatus()).
• If they were defined during the generation process, the regular expression (method
java.lang.String getRegexp()) and the aliases defined for the field (method java.util.List getTextValues()).
Finally, the methods:
void setMaintenance(boolean value)
void setMaintenance(boolean maintenance, boolean regenerate, boolean autodeploy)
allow setting via the API whether a wrapper should be automatically maintained by the ITPilot automatic
maintenance server. The regenerate parameter indicates whether ITPilot should try to generate a new
wrapper automatically when a change in the source is detected. The autodeploy parameter indicates whether the
regenerated wrapper should be automatically installed in the ITPilot server, replacing the old one. If this last
parameter is set to false, the new wrapper is stored in the path DENODO_HOME/metadata/maintenanceregenerations.
The replaced versions of the wrapper are stored in the DENODO_HOME/metadata/maintenance-backup path (the
replacement date is added to the name of the wrapper to generate the file name).
If the first method is used (without the regenerate and autodeploy parameters), the wrapper will be
regenerated and auto-deployed in the ITPilot server. See [USER] for more information about the automatic
maintenance process in ITPilot.
3.4
PROCESSING QUERY RESULTS
The query method of a wrapper returns an instance of the class
com.denodo.itpilot.client.HTMLWrapperResultIterator. This class (which implements
the interface java.util.Iterator) provides asynchronous access to the results of the query.
Asynchronous access means that the server returns the results of the query as they are
obtained from the source (remember that the wrapper obtains the data from the source in real time
through the network).
The method hasNext() checks whether there are still elements to return. Because results arrive
asynchronously, this method must be called before accessing each element, to make sure that data is available.
The method next() of HTMLWrapperResultIterator obtains the next result. In this case, each result is
an instance of the class
com.denodo.vdb.vdbinterface.client.printer.standard.StandardRowVO.
The value associated with each field is obtained by invoking the method:
com.denodo.vdb.vdbinterface.common.clientResult.vo.sentences.ValueVO getValue(String fieldname)
where fieldname is the name of the desired field.
The method next() throws an exception of type NoSuchElementException if no data is available
at that moment, even if the wrapper still has results to return. This is why hasNext() must be
called first.
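The hasNext()/next() contract described above can be sketched with a plain java.util.Iterator standing in for HTMLWrapperResultIterator. The class name ResultLoopSketch and the sample titles are hypothetical and exist only for illustration; a real client would iterate over the object returned by the query method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in: a plain Iterator plays the role of the result iterator.
class ResultLoopSketch {

    // Collects every element, always calling hasNext() before next().
    static List<String> drain(Iterator<String> results) {
        List<String> rows = new ArrayList<String>();
        while (results.hasNext()) {
            rows.add(results.next());
        }
        return rows;
    }

    // Returns true if next() throws NoSuchElementException when nothing is available.
    static boolean nextFailsWhenEmpty(Iterator<String> results) {
        try {
            results.next();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Iterator<String> results = Arrays.asList("Annie Hall", "Manhattan").iterator();
        System.out.println(drain(results));
        System.out.println(nextFailsWhenEmpty(results));
    }
}
```

Guarding every next() call with hasNext() keeps the loop free of exception handling in the normal path.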
As mentioned in the preceding section, the value of a field can be atomic or compound. If it is atomic, the instance of
ValueVO belongs to the subclass SimpleVO. SimpleVO is an abstract class whose subclasses correspond to
the basic types available in ITPilot: TextVO, IntVO, LongVO, FloatVO, DoubleVO, DateVO,
BooleanVO, BlobVO. The subclasses IntVO, LongVO, FloatVO, DoubleVO and BooleanVO
provide a method getXXX (where XXX represents the name of the data type) to access their values. For example,
IntVO provides the method:
java.lang.Integer getInt()
In the case of BlobVO, the following method is provided:
java.lang.Byte[] getBytes()
In the case of DateVO, this is the method:
long getTime()
In addition, the SimpleVO superclass provides a representation of the value as a character string accessible
through the getValue() method. See Javadoc documentation for detail [JDOC].
If the value is compound, the instance of ValueVO represents an array of registers (subclass ArrayVO). Its
method getValues() returns a list of the registers it contains (instances of the subclass
RegisterVO). See the Javadoc documentation for more detailed information on the methods and properties
of the class ValueVO and its subclasses.
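The recursive shape of atomic and compound values can be illustrated without the ITPilot classes. In the following sketch a java.util.Map stands in for a register and a java.util.List for an array of registers; the class name CompoundValueSketch, the method flatten and the sample data are hypothetical and are not part of the ITPilot API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in model: Map = register, List = array of registers.
class CompoundValueSketch {

    // Walks a hierarchical value and emits one "path=value" line per atomic field.
    static void flatten(String path, Object value, List<String> out) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> field : ((Map<?, ?>) value).entrySet()) {
                String child = path.isEmpty()
                        ? String.valueOf(field.getKey())
                        : path + "." + field.getKey();
                flatten(child, field.getValue(), out);
            }
        } else if (value instanceof List) {
            List<?> registers = (List<?>) value;
            for (int i = 0; i < registers.size(); i++) {
                flatten(path + "[" + i + "]", registers.get(i), out);
            }
        } else {
            out.add(path + "=" + value);   // atomic value
        }
    }

    public static void main(String[] args) {
        Map<String, Object> dvd = new LinkedHashMap<String, Object>();
        dvd.put("FORMAT", "DVD");
        dvd.put("PRICE", 9.99);
        List<Object> editions = new ArrayList<Object>();
        editions.add(dvd);
        Map<String, Object> movie = new LinkedHashMap<String, Object>();
        movie.put("TITLE", "Manhattan");
        movie.put("EDITIONS", editions);

        List<String> lines = new ArrayList<String>();
        flatten("", movie, lines);
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

With the real API the same walk would branch on SimpleVO, ArrayVO and RegisterVO instead of on Map and List.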
Another important aspect of processing queries is dealing with any errors that may arise (e.g. an error connecting to the
data source). The class HTMLWrapperResultIterator provides two methods for this:
• Boolean checkErrors(). Checks whether an error has occurred during query execution.
Returns ‘true’ if an error has occurred and ‘false’ if not.
• String getErrorDescription(). If an error has occurred, returns a textual description of it.
Otherwise, it returns null. The custom error messages specified by the wrapper creator for the
‘raise error’ handler (see [GENER]) in the Wrapper Generator Tool are accessed through this method.
3.4.1
Canceling Queries
The following method from the class HTMLWrapperResultIterator can be used to cancel the execution
of an ongoing query:
void cancel()
3.5
EXAMPLE OF USE
This section shows a simple example of how to use the API.
The application starts by connecting to an execution server installed on the ‘acme’ machine on port 9999. Next, it
obtains a reference to the wrapper called “Movies”, whose schema is the one used as an example in the
preceding section:
{TITLE, DIRECTOR, EDITIONS {FORMAT, PRICE, DESCRIPTION}},
where TITLE and DIRECTOR are optional search fields.
Then, a query is issued to the wrapper using the input parameter DIRECTOR with the value “Woody Allen”, and the
results are processed and written to the standard output.
To process the results, the hierarchical structure of ValueVO elements is navigated. First, the SimpleVO objects
that represent the atomic fields TITLE and DIRECTOR are obtained. Then comes the compound field EDITIONS, which is
represented by an ArrayVO object that contains a RegisterVO object for each edition of the film. Each of
these registers contains the atomic fields FORMAT, PRICE and DESCRIPTION. All atomic fields are of type text
except the field PRICE, which is a double.
Finally, any errors produced during execution are checked.
package com.denodo.itpilot.client;

import java.util.List;
import java.util.HashMap;
import java.util.Map;
import java.util.Iterator;
import com.denodo.vdb.vdbinterface.common.clientResult.vo.sentences.ValueVO;
import com.denodo.vdb.vdbinterface.common.clientResult.vo.sentences.SimpleVO;
import com.denodo.vdb.vdbinterface.common.clientResult.vo.sentences.ArrayVO;
import com.denodo.vdb.vdbinterface.common.clientResult.vo.sentences.RegisterVO;
// DoubleVO is assumed to live in the same package as SimpleVO
import com.denodo.vdb.vdbinterface.common.clientResult.vo.sentences.DoubleVO;
import com.denodo.vdb.vdbinterface.client.printer.standard.StandardRowVO;

public class ITPilotExample {

    public static void main(String args[]) {
        try {
            // Connect to server
            HTMLWrapperServerProxy server = new HTMLWrapperServerProxy("acme", 9999);

            // Get Wrapper
            HTMLWrapperProxy wrapper = server.getHTMLWrapper("Movies");

            // Prepare query params
            Map queryParams = new HashMap();
            queryParams.put("DIRECTOR", "Woody Allen");

            // Execute query
            HTMLWrapperResultIterator results = wrapper.query(queryParams);

            // Iterate results; hasNext() is called before every next()
            int numOfTuples = 0;
            while (results.hasNext()) {
                numOfTuples++;
                StandardRowVO tuple = (StandardRowVO) results.next();

                /* Process each tuple */
                System.out.print(numOfTuples + ". ");

                // Get and print atomic fields: TITLE, DIRECTOR
                SimpleVO titleVO = (SimpleVO) tuple.getValue("TITLE");
                String title = (String) titleVO.getValue();
                System.out.println("TITLE:" + title);
                SimpleVO directorVO = (SimpleVO) tuple.getValue("DIRECTOR");
                String director = (String) directorVO.getValue();
                System.out.println("DIRECTOR:" + director);

                // Get EDITIONS array
                ArrayVO editionsVO = (ArrayVO) tuple.getValue("EDITIONS");

                // Iterate over EDITION registers
                int numEditions = 0;
                Iterator editions = editionsVO.getValues().iterator();
                while (editions.hasNext()) {
                    numEditions++;
                    System.out.println("EDITION: " + numEditions);
                    RegisterVO editionVO = (RegisterVO) editions.next();

                    // Each field of the register is read with getValue(name)
                    SimpleVO formatVO = (SimpleVO) editionVO.getValue("FORMAT");
                    String format = (String) formatVO.getValue();
                    System.out.println("\t FORMAT:" + format);
                    DoubleVO priceVO = (DoubleVO) editionVO.getValue("PRICE");
                    Double price = priceVO.getDouble();
                    System.out.println("\t PRICE:" + price);
                    SimpleVO descriptionVO = (SimpleVO) editionVO.getValue("DESCRIPTION");
                    String description = (String) descriptionVO.getValue();
                    System.out.println("\tDESCRIPTION:" + description);
                }
                System.out.println("");
            }

            // Check errors reported by the wrapper during query execution
            if (results.checkErrors()) {
                System.out.println("Error: " + results.getErrorDescription());
            }
        } catch (Exception e) {
            System.err.println("Error trying to access server: " + e.getMessage());
        } finally {
            // ...
        }
    }
}
Figure 1 Example of query execution to a wrapper
4
CREATING CUSTOM ITPILOT FUNCTIONS
Custom functions let users extend the set of functions available in ITPilot.
Custom functions are Java classes included in a Jar file that are added to ITPilot so they can be used in the same
way as other functions such as MAX, MIN, SUM, etc.
Denodo4E, an Eclipse plug-in which provides tools for creating, debugging and deploying Denodo extensions,
including custom ITPilot functions, is included in the Denodo Platform. Please read the README in
$DENODO_HOME/tools/denodo4e for more information.
Each function must be in a different Java class, but it is possible to group them in a single Jar.
We recommend developing custom functions using Java annotations, although it is also possible to do so using naming
conventions.
Although custom functions can be created without dependencies on Denodo libraries, annotations are the
preferred approach. The annotations, and the compound types and values, required to create custom functions are located in:
$DENODO_HOME/lib/contrib/denodo-custom.jar
These are the rules that every custom function must follow to work properly:
• Functions with the same name are not allowed. If a jar contains one or more functions with name conflicts,
nothing in that jar will be loaded in the server.
• All custom functions stored in the same jar are added or removed together by uploading/removing the jar in
the server.
• Each function can have many signatures. Each signature is defined by an execution method in the Java
class defining the custom function.
• Functions can have variable arity (arity n), but only the last parameter of the signature can be repeated n times.
A custom function is defined in a Java class containing all its implementation; the name of the function will be
extracted from that Java class. A function can contain several signatures: different combinations of arguments
(different number, types or both). For each signature of the function, this class must define a Java method
implementing the functionality of the function with those arguments, and one additional method in case the
signature returns a different type depending on the parameters or the return type is compound (array or register).
When defining custom functions simple types are mapped directly from Java objects to Virtual DataPort data objects.
The following table shows how the mapping works and which Java types can be used:
Java                  ITPilot
java.lang.Integer     int
java.lang.Long        long
java.lang.Float       float
java.lang.Double      double
java.lang.Boolean     boolean
java.lang.String      text
java.util.Calendar    date
byte[]                binary
Equivalency between Java and ITPilot data types
Note: The parameters of a custom function cannot be primitive types (int, long, double, etc.); use the Java classes listed in the table instead.
4.1
NAMING CONVENTIONS AND ANNOTATIONS
The following naming conventions allow some custom functions to be defined without Java
annotations, although using annotations is recommended. All the names used in the naming conventions are case
sensitive.
For a Java class to be recognized as a custom function without Java annotations, its name must match the
following pattern:
• <FunctionName> + “ItpFunction”
For example, a Java class named Concat_SampleItpFunction will be interpreted as a function named Concat_Sample.
All Java methods implementing the function signatures must have the name execute. The signature associated with
each method will be extracted from the Java method parameters. For example a class named
Concat_SampleItpFunction with a method execute(valueA:String, valueB:String):String will generate the function
signature CONCAT_SAMPLE(arg1:text, arg2:text).
To define a parameter with arity n in a custom function, the last parameter has to be an array. For example, the class
Concat_SampleItpFunction could declare the method public String execute(String... inputs).
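The naming conventions above can be sketched as a class with no Denodo dependencies. The guide only gives the class name and the execute(String... inputs) signature; the concatenation logic and the null handling below are assumptions made for illustration. In a real jar the class would normally be declared public in its own source file.

```java
// Recognized by its name: "Concat_Sample" + "ItpFunction".
class Concat_SampleItpFunction {

    // Variable-arity signature: the last (here, only) parameter can be repeated.
    // Assumed behavior: null inputs are skipped and a null array yields null.
    public String execute(String... inputs) {
        if (inputs == null) {
            return null;
        }
        StringBuilder result = new StringBuilder();
        for (String input : inputs) {
            if (input != null) {
                result.append(input);
            }
        }
        return result.toString();
    }
}
```

Because the parameters are java.lang.String objects rather than primitive types, they map onto the text type of the equivalency table.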
Custom functions whose return type depends on the type of their input parameters, or which return an array or register, can
define an additional method with a signature equivalent to that of execute. This additional method must be named
executeReturnType. Defining this method is optional. If it is not present, the execute method will be called
and the return type will be obtained from the result of the execution. The advantage of defining
executeReturnType is that in some cases calculating the return type is much less complex and time consuming than
actually executing the function, so providing this method improves performance.
Naming conventions only cover a subset of all the possible custom functions. To avoid these limitations,
it is recommended to use the Java annotations provided by Denodo in the jar file
$DENODO_HOME/lib/contrib/denodo-custom.jar. These annotations are:
• com.denodo.common.custom.annotations.CustomElement. Class annotation used to define the
class as a custom function. The annotation requires the parameters:
   • name: name of the custom function.
   • type: in ITPilot it must be CustomElementType.ITPFUNCTION.
• com.denodo.common.custom.annotations.CustomExecutor. Method annotation used to mark the
method as a function signature. This method will be executed when the function is used with the appropriate
arguments. The annotation has an optional parameter, syntax, which specifies the syntax of the function
signature shown to the user in the Wrapper Generation Tool.
• com.denodo.common.custom.annotations.CustomExecutorReturnType. Method annotation used
to mark the method as the one used to compute the return type of a function signature before executing a
query.
• com.denodo.common.custom.annotations.CustomParam. Parameter annotation with the parameter
name, used to make the auto-generated syntax description of the signature more user friendly. If this annotation
is not used, the syntax will use the names arg1, arg2, etc. to represent the input parameters.
4.2
COMPOUND TYPES
Compound types and values in the custom functions are defined by the following Java classes:
• com.denodo.common.custom.elements.CustomRecordType. Class representing a register data type.
It stores the type name and a set of name-type pairs where the name is a string and the type is either a
java.lang.Class of some of the Java classes used for simple types, or a Denodo compound type
(CustomRecordType or CustomArrayType).
• com.denodo.common.custom.elements.CustomRecordValue. Class representing a register data
value. It stores a set of name-value pairs where the name is a string and the value is either an instance of a
simple type (java.lang.String, java.lang.Integer, etc.), or another compound value (CustomRecordValue or
CustomArrayValue).
• com.denodo.common.custom.elements.CustomArrayType. Class representing an array data type. It
stores the type name and an instance of CustomRecordType that defines the type of the elements of the array.
• com.denodo.common.custom.elements.CustomArrayValue. Class representing an array value. It
stores a list of CustomRecordValue instances.
• com.denodo.common.custom.elements.CustomElementsUtil. Helper class with methods to
instantiate compound types and values, if needed.
4.3
PAGE TYPE
ITPilot custom functions can also receive a PageValue object in their arguments. The type of this object is
com.denodo.common.custom.elements.CustomPageValue and it contains the URL of the last page, method
and POST parameters and the page cookies.
4.4
CUSTOM FUNCTION RETURN TYPE
As explained before, custom functions whose return type depends on the input values, or which return compound
types, can implement an additional method in order to compute the return type without executing the function. This is
entirely optional, but it provides better performance when executing the function is slower or more memory
intensive than calculating the return type.
This additional method must follow a few rules:
1. When the execute method returns a non-constant compound type (a record whose fields, that is, their
number, names and/or types, depend on the input parameters) or a java.lang.Object, the
additional method must be implemented. In other situations it is optional (the return type is obtained from the
method directly).
2. The additional method must have the same number of parameters as the execute method.
3. Each parameter of the additional method must have the same type as, or a type equivalent to, its respective
parameter in the execute method.
The value returned by the additional method must correspond to the return type of the execute method:
• If the execute method returns a basic Java type, the additional method has to return the corresponding Java
class. For example, if the execute method returns a String object, the additional method has to return
java.lang.String.class.
• If the execute method returns a CustomRecordValue object, the additional method has to return a
CustomRecordType object.
• If the execute method returns a CustomArrayValue object, the additional method has to return a
CustomArrayType object.
See the table ‘Equivalency between Java and ITPilot data types’ at the beginning of section 4 for the type that
these return values will have in ITPilot.
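As a minimal sketch of these rules using the naming conventions of section 4.1, the hypothetical function below pairs an execute method with an executeReturnType method that has the same parameters and returns java.lang.String.class. The function name Repeat_Sample, its behavior and the declared return type Class<?> are assumptions for illustration and are not taken from the ITPilot libraries.

```java
// Hypothetical custom function, recognized by name as "Repeat_Sample".
class Repeat_SampleItpFunction {

    // Repeats the text the given number of times. Parameters use wrapper
    // classes (String, Integer) because primitive types are not allowed.
    public String execute(String value, Integer times) {
        if (value == null || times == null) {
            return null;
        }
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < times; i++) {
            result.append(value);
        }
        return result.toString();
    }

    // Same parameters as execute; returns the class of the value that
    // execute produces, so the type is known without running the function.
    public Class<?> executeReturnType(String value, Integer times) {
        return String.class;
    }
}
```

For a function this cheap the extra method brings little benefit; it pays off when execute is expensive or when the return type is compound.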
4.5
EXAMPLE
The following example shows a function defined with annotations that returns an array: SPLIT_SAMPLE, which splits a
string around matches of a given regular expression and returns the array of resulting substrings.
import com.denodo.common.custom.annotations.*;
import com.denodo.common.custom.elements.*;
import java.util.*;

@CustomElement(type=CustomElementType.ITPFUNCTION, name="SPLIT_SAMPLE")
public class Split {

    private static final String STRING_FIELD = "string";

    @CustomExecutor()
    public CustomArrayValue split_sample(
            @CustomParam(name="regexp") String regex,
            @CustomParam(name="value") String value) {
        if (value == null || regex == null) {
            return null;
        }
        String[] result = value.split(regex);
        List<CustomRecordValue> arrayValues =
                new ArrayList<CustomRecordValue>(result.length);
        for (String string : result) {
            // Use a fresh map for each record so that records do not share state
            LinkedHashMap<String, Object> fields =
                    new LinkedHashMap<String, Object>(1);
            fields.put(STRING_FIELD, string);
            CustomRecordValue recordValue =
                    CustomElementsUtil.createCustomRecordValue(fields);
            arrayValues.add(recordValue);
        }
        return CustomElementsUtil.createCustomArrayValue(arrayValues);
    }

    @CustomExecutorReturnType
    public CustomArrayType split_sampleReturnType(String regex, String value) {
        // Each element of the array is a record with a single text field
        LinkedHashMap<String, Object> props = new LinkedHashMap<String, Object>();
        props.put(STRING_FIELD, String.class);
        CustomRecordType record = CustomElementsUtil.createCustomRecordType(props);
        CustomArrayType array = CustomElementsUtil.createCustomArrayType(record);
        return array;
    }
}
Figure 2 ITPilot Custom Function Sample
5
DEVELOPING ITPILOT WRAPPERS WITH JAVASCRIPT
5.1
INTRODUCTION
Denodo provides a graphical, component-based wrapper generation tool that enables the creation of
wrapper programs to access semi-structured sources (web, Adobe PDF or Microsoft Word) with no need for
programming. In addition, ITPilot allows users to write their own wrappers entirely in the
JavaScript programming language.
The JavaScript version supported by Denodo ITPilot is 1.5, which is compliant with the ECMA 3.0 standard
[ECMA262]. The following sections assume basic knowledge of the JavaScript language.
Section 5.2 introduces the JavaScript representation format of ITPilot wrappers. This provides the background needed
to interact with the predefined ITPilot components from a wrapper (section 5.3) and to develop complete
JavaScript wrappers by following the indications given in section 5.4.1.
5.2
REPRESENTATION FORMAT OF A WRAPPER
An ITPilot wrapper is structured in JavaScript as shown in Figure 3.
function getInit() {
    var start = new Init();
    start.setText("INITPARAM", OBLIGATORY);
    return start;
}
function getOutputSchema() {
    var structureOutput = new Record_Structure("OUT_REC");
    structureOutput.setText("ATTRIBUTE_1");
    structureOutput.setText("ATTRIBUTE_2");
    structureOutput.setText("ATTRIBUTE_3");
    return structureOutput;
}
function main() {
    ...
}
Figure 3 ITPilot Wrapper Skeleton in JavaScript
There are three possible functions in each script, one mandatory and two optional:
1. main() function: the only mandatory one; it contains the component implementation.
2. getInit() function: used to return the set of searchable parameters.
3. getOutputSchema() function: used to return the structure of the output objects, if they exist.
These functions correspond to the component-based definition of the process: the input parameters are those
defined in the Initialization component, and the output record is defined just as it is received by the Output component.
Note: since version 4.0SP1, getOutputSchema replaces the function previously known as getMetadata.
The old name is still accepted for backwards compatibility, but the use of the new name is strongly recommended.
5.2.1
Initialization of Searchable Parameters
The getInit() function describes the input parameters of the ITPilot wrapper. In the example, the first line of the
function, var start = new Init();, creates a new parameter initialization object.
This object is described in section 5.3 (the Component Catalog).
5.2.2
Main Function
This is where the wrapper business logic is developed. This function creates object instances,
each of which represents an ITPilot component, either predefined or custom (see [GENER] for more
information about how to create custom components with ITPilot). The functions published by each predefined ITPilot
component are described and explained in section 5.3.
5.2.3
Generating the Output Structure
This function determines the wrapper’s output structure, if one exists. The structure is a data
record implemented by the Record_Structure object, which is defined in the section 5.3 catalog.
5.3
PREDEFINED ITPILOT COMPONENT GUIDE
5.3.1
Introduction
This chapter lists the predefined ITPilot components. Each component is represented as an instantiable
object in JavaScript, with a series of functions that are described and explained below.
NOTE: Some of the parameters of the functions described can be omitted (by invoking the method with fewer
input arguments). A parameter cannot be omitted if a value has to be given for another argument to its right.
When a parameter is optional, its default value is indicated in the function description. For example, for
the object RECORD_STRUCTURE (see section 5.3.2.1):
rs.setText("FIELD") is equivalent to rs.setText("FIELD", ".*", OPTIONAL)
rs.setText("FIELD", OBLIGATORY) is not valid. The following must be used:
rs.setText("FIELD", ".*", OBLIGATORY)
5.3.2
Data Structures
ITPilot defines List and Record (a data record defined by the Record Structure object) as data structures. The
following sections will define them.
5.3.2.1
Record Structure
• Object: Record_Structure
• Description: This represents a data structure that allows the definition of the structure of a specific
record. This is often used in the getOutputSchema() function of the wrapper (see 5.2.3).
• Functions:
   o Constructor(name)
      • name: name of the structure.
   o setText(field, regexp, type): creation of a new character string field in the record.
      • field: name of the new field.
      • regexp (optional): regular expression of the character string generation. By default, if no
      constraint exists, its value is “.*”.
      • type (optional): defines whether the parameter is mandatory or not. By default, it is
      assumed that the field is optional.
   o setLink(field, type): creation of a new Link-type field in the record.
      • field: name of the new field.
      • type (optional): defines whether the parameter is mandatory or not. By default, the field
      is optional.
   o setInt(field, type): creation of a new Integer-type field in the record.
      • field: name of the new field.
      • type (optional): defines whether the parameter is mandatory or not. By default, the field
      is optional.
   o setBoolean(field, type): creation of a new boolean-type field in the record.
      • field: name of the new field.
      • type (optional): defines whether the parameter is mandatory or not. By default, the field
      is optional.
   o setLong(field, type): creation of a new Long-type field in the record.
      • field: name of the new field.
      • type (optional): defines whether the parameter is mandatory or not. By default, the field
      is optional.
   o setFloat(field, type): creation of a new Float-type field in the record.
      • field: name of the new field.
      • type (optional): defines whether the parameter is mandatory or not. By default, the field
      is optional.
   o setDouble(field, type): creation of a new Double-type field in the record.
      • field: name of the new field.
      • type (optional): defines whether the parameter is mandatory or not. By default, the field
      is optional.
   o setBlob(field, type): creation of a new BLOB-type (Binary Large Object) field in the record.
      • field: name of the new field.
      • type (optional): defines whether the parameter is mandatory or not. By default, the field
      is optional.
   o setDate(field, regexp, format, type): creation of a new Date-type field in the record.
      • field: name of the new field.
      • regexp (optional): regular expression of the character string generation. By default, if no
      constraint exists, its value is “.*”.
      • format (optional): date format, following [DATEFORMAT]. By default, its value is
      "d-MMM-yyyy H'h' m'm' s's'".
      • type (optional): defines whether the parameter is mandatory or not. By default, the field
      is optional.
   o setRegister(record, type): creation of a new Record-type field in the record.
      • record: record name.
      • type (optional): defines whether the parameter is mandatory or not. By default, the field
      is optional.
   o setArray(name, structure, type): creation of a new Array-type field in the record.
      • name: name of the array.
      • structure: data structure that represents the record structure contained in the array.
      • type (optional): defines whether the parameter is mandatory or not. By default, the field
      is optional.
   o toString(): transforms the record into a string of characters for its representation.
When a custom component is created (see section 5.4) from an ITPilot wrapper program, a Record Structure is
defined to represent the input values to the custom component.
NOTE: to assign values to the fields of a record, the RECORD_CONSTRUCTOR, as explained in section 5.3.22, must
be used, except in the cases of Text, Integer, Float and Link-type fields, for which specific functions apply.
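The format argument of setDate can be tried out in isolation. The sketch below assumes that [DATEFORMAT] refers to the pattern syntax of java.text.SimpleDateFormat and that the default pattern reads d-MMM-yyyy H'h' m'm' s's'; both are assumptions, as are the class name DefaultDatePatternSketch and the fixed English locale and UTC time zone, which are chosen only to make the output reproducible.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for experimenting with a date pattern outside ITPilot.
class DefaultDatePatternSketch {

    // Assumed default pattern of setDate; quoted letters are literals.
    static final String DEFAULT_PATTERN = "d-MMM-yyyy H'h' m'm' s's'";

    // Builds a formatter with a fixed locale and time zone.
    static SimpleDateFormat newFormat() {
        SimpleDateFormat format = new SimpleDateFormat(DEFAULT_PATTERN, Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    public static void main(String[] args) throws ParseException {
        SimpleDateFormat format = newFormat();
        Date parsed = format.parse("16-Aug-2011 9h 5m 30s");
        System.out.println(format.format(parsed));
    }
}
```

Checking a pattern this way before using it in a wrapper helps catch mismatches between the pattern and the text found in the source.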
5.3.2.2
Record List
• Object: List
• Functions:
   o setListName(listName): sets the name of the list.
      • listName: name of the list.
   o add(obj): addition of an element to the list.
      • obj: element to add.
   o toArray(): transforms the list into a JavaScript object array.
5.3.3
Common functions
Some functions are common to all or almost all components and are, therefore, described in this first section.
The catalog indicates which components do not provide some of these “common” functions.
5.3.3.1
onError function
• onError(errorId, errorAction). Tells the component how to behave in the event of a given type of error.
The onError function can be invoked several times with different errorId parameter values.
   o errorId: indicates the type of error for which the behavior is to be managed. The possible
   values are:
      • RUNTIME_ERROR: error while the component is being run.
      • CONNECTION_ERROR: error that occurs when there is some kind of connection
      problem with the Web source.
      • HTTP_ERROR: error produced by an HTTP error.
      • TIMEOUT_ERROR: error raised if the Web source takes too long to answer. The
      waiting time is configurable. When the wrapper is used in the run environment, this
      parameter is configured in the browser pool used (see [USER]). In the generation
      environment, this value is configured in the ITPAdminConfiguration.properties file
      available in <DENODO_HOME>/conf/itp-admintool, with the property
      IEBrowser.MAX_DOWNLOAD_TIME.1 for Internet Explorer, IEBrowser.MAX_DOWNLOAD_TIME.2
      for Firefox and IEBrowser.MAX_DOWNLOAD_TIME.3 for http browser.
      • SEQUENCE_ERROR: error produced when there is a problem with the sequence (the
      sequence is not correctly written or some command could not be run, etc.).
   o errorAction: action to be taken when the error indicated in the previous parameter arises. The
   possible values are:
      • ON_ERROR_RAISE: stops the wrapper run, indicating the source of the error.
      • ON_ERROR_IGNORE: ignores the error and continues with the wrapper run. In general,
      components that have a return value will return “null” when there is an error,
      except in the following cases: FILTER (5.3.13) and RECORD CONSTRUCTOR (5.3.22). In
      the cases of LOOP (5.3.19), REPEAT (5.3.25) and CONDITION (5.3.5), even though they
      return “null”, it will be evaluated as “false” if they are used in a condition expression.
      • ON_ERROR_RETRY: reruns the wrapper. The number of retries and time between retries
      are configured in each parameter.
      • ON_ERROR_RETRY_IGNORE: reruns the wrapper as with the ON_ERROR_RETRY error
      type, but continues with the wrapper execution if the error still occurs
      after the retries.
5.3.3.2
debugLevel function
• debugLevel(level): sets the trace level to be used when running this component.
The possible levels are defined as numbers from 0 to 5, where 0 means that no message will be written to
the log trace, and 5 means that all message types will be written to the log trace file. The log types are the
following:
   o TRACE
   o DEBUG
   o INFO
   o WARN
   o ERROR
   o FATAL
5.3.4
Add Record To List
• Object: Add_Object_To_List
• Description: adds a record to a list.
• Functions:
   o Constructor()
   o exec(record, list): executes the function.
      • record: record to be added to the list.
      • list: list to which the record is added.
24
ITPilot 4.6
5.3.5
Developer Guide
Condition
•
Object: Condition
•
Description: allows a condition to be defined. Two output connections determine the process flow,
depending on whether the condition is met or not.
•
Functions:
o
Constructor(expr)
•
o
expr: this parameter defines the condition expression. It is expressed as a string of
characters (e.g. MyCondition = new Condition("($0 <= $1)" indicates that, of the list of
elements passing to the component in the exec function, the value of the first must be
less than or equal to the value of the second). To define the condition expression,
ITPilot provides a set of functions, defined in Appendix A of [GENER].
exec(elements): main function of the Condition component. This carries out the condition
operation, returning “true” or “false”, depending on whether the condition described in the
constructor is met when applied to the input parameter elements.
•
elements: this parameter, which must be in format “[ELEMENT1, ELEMENT2,…,
ELEMENTN]”, determines the elements on which the condition is made.
5.3.6
Create List
• Object: Create_List
• Description: creates an empty list.
• Functions:
   o Constructor(listname): creates an empty list.
      • listname: name of the list of records to be created.
   o exec(): runs the component.
5.3.7 Create Persistent Browser
• Object: Create_Persistent_Browser
• Description: creates a persistent browser, that is, a browser that is kept running and active after
  the execution of the wrapper that initiated it.
• Functions:
  o Constructor(): creates a persistent browser and returns its handler.
  o exec(): executes the component.
5.3.8 Diff
• Object: Diff
• Description: the Diff component compares two pages and returns the differences between them in
  the retrieved HTML code.
• Functions:
  o Constructor(additionPrefixLabel, additionSuffixLabel, deletionPrefixLabel, deletionSuffixLabel,
    tokenSeparator)
    • additionPrefixLabel: prefix to use when generating the result page for the new content
      (by default, green background HTML tag).
    • additionSuffixLabel: suffix to use when generating the result page for the new content
      (by default, green background HTML end tag).
    • deletionPrefixLabel: prefix to use when generating the result page for the deleted
      content (by default, red background HTML tag).
    • deletionSuffixLabel: suffix to use when generating the result page for the deleted
      content (by default, red background HTML end tag).
    • tokenSeparator: character string used as the HTML page element separator when the result
      page is generated, so that each element can be adequately identified.
  o diff(baseCode, finalCode): returns “true” if both pages are identical, “false” if they are
    different.
    • baseCode: character string with the source page content.
    • finalCode: character string or page object with the target page content.
  o exec(baseCode, finalCode): executes the Diff component, returning a character string that
    represents the HTML content of those pages, pointing out the differences between them.
    • baseCode: character string with the source page content.
    • finalCode: character string or page object with the target page content.
  o setAdditionPrefixLabel(additionPrefixLabel): modifies the start tag for added content.
    • additionPrefixLabel: prefix to use when generating the result page for the new content
      (by default, green background HTML tag).
  o setAdditionSuffixLabel(additionSuffixLabel): modifies the end tag for added content.
    • additionSuffixLabel: suffix to use when generating the result page for the new content
      (by default, green background HTML end tag).
  o setDeletionPrefixLabel(deletionPrefixLabel): modifies the start tag for deleted content.
    • deletionPrefixLabel: prefix to use when generating the result page for the deleted
      content (by default, red background HTML tag).
  o setDeletionSuffixLabel(deletionSuffixLabel): modifies the end tag for deleted content.
    • deletionSuffixLabel: suffix to use when generating the result page for the deleted
      content (by default, red background HTML end tag).
  o setShowRemovedContent(mergedDeletions): sets whether the deleted content is shown in the result
    page or not.
    • mergedDeletions: “true”: the deleted content will be shown. If the value is “false”, the
      configuration of the functions setDeletionPrefixLabel and setDeletionSuffixLabel will
      not be taken into account.
  o setCaseInsensitive(toLowerCase): establishes whether capitalization is taken into account
    when comparing the pages.
    • toLowerCase: “true” transforms all HTML content to lower case; “false” keeps the
      content as is.
  o setIgnoreTagAttributes(simplifyTags): sets whether the component ignores the HTML tag
    attributes when comparing both pages.
    • simplifyTags: “true” means that the HTML tag attributes will be ignored. With “false”,
      they will not be ignored.
  o setNullWhenEquals(nullWhenEquals): if the result page is identical to any of the two input
    pages, the component will return “null” instead of the page itself.
    • nullWhenEquals: “true” implies that “null” will be returned when both pages are equal;
      “false” means that the result page will be returned.
  o addTokenReplacement(replacement): adds a regular expression to a list. These regular
    expressions can be applied on HTML tokens of the source pages before comparing them.
    • replacement: Perl [PERL] regular expression.
  o addIgnoredToken(regexp): adds a regular expression to the list. These regular expressions can
    be applied on HTML tokens of the page. Tokens that match the regular expression are discarded
    before starting the comparison.
    • regexp: Perl [PERL] regular expression.
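The role of the prefix, suffix and separator labels can be illustrated with a small token comparison. The sketch below is not the ITPilot Diff implementation: markDifferencesSketch is a hypothetical function that compares two token arrays with a longest-common-subsequence table, and the <ins> and <del> labels are assumed values chosen for the example (the defaults described above are background-colored HTML tags).

```javascript
// Hypothetical sketch: wraps removed and added tokens in the given labels.
function markDifferencesSketch(baseTokens, finalTokens, labels) {
  var n = baseTokens.length, m = finalTokens.length, i, j;
  // lcs[i][j]: length of the longest common subsequence of
  // baseTokens[i..] and finalTokens[j..].
  var lcs = [];
  for (i = 0; i <= n; i++) {
    lcs.push([]);
    for (j = 0; j <= m; j++) { lcs[i].push(0); }
  }
  for (i = n - 1; i >= 0; i--) {
    for (j = m - 1; j >= 0; j--) {
      lcs[i][j] = baseTokens[i] === finalTokens[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  var out = [];
  i = 0; j = 0;
  while (i < n || j < m) {
    if (i < n && j < m && baseTokens[i] === finalTokens[j]) {
      out.push(baseTokens[i]); i++; j++;          // unchanged token
    } else if (j === m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1])) {
      out.push(labels.deletionPrefix + baseTokens[i] + labels.deletionSuffix); i++;
    } else {
      out.push(labels.additionPrefix + finalTokens[j] + labels.additionSuffix); j++;
    }
  }
  return out.join(labels.tokenSeparator);
}

var sketchLabels = {
  additionPrefix: "<ins>", additionSuffix: "</ins>",   // assumed labels
  deletionPrefix: "<del>", deletionSuffix: "</del>",   // assumed labels
  tokenSeparator: " "
};
var marked = markDifferencesSketch(["a", "b", "c"], ["a", "x", "c"], sketchLabels);
```

In this sketch, tokens present only in the base page are wrapped in the deletion labels and tokens present only in the final page are wrapped in the addition labels, which is the purpose the four label parameters serve in the result page.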
5.3.9 ExecuteJS
• Description: ITPilot provides a component called ExecuteJS that lets the user execute a JavaScript
  expression as part of a navigation sequence. This component is transformed into a Sequence command
  (see section 5.3.27) that executes the ExecuteJS NSEQL command (see [NSEQL]).
var Execute_JavaScript_1 = null;
var Execute_JavaScript_1_output = null;
Execute_JavaScript_1 = new SEQUENCE("sequence://ExecuteJS(<JavaScript code here>);", SEQUENCE_IEBROWSER);
Execute_JavaScript_1.onError(RUNTIME_ERROR, ON_ERROR_RAISE);
Execute_JavaScript_1.onError(CONNECTION_ERROR, ON_ERROR_RAISE);
Execute_JavaScript_1.onError(SEQUENCE_ERROR, ON_ERROR_RAISE);
Execute_JavaScript_1.onError(HTTP_ERROR, ON_ERROR_RAISE);
Execute_JavaScript_1.onError(TIMEOUT_ERROR, ON_ERROR_RAISE);
Execute_JavaScript_1.setRetries(3);
Execute_JavaScript_1.setRetryDelay(3000);
Execute_JavaScript_1_output = Execute_JavaScript_1.exec([]);
Figure 4 Using the ExecuteJS NSEQL command
5.3.10 Expression
• Object: Expression
• Description: allows an expression to be defined (based on constants and/or functions provided by
  ITPilot) that is evaluated to produce an output value.
• Functions:
  o Constructor(expression)
    • expression: object that defines the condition expression. This object is expressed as a
      string of characters (e.g. MyCondition = new CONDITION("($0 <= $1)") indicates that, of
      the list of elements passed to the component in the exec method, the value of the first
      must be less than or equal to the value of the second). To define the condition
      expression, ITPilot provides a set of functions, defined in Appendix A of [GENER].
  o exec(exprInput): method that runs the component and returns the value resulting from the
    expression indicated in the component constructor.
    • exprInput: list of zero or more values, zero or more records, or zero or more record lists
      that are used as part of the expression.
5.3.11 Extractor
• Object: Extractor
• Description: this is responsible for extracting structured data from an HTML page, thus generating a
  DEXTL program ([DEXTL]).
• Functions:
  o Constructor(name, page, specification, structure)
    • name: name of the Extractor component instance.
    • page: page-type ITPilot structure from where data is to be extracted.
    • specification: DEXTL data extraction specification (see [DEXTL]).
    • structure: name of the record (previously created) that will be used to return the data
      extracted by the specification.
  o exec(): main extractor method, which runs the specification indicated in the constructor. This
    function returns a list of records of the type defined in the constructor in the structure
    parameter.
  o setMergePatterns(merge): applies the technique of merging patterns for greater system
    optimization (see [GENER] for further information).
    • merge: Boolean parameter, “true” if the pattern merge technique is to be applied or
      “false” if not. This is “true” by default.
  o setI18n(i18n): function that updates the process internationalization.
    • i18n: type of internationalization to use. ITPilot provides different types of
      internationalization options such as ES_EURO, US_PST, GB, and so on. See
      [GENER] for more information about internationalization in ITPilot.
5.3.12 Fetch
• Object: Fetch
• Description: obtains the contents of the URL or page used as the input argument and returns them
  in binary or text format.
• Functions:
  o Constructor(url, sequenceType, reusableConnection, binary, page)
    • url: URL where the resource to be downloaded can be found (OPTIONAL).
    • sequenceType: type of pool to use. The possible values are:
      • SEQUENCE_IEBROWSER
      • SEQUENCE_HTTP_BROWSER
      • SEQUENCE_FTP
      • SEQUENCE_LOCAL
    • reusableConnection: indicates whether the connection will be reused (“true”) or
      not (“false”). See [GENER] for further information.
    • binary: “true”: the object to be downloaded is binary; “false”: the object to be
      downloaded is in text format.
    • page: optionally, the page from which the HTTP request is launched can be indicated.
  o exec(page): runs the component, returning the string- or binary-type value obtained.
    • page: optionally, the page from which the HTTP request is launched can be indicated.
  o setEncoding(encoding): allows the user to determine the MIME type [MIME] of the information to
    send.
    • encoding: MIME type of the information to send.
  o syncWithPost(flag): lets the user set the method for recovering the page state. ITPilot will
    send a POST message to the page URL with the POST parameters that were used to initially
    access that page. This is the default synchronization method.
    • flag: “true” means that this synchronization function must be used. If it is “false”,
      ITPilot checks whether a back sequence has been defined with the setBackSequence
      function; if it does not exist, ITPilot executes a Back() NSEQL command.
  o setBackSequence(back): lets the user optionally set an explicit navigation sequence back to the
    source page, against which more data extraction operations are going to be executed.
    • back: back sequence NSEQL program.
  o setReusingConnection(reusingConnection): indicates whether connections will be reused or not.
    • reusingConnection: if the value is set to “true”, the connection coming from previous
      components is reused; if set to “false”, a new browser will be launched, importing
      information from the previous session.
  o setBackPages(pages): determines the number of pages ITPilot goes back when a Back() NSEQL
    command is executed because neither a back sequence nor a POST navigation has been defined.
  o setBrowserType(browserType): determines the browser implementation to use in the
    component. The accepted values are:
    • 0: default browser implementation.
    • 1: Internet Explorer browser implementation.
    • 2: Firefox browser implementation.
    • 3: Denodo HTTP browser implementation.
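The order in which the synchronization options in this section are considered can be summarized in a short sketch. chooseBackNavigationSketch is a hypothetical helper written for this illustration; it only reports which strategy would apply and performs no navigation. The settings object, its property names, the returned strings and the default of one page are all assumptions.

```javascript
// Hypothetical helper that mirrors the decision order described in the text:
// POST synchronization first, then an explicit back sequence, then Back().
function chooseBackNavigationSketch(settings) {
  if (settings.syncWithPost) {
    return "POST";                 // resend the POST parameters used to reach the page
  }
  if (settings.backSequence) {
    return "BACK_SEQUENCE";        // run the explicit NSEQL back sequence
  }
  // Neither option applies: fall back to Back(), going back backPages pages.
  // The default of 1 page is an assumption made for this sketch.
  return "BACK x" + (settings.backPages || 1);
}

var withPost = chooseBackNavigationSketch({ syncWithPost: true });
var withSequence = chooseBackNavigationSketch({
  syncWithPost: false,
  backSequence: "<NSEQL back sequence>"   // placeholder, not real NSEQL
});
var withBack = chooseBackNavigationSketch({ syncWithPost: false, backPages: 2 });
```

The three results are "POST", "BACK_SEQUENCE" and "BACK x2" respectively.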
5.3.13 Filter
• Object: Filter
• Description: carries out a filtering operation on a list of records, returning those that meet a
  given condition.
• Functions:
  o Constructor(expr, auxiliaryRecords)
    • expr: regular expression of the filtering operation for a list of records, which are
      described in the exec function.
    • auxiliaryRecords: record list that participates in the filter condition, but whose records
      are not the records to filter.
  o exec(inputRecords, auxiliaryRecords): function that receives a list of records and returns the
    subgroup complying with the selection expression indicated in the constructor.
    • inputRecords: list of input records.
    • auxiliaryRecords: record list that participates in the filter condition, but whose records
      are not the records to filter.
NOTE: If the error handler of this component is set to ON_ERROR_IGNORE, FILTER will return the list of
filtered elements, except for the one that caused the error.
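The behavior described in the NOTE can be sketched as follows. filterRecordsSketch and the two *_SKETCH constants are hypothetical names used so that the example runs outside ITPilot; the condition is passed as a JavaScript function here, whereas the real Filter component receives an expression string in its constructor.

```javascript
// Hypothetical constants; ITPilot predefines its own error-handling constants.
var ON_ERROR_RAISE_SKETCH = "raise";
var ON_ERROR_IGNORE_SKETCH = "ignore";

// Hypothetical stand-in for the filtering step.
function filterRecordsSketch(inputRecords, predicate, onError) {
  var kept = [];
  for (var i = 0; i < inputRecords.length; i++) {
    try {
      if (predicate(inputRecords[i])) { kept.push(inputRecords[i]); }
    } catch (error) {
      if (onError !== ON_ERROR_IGNORE_SKETCH) { throw error; }
      // Ignore mode: drop only the record that caused the error and continue.
    }
  }
  return kept;
}

// The null entry makes the predicate throw when it reads r.price.
var sampleRecords = [{ price: 5 }, null, { price: 50 }, { price: 8 }];
var cheapRecords = filterRecordsSketch(
  sampleRecords,
  function (r) { return r.price < 10; },
  ON_ERROR_IGNORE_SKETCH
);
```

With the ignore setting, the faulty record is skipped and the remaining records are still filtered; with the raise setting, the same input would abort the operation.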
5.3.14 Form Iterator
• Object: Form_Iterator
• Description: generates an execution loop over a specific form, where predetermined values for
  each of the fields included are used in each run.
• Functions:
  o Constructor(findForm, submitForm, sequenceType, reusableConnection, baseElements, inputPage,
    parallelIterator)
    • findForm: NSEQL program that finds the form to be used as the basis of the
      iteration (see [NSEQL] for further information on NSEQL).
    • submitForm: NSEQL program that invokes the form (see [NSEQL] for
      further information on NSEQL).
    • sequenceType: type of pool to use. The possible values are:
      • SEQUENCE_IEBROWSER
      • SEQUENCE_HTTP_BROWSER
      • SEQUENCE_FTP
      • SEQUENCE_LOCAL
    • reusableConnection: indicates whether the connection will be reused (“true”) or
      not (“false”). See [GENER] for further information.
    • baseElements: optional list of records that can be employed as variables to use in the
      different NSEQL browsing sequences used in this component.
    • inputPage: input page from which the selected form can be iteratively invoked.
    • parallelIterator: “true”: the component will execute its iterations in parallel.
  o selectMultiplePositions(field, position, positionsArray, clickedArray): indicates what positions
    are selected in a multiple selection field in the target form.
    • field: name of the multiple selection field.
    • position: position related to the field between those of the same name, starting with
      position 0.
    • positionsArray: list that indicates the position held for each valuesArray element in the
      event of replicated values.
    • clickedArray: list that indicates whether each valuesArray element can be marked, not
      marked or both. There are certain JavaScript constants defined for this:
      • CLICKED_ELEMENT: mark the element.
      • NON_CLICKED_ELEMENT: leave the element as unmarked.
      • CLICKED_AND_NON_CLICKED_ELEMENT: generates two combinations: one
        with the element marked and another with the element unmarked.
  o selectMultipleTexts(field, position, valuesArray, positionsArray, equalsArray, clickedArray):
    indicates the values selected from a multiple selection field for the chosen form.
    • field: name of the multiple selection field.
    • position: position related to the field between those of the same name, starting with
      position 0.
    • valuesArray: list of values that must be selected in the field.
    • positionsArray: list that indicates the position held for each valuesArray element in the
      event of replicated values.
    • equalsArray: list that indicates whether the value of each valuesArray element must be
      identical to that appearing in the selection field (equals = true) or contained therein
      (equals = false).
    • clickedArray: list that indicates whether each valuesArray element can be marked, not
      marked or both. There are certain JavaScript constants defined for this:
      • CLICKED_ELEMENT: mark the element.
      • NON_CLICKED_ELEMENT: leave the element as unmarked.
      • CLICKED_AND_NON_CLICKED_ELEMENT: generates two combinations: one
        with the element marked and another with the element unmarked.
  o selectPositions(field, position, positions): indicates the values selected from a selection
    field for the chosen form.
    • field: name of the HTML selection field.
    • position: position occupied in the event of more than one field element with the same
      name.
    • positions: values of the elements on which the component must iterate.
  o selectTexts(field, position, values, positions, equals): indicates the values to be used in the
    different iterations on a text field.
    • field: name of the HTML text field.
    • position: position of the field in the event of several on the form with the same value.
    • values: list of values that must be selected in the field.
    • positions: list that indicates the position held for each value element in the event of
      replicated values.
    • equals: boolean value that indicates whether the field values must exactly match those
      provided to the function or may merely contain them.
  o click(field, value, state): function that selects an element and runs a “click” event on it.
    • field: name of the HTML field on which the click is to be made.
    • value: when this function is run on Radio Buttons, this parameter indicates the
      elements selected as a list (e.g. [0, 1]). When run on Checkboxes, it indicates the value
      of the selectable element.
    • state: when this function is run on Radio Buttons, this parameter is not used. When run
      on Checkboxes, it indicates the status of the element:
      • CLICKED_ELEMENT: mark the element.
      • NON_CLICKED_ELEMENT: leave the element as unmarked.
      • CLICKED_AND_NON_CLICKED_ELEMENT: generates two combinations: one
        with the element marked and another with the element unmarked.
  o input(field, position, values): function that indicates the values added to an input field.
    • field: name of the HTML input field.
    • position: position of the field in the event of several on the form with the same name.
    • values: list of values that must be selected in the field.
  o textarea(field, position, values): indicates the values added to a text area.
    • field: name of the HTML input field.
    • position: position of the field in the event of several on the form with the same name.
    • values: list of values that must be selected in the field.
  o toList(): returns the list with the NSEQL sequences used in each iteration.
  o setMaxIterations(count): sets the maximum number of iterations that can be executed.
    • count: number that determines the maximum number of iterations.
  o setRetries(count): sets the number of retries in the event of failures.
    • count: number of retries.
  o setRetryDelay(mseconds): sets the waiting time between retries.
    • mseconds: waiting time between retries, in milliseconds.
  o setParallelIterator(flag): the component launches the iteration in parallel.
    • flag: “true”: the iterations will be executed in parallel.
  o next(inputPage): returns the page resulting from running a component iteration.
    • inputPage: optional parameter that indicates a new starting page on
      which a new component iteration is run.
  o hasNext(): function that determines whether there are more results. The function returns “true”
    if there is at least one more result or “false” if there is not.
  o close(): function that closes the iterator.
  o syncWithPost(flag): indicates whether, to retrieve the status of the page, a POST
    message must be issued to the page URL containing the POST parameters with which it arrived.
    This is the default synchronization method.
    • flag: “true” indicates that this synchronization function is to be used. If it is “false”,
      ITPilot checks whether there is a back sequence defined with a setBackSequence
      function. If there is not, an NSEQL Back() command is run.
  o setBackSequence(back): optionally indicates an explicit browsing sequence back to the
    source page, so that more data extraction operations can be carried out.
    • back: NSEQL back program.
  o setReusingConnection(reusingConnection): indicates whether the connection will be reused
    or not.
    • reusingConnection: if “true”, the connection from previous components will be reused.
      With the parameter set to “false”, a new browser is opened and the data is imported
      from the previous session.
  o setBackPages(pages): determines the number of pages that ITPilot must browse back when the
    NSEQL Back() command must be run because no back sequence has been explicitly defined and no
    POST navigation has been configured as back sequence.
  o setBrowserType(browserType): determines the browser implementation to use in the
    component. The accepted values are:
    • 0: default browser implementation.
    • 1: Internet Explorer browser implementation.
    • 2: Firefox browser implementation.
    • 3: Denodo HTTP browser implementation.
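The effect of the CLICKED_AND_NON_CLICKED_ELEMENT state on the number of iterations can be illustrated with a small enumeration. This is a sketch under assumptions: the three variables below hold hypothetical string values (ITPilot predefines the real constants), and expandClickStatesSketch is an illustrative function, not part of the Form_Iterator API.

```javascript
// Hypothetical values standing in for the predefined ITPilot constants.
var CLICKED = "CLICKED_ELEMENT";
var NON_CLICKED = "NON_CLICKED_ELEMENT";
var BOTH = "CLICKED_AND_NON_CLICKED_ELEMENT";

// Returns every marked/unmarked combination implied by a clickedArray.
// Each combination is an array of booleans (true = element marked).
function expandClickStatesSketch(clickedArray) {
  var combinations = [[]];
  for (var i = 0; i < clickedArray.length; i++) {
    // BOTH contributes two options; the other states contribute exactly one.
    var options = clickedArray[i] === BOTH
      ? [true, false]
      : [clickedArray[i] === CLICKED];
    var expanded = [];
    for (var c = 0; c < combinations.length; c++) {
      for (var o = 0; o < options.length; o++) {
        expanded.push(combinations[c].concat([options[o]]));
      }
    }
    combinations = expanded;
  }
  return combinations;
}

var clickCombinations = expandClickStatesSketch([CLICKED, BOTH, NON_CLICKED]);
// clickCombinations: [[true, true, false], [true, false, false]]
```

Each element set to the combined state doubles the number of combinations, which is worth keeping in mind together with setMaxIterations.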
5.3.15 Get Page
• Object: Get_Page
• Description: obtains an active browser from the browser pool by means of a previously retrieved
  identification code.
• Functions:
  o Constructor(browserUuid): obtains (or, optionally, creates) the handler to an active browser
    from its identification.
    • browserUuid: browser id.
  o exec(pageType, lastURL, lastURLMethod, lastURLPostParameters, cookie, proxyUser,
    proxyPassword, proxyDomain): executes the component and returns a Page object with
    information about the browser’s current state. It is possible to execute the function with no
    parameters, for later browsing by using a Sequence object (see section 5.3.27).
    • pageType: type of browser used to access the page:
      • SEQUENCE_IEBROWSER = 1
      • SEQUENCE_HTTP_BROWSER = 2
    • lastURL: last URL where the page is coming from.
    • lastURLMethod: access method (GET, POST) of the URL the page is coming from.
    • lastURLPostParameters: POST-method parameters of the URL the page is coming from.
    • cookie: information storage “cookies”.
    • proxyUser: user name to access the Proxy, if required.
    • proxyPassword: user password to access the Proxy, if required.
    • proxyDomain: Proxy domain, if required.
5.3.16 Init
• Object: Init
• Description: is responsible for storing the structure of the input data, which is the data that the
  wrapper will receive from the calling application.
• Functions:
  o Constructor(input, output)
    • input: input record of the component. Optionally used, only when custom components
      are created (see section 5.4). In the case of standard processes, ITPilot takes this
      information from the JavaScript context.
    • output: name of the output record of the component, which represents the query
      parameters of the wrapper. Its use is optional; in the standard process main function, if
      not specified, the record will be generated at runtime (with the exec() function).
  o get(name): returns the value of a record field created as a group of initialization parameters.
    • name: name of the record field.
  o setText(field, obl, fixedValue): creates a text-type field in the initialization record.
    • field: name of the field to create.
    • obl: parameter that indicates the compulsory nature of the query on the field to create.
      The possible values are:
      • OPTIONAL (default value): the parameter is optional; it does not need a value
        to be assigned in each wrapper query.
      • OBLIGATORY: the parameter is obligatory in any query made on the wrapper.
      • FIXED: the parameter has a constant value; this value is assigned by the
        fixedValue parameter, described below.
    • fixedValue: optional parameter that indicates a constant value assigned to the field.
  o setInt(field, obl, fixedValue): creates an integer-type field in the initialization record.
    • field: name of the field to create.
    • obl: parameter that indicates the compulsory nature of the query on the field to create.
      The possible values are:
      • OPTIONAL (default value): the parameter is optional; it does not need a value
        to be assigned in each wrapper query.
      • OBLIGATORY: the parameter is obligatory in any query made on the wrapper.
      • FIXED: the parameter has a constant value; this value is assigned by the
        fixedValue parameter, described below.
    • fixedValue: optional parameter that indicates a constant value assigned to the field.
  o setLong(field, obl, fixedValue): creates a long-type field in the initialization record.
    • field: name of the field to create.
    • obl: parameter that indicates the compulsory nature of the query on the field to create.
      The possible values are:
      • OPTIONAL (default value): the parameter is optional; it does not need a value
        to be assigned in each wrapper query.
      • OBLIGATORY: the parameter is obligatory in any query made on the wrapper.
      • FIXED: the parameter has a constant value; this value is assigned by the
        fixedValue parameter, described below.
    • fixedValue: optional parameter that indicates a constant value assigned to the field.
  o setFloat(field, obl, fixedValue): creates a float-type field in the initialization record.
    • field: name of the field to create.
    • obl: parameter that indicates the compulsory nature of the query on the field to create.
      The possible values are:
      • OPTIONAL (default value): the parameter is optional; it does not need a value
        to be assigned in each wrapper query.
      • OBLIGATORY: the parameter is obligatory in any query made on the wrapper.
      • FIXED: the parameter has a constant value; this value is assigned by the
        fixedValue parameter, described below.
    • fixedValue: optional parameter that indicates a constant value assigned to the field.
  o setDouble(field, obl, fixedValue): creates a double-type field in the initialization record.
    • field: name of the field to create.
    • obl: parameter that indicates the compulsory nature of the query on the field to create.
      The possible values are:
      • OPTIONAL (default value): the parameter is optional; it does not need a value
        to be assigned in each wrapper query.
      • OBLIGATORY: the parameter is obligatory in any query made on the wrapper.
      • FIXED: the parameter has a constant value; this value is assigned by the
        fixedValue parameter, described below.
    • fixedValue: optional parameter that indicates a constant value assigned to the field.
  o setBlob(field, obl, fixedValue): creates a BLOB-type (binary large object) field in the
    initialization record.
    • field: name of the field to create.
    • obl: parameter that indicates the compulsory nature of the query on the field to create.
      The possible values are:
      • OPTIONAL (default value): the parameter is optional; it does not need a value
        to be assigned in each wrapper query.
      • OBLIGATORY: the parameter is obligatory in any query made on the wrapper.
      • FIXED: the parameter has a constant value; this value is assigned by the
        fixedValue parameter, described below.
    • fixedValue: optional parameter that indicates a constant value assigned to the field.
  o setBoolean(field, obl, fixedValue): creates a Boolean-type field in the initialization record.
    • field: name of the field to create.
    • obl: parameter that indicates the compulsory nature of the query on the field to create.
      The possible values are:
      • OPTIONAL (default value): the parameter is optional; it does not need a value
        to be assigned in each wrapper query.
      • OBLIGATORY: the parameter is obligatory in any query made on the wrapper.
      • FIXED: the parameter has a constant value; this value is assigned by the
        fixedValue parameter, described below.
    • fixedValue: optional parameter that indicates a constant value assigned to the field.
  o setLink(field, obl, fixedValue): creates a URL-type field in the initialization record.
    • field: name of the field to create.
    • obl: parameter that indicates the compulsory nature of the query on the field to create.
      The possible values are:
      • OPTIONAL (default value): the parameter is optional; it does not need a value
        to be assigned in each wrapper query.
      • OBLIGATORY: the parameter is obligatory in any query made on the wrapper.
      • FIXED: the parameter has a constant value; this value is assigned by the
        fixedValue parameter, described below.
    • fixedValue: optional parameter that indicates a constant value assigned to the field.
  o setDate(field, format, obl, fixedValue): creates a date-type field in the initialization record.
    • field: name of the field to create.
    • format: representation format of the date field. This format is optional, but becomes
      compulsory if completed; otherwise, the wrapper may not run. This representation
      format is defined in [DATEFORMAT].
    • obl: parameter that indicates the compulsory nature of the query on the field to create.
      The possible values are:
      • OPTIONAL (default value): the parameter is optional; it does not need a value
        to be assigned in each wrapper query.
      • OBLIGATORY: the parameter is obligatory in any query made on the wrapper.
      • FIXED: the parameter has a constant value; this value is assigned by the
        fixedValue parameter, described below.
    • fixedValue: optional parameter that indicates a constant value assigned to the field.
  o setName(name): update function for the component name.
    • name: new component name.
  o setI18n(i18n): function that updates the process i18n.
    • i18n: type of internationalization to be used. ITPilot provides different types of i18n
      configurations, such as ES_EURO, US_PST, GB, etc. See [GENER] for more
      information about internationalization in ITPilot.
  o exec(): main function for running the component, returning a record representing the wrapper
    initialization parameters.
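The three values of the obl parameter can be illustrated with a minimal sketch. InitSketch, its buildRecord method and the *_SKETCH constants are hypothetical names created for this example; in particular, passing the query parameters as a plain object and throwing an error for a missing obligatory parameter are assumptions, since the real exec() takes its input from the ITPilot context.

```javascript
// Hypothetical constants standing in for OPTIONAL, OBLIGATORY and FIXED.
var OPTIONAL_SKETCH = "OPTIONAL";
var OBLIGATORY_SKETCH = "OBLIGATORY";
var FIXED_SKETCH = "FIXED";

// Hypothetical stand-in for the Init component (text fields only).
function InitSketch() { this.fields = []; }
InitSketch.prototype.setText = function (field, obl, fixedValue) {
  this.fields.push({ name: field, obl: obl || OPTIONAL_SKETCH, fixedValue: fixedValue });
};
// Builds the initialization record from the query parameters.
InitSketch.prototype.buildRecord = function (queryParams) {
  var record = {};
  for (var i = 0; i < this.fields.length; i++) {
    var f = this.fields[i];
    if (f.obl === FIXED_SKETCH) {
      record[f.name] = f.fixedValue;               // constant value, query is ignored
    } else if (Object.prototype.hasOwnProperty.call(queryParams, f.name)) {
      record[f.name] = queryParams[f.name];
    } else if (f.obl === OBLIGATORY_SKETCH) {
      throw new Error("Missing obligatory parameter: " + f.name); // assumed behavior
    }
  }
  return record;
};

var initSketch = new InitSketch();
initSketch.setText("KEYWORD", OBLIGATORY_SKETCH);
initSketch.setText("CATEGORY");                    // OPTIONAL by default
initSketch.setText("LANGUAGE", FIXED_SKETCH, "en");
var initRecord = initSketch.buildRecord({ KEYWORD: "laptop" });
```

In this sketch the resulting record carries KEYWORD from the query and LANGUAGE from its fixed value, while the optional CATEGORY field is simply absent.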
5.3.17 Iterator
• Object: Iterator
• Description: component that iterates over a list of records one by one.
• Functions:
  o Constructor(list)
    • list: list of records on which to iterate.
  o hasNext(): determines whether there are more results on which to iterate. “true” is returned
    if there is at least one more result.
  o next(): returns the next iteration element. The list is a sorted sequence of records.
The “Parallel Execution” option in the ITPilot graphic interface is translated into the following
JavaScript structure, which uses the Thread object described in section 5.3.29.
var _thread0 = new Thread();
while(iterator.hasNext()) {
    recordInstance = iterator.next();
    _thread0.execute("_functionIterator_1",
        structureInstance, recordInstance);
}
Figure 5 Using threads in the Iterator component
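For comparison with the parallel structure in Figure 5, a plain sequential traversal with hasNext() and next() might look like the sketch below. IteratorSketch is a hypothetical stand-in backed by a JavaScript array so that the example runs outside ITPilot; only the hasNext()/next() call pattern is taken from this section.

```javascript
// Hypothetical stand-in for the Iterator component, backed by an array.
function IteratorSketch(list) {
  this.list = list;
  this.position = 0;
}
IteratorSketch.prototype.hasNext = function () {
  return this.position < this.list.length;
};
IteratorSketch.prototype.next = function () {
  return this.list[this.position++];   // records are returned in list order
};

var sketchIterator = new IteratorSketch([{ id: 1 }, { id: 2 }, { id: 3 }]);
var visitedIds = [];
while (sketchIterator.hasNext()) {
  var currentRecord = sketchIterator.next();
  visitedIds.push(currentRecord.id);   // per-record processing goes here
}
```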
5.3.18 JDBCExtractor
• Object: JDBCExtractor
• Description: sends a query to any source available via JDBC and returns a record list with the
  results obtained.
• Functions:
  o Constructor(uuid, uri, driver, userName, password, structure, baseRecords, maxPoolSize,
    initialPoolSize, checkQuery, query)
    • uuid: component unique identifier.
    • uri: connection URL to the database.
    • driver: driver class to use to connect to the data source.
    • userName: user name.
    • password: user password.
    • structure: structure of the component’s output record list. It is defined as a record of
      values.
    • baseRecords: record list to be used.
    • maxPoolSize: maximum number of connections that can be managed by the connection
      pool at the same time.
    • initialPoolSize: initial number of pool connections. This number of idle
      connections is established, ready to be used.
    • checkQuery: SQL query used by the pool to verify the status of the currently cached
      connections. The query must be simple and the queried table must exist.
    • query: SQL query that returns the results required by the component.
  o exec(query, baseRecords): executes the JDBCExtractor component.
    • query: SQL query that returns the results required by the component.
    • baseRecords: record list to be used.
  o setPoolConfig(maxPoolSize, initialPoolSize, pingQuery): updates the pool configuration.
    • maxPoolSize: maximum number of connections that can be managed by the connection
      pool at the same time.
    • initialPoolSize: initial number of pool connections. This number of idle
      connections is established, ready to be used.
    • pingQuery: SQL query used by the pool to verify the status of the currently cached
      connections. The query must be simple and the queried table must exist.
Developing ITPilot Wrappers with JavaScript
46
ITPilot 4.6
o
disablePool(): disables the connection pool.
o
addDriverProperty(propname, propvalue): adds a JDBC driver property.
•
propname: property name.
•
propvalue: property value.
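The call order suggested by these signatures can be sketched as follows. The real JDBCExtractor object only exists inside the ITPilot runtime, so the block begins with a stand-in that merely records the calls it receives. The URL, driver class, credentials, queries and the connectTimeout property are placeholders, and passing null as structure is an assumption made only because the structure format is not described in this section.

```javascript
// Stand-in for the ITPilot JDBCExtractor object so the sketch runs outside
// ITPilot. It only records its arguments; it does not open any connection.
function JDBCExtractor(uuid, uri, driver, userName, password, structure,
                       baseRecords, maxPoolSize, initialPoolSize, checkQuery, query) {
  this.uri = uri;
  this.driver = driver;
  this.query = query;
  this.pool = { max: maxPoolSize, initial: initialPoolSize, check: checkQuery, enabled: true };
  this.driverProperties = {};
  this.lastQuery = null;
}
JDBCExtractor.prototype.setPoolConfig = function (maxPoolSize, initialPoolSize, pingQuery) {
  this.pool.max = maxPoolSize;
  this.pool.initial = initialPoolSize;
  this.pool.check = pingQuery;
};
JDBCExtractor.prototype.disablePool = function () {
  this.pool.enabled = false;
};
JDBCExtractor.prototype.addDriverProperty = function (propname, propvalue) {
  this.driverProperties[propname] = propvalue;
};
JDBCExtractor.prototype.exec = function (query, baseRecords) {
  this.lastQuery = query;
  return [];  // the stand-in always returns an empty record list
};

// Usage following the documented parameter order. All values are placeholders.
var extractor = new JDBCExtractor(
  "jdbc-extractor-1",                       // uuid
  "jdbc:mysql://localhost:3306/example",    // uri
  "com.mysql.jdbc.Driver",                  // driver
  "user", "secret",                         // userName, password
  null,                                     // structure (format not shown here)
  [],                                       // baseRecords
  10, 2,                                    // maxPoolSize, initialPoolSize
  "SELECT 1",                               // checkQuery
  "SELECT id, name FROM customers");        // query
extractor.addDriverProperty("connectTimeout", "5000");
extractor.setPoolConfig(20, 4, "SELECT 1");
var records = extractor.exec("SELECT id, name FROM customers", []);
```

In a real wrapper only the usage part would be written; the record list returned by exec would then be processed by later components.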
5.3.19 Loop
• Description: This component creates loops in the flow. The loop is repeated as long as the given condition is met (WHILE… DO). The Loop component is implemented in JavaScript using a while loop, with a Condition object used as the loop output condition. The Condition object is defined in section 5.3.5. To define the condition expression, ITPilot provides a set of functions, defined in Appendix A of [GENER].
var loop = null;
loop = new Condition(<output_condition>);
loop.onError(RUNTIME_ERROR, ON_ERROR_RAISE);
while(loop.exec([])) {
    <loop operations>
    …
}
Figure 6 Using the Loop function
5.3.20 Next Interval Iterator
• Object: Next_Interval_Iterator
• Description: this allows iteration through different inter-related pages, using one or several browsing sequences.
• Functions:
  o Constructor(sequences, iterations, sequenceType, reuse, inputPage)
    • sequences: list of browsing sequences to use. If there is only one sequence, it will try to use it in all iterations. If there is more than one sequence, it will use one in each iteration.
    • iterations: this indicates, for every sequence, the number of iterations to be made; the size of this list must be equal to the size of the list provided in the sequences parameter. This parameter is only valid when a single browsing sequence is indicated for use in the sequences parameter.
    • sequenceType: type of pool to use. The possible values are:
      • SEQUENCE_IEBROWSER
      • SEQUENCE_HTTP_BROWSER
      • SEQUENCE_FTP
      • SEQUENCE_LOCAL
    • reuse: boolean value that indicates whether the browser used so far is reused or whether a new browser is launched, maintaining the session’s information.
    • inputPage: this indicates the page from which the next browsing sequence is to be run.
  o next(inputRecords, inputPage): this returns the next iteration element.
    • inputRecords: list of input records that can be used as parameters within the browsing sequences at the next interval.
    • inputPage: this indicates the page from which the next pages are to be accessed.
  o close(): this closes the iterator.
  o setRetries(count): this configures the number of retries in the event of an error when accessing the next page.
    • count: number of retries.
  o setRetryDelay(count): this configures the interval between two retries.
    • count: interval in milliseconds.
  o syncWithPost(flag): this function indicates whether, to retrieve the status of the page, a POST message must be issued to the page URL containing the POST parameters with which it arrived. This is the default synchronization function.
    • flag: “true” indicates that this synchronization function is to be used. If it is “false”, ITPilot checks whether a back sequence has been defined with the setBackSequence method. If there is none, an NSEQL Back() command is run.
  o setBackSequence(back): this function optionally specifies an explicit browsing sequence that returns to the source page, so that further data extraction operations can be carried out there.
    • back: NSEQL back program.
  o setReusingConnection(reusingConnection): this indicates whether the connection will be reused or not.
    • reusingConnection: if “true”, the connection from previous components will be reused. If “false”, a new browser is opened and the data from the previous session is imported.
  o setBackPages(pages): determines the number of pages that ITPilot must browse back when the NSEQL Back() command must be run because no back sequence has been explicitly defined and no post navigation has been configured as the back sequence.
  o setBrowserType(browserType): this function determines the browser implementation to use in the component. The accepted values are:
    • 0: default browser implementation.
    • 1: Internet Explorer browser implementation.
    • 2: Firefox browser implementation.
    • 3: HTTP browser implementation.
5.3.21 Output
• Object: Output
• Description: this places a record in the wrapper output.
• Functions:
  o Constructor(structure)
    • structure: parameter that indicates the component input record to be used as the wrapper result.
  o add(record): this adds the component input record that is to be used as the wrapper result.
    • record: record to use.
5.3.22 Record Constructor
• Object: Record_Constructor
• Description: this constructs a record from other records generated in the flow, and can also generate new attributes derived from existing ones.
• Functions:
  o Constructor(recordsObj, name)
    • recordsObj: list of input elements. Each element of the list can be a record or a list of records.
    • name: name of the output record of the Record Constructor component.
  o add(fieldName, expression, errorAction): method for adding a new field to the record under construction.
    • fieldName: name of the field.
    • expression: field definition expression, e.g. “$0.PARAM1” indicates that the field will contain the field PARAM1 from the first input record of the recordsObj list passed to the constructor. To define the expression, ITPilot provides a set of functions, defined in Appendix A of [GENER].
    • errorAction: action to be run if the expression cannot be evaluated correctly. The possible values are:
      • ON_ERROR_RAISE: stop the wrapper run, indicating the source of the error.
      • ON_ERROR_IGNORE: ignore the error and continue with the wrapper run.
  o exec(): this runs the Record Constructor component instance, returning an object that represents the record obtained.
NOTE: If the error handler of this component is set to ON_ERROR_IGNORE, RECORD CONSTRUCTOR will return the list of filtered elements, except for the one that caused the error.
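The “$0.PARAM1” notation can be illustrated with a small stand-alone resolver. This is only a sketch of the lookup described above, not ITPilot's expression evaluator: the function name resolveFieldExpression is hypothetical, records are modelled as plain JavaScript objects, and only the simple “$<index>.<FIELD>” form is handled (none of the Appendix A functions).

```javascript
// Illustrative resolver for expressions of the form "$<index>.<FIELD>".
// It mimics the lookup described in the text; it is not part of ITPilot.
function resolveFieldExpression(expression, recordsObj) {
  var match = /^\$(\d+)\.(\w+)$/.exec(expression);
  if (match === null) {
    throw new Error("Unsupported expression: " + expression);
  }
  var record = recordsObj[parseInt(match[1], 10)];
  if (record === undefined ||
      !Object.prototype.hasOwnProperty.call(record, match[2])) {
    throw new Error("Cannot evaluate expression: " + expression);
  }
  return record[match[2]];
}

// Hypothetical input list with two records.
var recordsObj = [
  { PARAM1: "Madrid", PARAM2: "ES" },
  { PRICE: 12.5 }
];
var city = resolveFieldExpression("$0.PARAM1", recordsObj);   // field of the first record
var price = resolveFieldExpression("$1.PRICE", recordsObj);   // field of the second record
```

An expression that points at a missing record or field makes the resolver throw, which loosely corresponds to the situation in which errorAction decides whether the wrapper stops or continues.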
5.3.23 Record Sequence or Extractor Sequence
• Object: Record_Sequence
• Description: This creates a browsing sequence from the results of a record. It allows sequences to be created for accessing other pages from pages processed by the Extractor component.
• Functions:
  o Constructor(sequences, sequenceDepends, sequenceType, reuse, inputPage)
    • sequences: ordered and sequential list of the NSEQL browsing sequences to be used by the component.
    • sequenceDepends: ordered and sequential list of the DEXTL tags associated with each NSEQL browsing sequence from the sequences list.
    • sequenceType: type of pool to use. The possible values are:
      • SEQUENCE_IEBROWSER
      • SEQUENCE_HTTP_BROWSER
      • SEQUENCE_FTP
      • SEQUENCE_LOCAL
    • reuse: Boolean value that indicates whether the browser used so far is reused or whether a new browser is launched, maintaining the session’s information. In general, this value will be “true”, although it may not be a good option if the previous iterator is run in parallel to it.
    • inputPage: optional; this allows a starting page to be indicated.
  o exec(): this returns a page object that represents the target page of the browsing sequence(s).
  o All of the methods offered by the Sequence component.
5.3.24 Release Persistent Browser
• Object: Release_Persistent_Browser
• Description: accepts a browser id or a page as browser identifier and releases that specific browser.
• Functions:
  o Constructor(page):
    • page: page loaded on the browser that is going to be released.
  o Constructor(browserUuid):
    • browserUuid: browser identifier.
  o exec(): executes the component.
5.3.25 Repeat
• Description: This component creates loops in the flow. The loop is repeated until the given condition is met (REPEAT… UNTIL). The Repeat component is implemented in JavaScript using a do… while loop, with a Condition object used as the loop output condition. The Condition object is defined in section 5.3.5. To define the condition expression, ITPilot provides a set of functions, defined in Appendix A of [GENER].
var repeat = null;
repeat = new Condition(<output_condition>);
repeat.onError(RUNTIME_ERROR, ON_ERROR_RAISE);
do {
    <loop_operations>
    …
} while(repeat.exec([]));
Figure 7 Using the Repeat function
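The practical difference between the Loop and Repeat patterns is when the condition is first evaluated. The sketch below makes this visible with a stand-in Condition object whose exec() simply evaluates a JavaScript predicate; the real Condition object (section 5.3.5) takes a condition expression instead, and the onError call is left out for brevity. The pending counter is a hypothetical piece of state.

```javascript
// Stand-in for the ITPilot Condition object so the sketch runs outside
// ITPilot: exec() just evaluates the predicate it was built with.
function Condition(predicate) {
  this.predicate = predicate;
}
Condition.prototype.exec = function (inputs) {
  return this.predicate();
};

var pending = 0;  // hypothetical state: nothing is left to process

// Loop pattern: the condition is checked before the body,
// so with pending == 0 the body never runs.
var loopRuns = 0;
var loop = new Condition(function () { return pending > 0; });
while (loop.exec([])) {
  loopRuns++;
  pending--;
}

// Repeat pattern: the body runs once before the condition is checked.
var repeatRuns = 0;
var repeat = new Condition(function () { return pending > 0; });
do {
  repeatRuns++;
} while (repeat.exec([]));
```

With the same condition, the Loop body is skipped entirely while the Repeat body executes once, so Repeat suits operations that must happen at least one time.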
5.3.26 Script
• Description: This component allows part of the logic of an ITPilot wrapper to be written in JavaScript. It has no specific JavaScript function associated with it. When this component is used from the graphical generation interface, it becomes a JavaScript function that is invoked from the position it occupies within the process flow.
5.3.27 Sequence
• Object: Sequence
• Description: This creates a browsing sequence in NSEQL language (see [NSEQL]).
• Functions:
  o Constructor(sequence, sequenceType, reusableConnection, inputPage)
    • sequence: NSEQL browsing program (see [NSEQL]).
    • sequenceType: type of pool to use. The possible values are:
      • SEQUENCE_IEBROWSER
      • SEQUENCE_HTTP_BROWSER
      • SEQUENCE_FTP
      • SEQUENCE_LOCAL
    • reusableConnection: this indicates whether the connection will be reused (“true”) or not (“false”). See [GENER] for further information.
    • inputPage: optional parameter; this indicates the starting page. If it is not provided, the NSEQL program is run directly.
  o exec(inputValues, inputPage): this runs the Sequence component, returning the last page that the browsing sequence has reached.
    • inputValues: list of values that can be used as input parameters within the browsing sequence.
    • inputPage: optional parameter; this describes the page from which the component browsing sequence is run.
  o setRetries(count): update function for the number of retries in the event of failures.
    • count: number of retries.
  o setRetryDelay(mseconds): this sets the waiting time between retries.
    • mseconds: waiting time between retries, in milliseconds.
  o close(): this closes the connection with the running browser.
  o syncWithPost(flag): this method indicates whether, to retrieve the status of the page, a POST message must be issued to the page URL containing the POST parameters with which it arrived. This is the default synchronization function.
    • flag: “true” indicates that this synchronization function must be used. If it is “false”, ITPilot checks whether a back sequence has been defined with the setBackSequence method. If there is none, an NSEQL Back() command is run.
  o setBackSequence(back): this function optionally specifies an explicit browsing sequence that returns to the source page, so that further data extraction operations can be carried out there.
    • back: NSEQL back program.
  o setReusingConnection(reusingConnection): this indicates whether the connection will be reused or not.
    • reusingConnection: if “true”, the connection from previous components will be reused. If “false”, a new browser is opened and the data from the previous session is imported.
  o setBackPages(pages): determines the number of pages that ITPilot must browse back when the NSEQL Back() command must be run because no back sequence has been explicitly defined and no post navigation has been configured as the back sequence.
    • pages: number of back pages.
  o toString(): this returns the NSEQL (see [NSEQL]) sequence.
  o setBrowserType(browserType): this function determines the browser implementation to use in the component. The accepted values are:
    • 0: default browser implementation.
    • 1: Internet Explorer browser implementation.
    • 2: Firefox browser implementation.
    • 3: Denodo HTTP browser implementation.
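The effect of setRetries and setRetryDelay can be pictured with a generic retry helper. This is one plausible reading of those two settings, not ITPilot's implementation: it assumes that count means additional attempts after the first failure, and the names runWithRetries and sleep are hypothetical. The pause function is passed in as a parameter so that the sketch stays synchronous.

```javascript
// Runs `operation`, retrying after a failure up to `retries` more times and
// pausing `delayMs` between attempts. `sleep` is injected by the caller; it
// is not part of any ITPilot API.
function runWithRetries(operation, retries, delayMs, sleep) {
  var attempt = 0;
  while (true) {
    try {
      return operation(attempt);
    } catch (error) {
      if (attempt >= retries) {
        throw error;  // retries exhausted: propagate the last error
      }
      attempt++;
      sleep(delayMs);
    }
  }
}

// Hypothetical page access that fails twice before succeeding.
var delays = [];
var page = runWithRetries(
  function (attempt) {
    if (attempt < 2) {
      throw new Error("page not available");
    }
    return "page loaded on attempt " + (attempt + 1);
  },
  3,                                   // value that would go to setRetries(count)
  500,                                 // value that would go to setRetryDelay(mseconds)
  function (ms) { delays.push(ms); }   // records the pause instead of waiting
);
```

Here the third attempt succeeds after two recorded pauses of 500 milliseconds; had all four attempts failed, the last error would have been propagated to the caller.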
5.3.28 Store File
• Object: StoreFile
• Description: this stores the contents passed as the input parameter in a file.
• Functions:
  o Constructor(content, file)
    • content: string- or binary-type value that indicates the contents to be stored. A page value is also supported as input. In that case, the page content will be stored.
    • file: path and name of the file where the contents are to be stored.
  o exec(): runs the component.
  o setGenerateFilename(generate): this function determines whether the output file name should be automatically generated when the input file is null or is a directory.
    • generate: indicates whether the file name should be automatically generated.
  o setRetries(count): update function for the number of retries in the event of failures.
    • count: number of retries.
  o setRetryDelay(mseconds): this sets the waiting time between retries.
    • mseconds: waiting time between retries, in milliseconds.
5.3.29 Thread
• Object: Thread
• Description: this represents a thread in the ITPilot wrapper. It is often used when the subsequent processing of each of the records obtained in an extraction operation is carried out concurrently.
• Functions:
  o wait(): this causes the thread to enter standby until all executions invoked with the execute function have finished.
  o execute(functionName, <list of arguments>): this launches a thread that runs the specified function.
    • functionName: name of the JavaScript function to be run.
    • <list of arguments>: list of arguments separated by commas, which must match the arguments of the JavaScript function.
  o setMaxConcurrentThreads(int): configures the maximum number of Thread instances that will be used in parallel. Later requests will be queued until the ongoing executions finish.
    • int: maximum number of Thread instances.
5.4 USE OF CUSTOM COMPONENTS IN JAVASCRIPT WRAPPERS
5.4.1 Developing Custom Components
Custom components can be developed graphically using the wrapper generation tool (see [GENER]), but they can also be developed in JavaScript. To do so, a file with the .js suffix must be created and stored in the path <DENODO_HOME>/metadata/itp-custom-components, containing the following functions:
• mycustom_main(mycustom_input) { var mycustom_output = null; … return mycustom_output; }
  o This is the main function, where “mycustom” is the name of the custom component.
• mycustom_getInputStructure() { … }
  o This function defines the input schema.
• mycustom_getOutputType() { return <TYPE> }
  o This function defines the component output type. The possible values are:
    • LIST_TYPE = 1
    • PAGE_TYPE = 2
    • RECORD_TYPE = 3
    • SIMPLE_TYPE = 4
    • ARRAY_TYPE = 5
    • BINARY_TYPE = 6
    • BOOLEAN_TYPE = 7
    • DATE_TYPE = 8
    • DOUBLE_TYPE = 9
    • FLOAT_TYPE = 10
    • INT_TYPE = 11
    • LONG_TYPE = 12
    • STRING_TYPE = 13
    • URL_TYPE = 14
    • BROWSER_ID_TYPE = 15
• mycustom_getOutputStructure() { … }
  o This function defines the output structure that will be returned by the component. It is necessary only when the output type returned by the function mycustom_getOutputType is RECORD_TYPE or LIST_TYPE.
5.4.2 Using Custom Components
If a custom component developed in JavaScript is to be used, it must be stored in JavaScript format (with the .js extension) in the <DENODO_HOME>/metadata/itp-custom-components directory. Each component is represented as a .js file whose name matches the name of the custom component.
The main function of the custom component is <component>_main(Inputelement), where <component> is the name of the custom component, as mentioned in the previous section.
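As an illustration, the contents of such a file might look like the sketch below for a hypothetical component named shout (the file would then be shout.js). Several points are assumptions: the input is treated as a plain string, the STRING_TYPE constant is declared locally only so that the sketch runs on its own (inside ITPilot it is presumably predefined), and the body of shout_getInputStructure is a placeholder because the schema format is not described here.

```javascript
// Stand-in for the STRING_TYPE constant (value 13 in the list of output
// types). Declared here only so the sketch is self-contained.
var STRING_TYPE = 13;

// Main function of the hypothetical component "shout".
// Assumption: the input arrives as a plain string.
function shout_main(shout_input) {
  var shout_output = null;
  shout_output = String(shout_input).toUpperCase();
  return shout_output;
}

// Input schema. Its format is not described in this section, so the body
// is left as a placeholder.
function shout_getInputStructure() {
  return null;
}

// Output type. Because it is neither RECORD_TYPE nor LIST_TYPE, no
// shout_getOutputStructure function is needed.
function shout_getOutputType() {
  return STRING_TYPE;
}
```

All function names share the component name as prefix, which is what ties them to the file name.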
To use a custom component from a wrapper developed in JavaScript the following piece of code should be used:
try {
    SCOPE.create();
    mycustom = new CUSTOM_COMPONENT(<customcomponent_type>);
    mycustom.setComponentName(<component_name>);
    mycustom_output = mycustom.exec(<input_parameters>);
} finally {
    SCOPE.close();
}
Figure 8 Using custom components from JavaScript
where:
• <customcomponent_type> is the type of the custom component to be used.
• <component_name> represents the name of the component.
• <input_parameters> is the list of input parameters that the custom component receives.
5.5 WRAPPER DEVELOPMENT
Once the script has been developed, creating a wrapper only requires writing the following VQL statement:
CREATE WRAPPER ITP <name> [MAINTENANCE FALSE] 'jscode'
where jscode is the JavaScript code just developed.
NOTE: The VQL syntax uses quotes to delimit the JavaScript code, so any quotes used inside the code must be escaped with the ‘\’ character.
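The escaping required by the note can be automated with a small helper. The sketch below assumes that the delimiting quotes are single quotes, as in the syntax line above, and that no other character needs escaping; the function names and the wrapper name my_wrapper are hypothetical, and the optional MAINTENANCE FALSE clause is omitted.

```javascript
// Escapes single quotes so the script can be embedded between the quotes of
// the VQL statement. Assumption: only the quote character needs escaping.
function escapeForVql(jsCode) {
  return jsCode.replace(/'/g, "\\'");
}

// Builds the statement text following the syntax shown above.
function buildCreateWrapper(name, jsCode) {
  return "CREATE WRAPPER ITP " + name + " '" + escapeForVql(jsCode) + "'";
}

var script = "var greeting = 'hello';";   // hypothetical wrapper script
var statement = buildCreateWrapper("my_wrapper", script);
// statement is: CREATE WRAPPER ITP my_wrapper 'var greeting = \'hello\';'
```

Writing the script with double-quoted string literals wherever possible reduces the amount of escaping needed.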
REFERENCES
[AXIS] Apache Axis Web Server. http://ws.apache.org/axis/
[DATEFORMAT] Java Format Representation for dates. http://java.sun.com/j2se/1.5.0/docs/api/java/text/SimpleDateFormat.html
[DEXTL] Denodo DEXTL 4.6 Manual. Denodo Technologies, 2011.
[DOTNET] Microsoft .NET Framework, http://www.microsoft.com/net/
[DPORT] Denodo Virtual DataPort 4.6 Administration Guide. Denodo Technologies, 2011.
[ECMA262] Standard ECMA-262. ECMAScript Language Specification, 3.0.
[GENER] Denodo ITPilot 4.6 Generation Environment Guide. Denodo Technologies, 2011.
[JDOC] Javadoc documentation of the Developer API.
[MIME] RFC 2045. Multipurpose Internet Mail Extensions (MIME).
[NSEQL] Denodo ITPilot 4.6 NSEQL Manual (Navigation SEQuence Language). Denodo Technologies, 2011.
[PERL] PERL Language. http://www.perl.com
[USER] Denodo ITPilot 4.6 User Guide. Denodo Technologies, 2011.
[SOAP] SOAP Version 1.2. W3C Recommendation. http://www.w3.org/TR/soap/
[VQL] Denodo Virtual DataPort 4.6 Advanced VQL Guide. Denodo Technologies, 2011.
[WSDL] Web Services Description Language (WSDL) 1.1. W3C Note, http://www.w3.org/TR/wsdl