Programmer's Guide

HP Vertica Analytics Platform 6.1.x
Doc Revision 3
Copyright © 2006-2013 Hewlett-Packard
Date of Publication: Monday, September 23, 2013
Contents

Programmer's Guide
Syntax Conventions
HP Vertica Client Libraries
    Client Driver and Server Version Compatibility
    Installing the HP Vertica Client Drivers
        Driver Prerequisites
        Installing the Client Drivers
        Upgrading the HP Vertica Client Drivers
        Using Legacy Drivers
    Creating an ODBC Data Source Name (DSN)
        Creating an ODBC DSN for Linux, Solaris, AIX, and HP-UX
        Creating an ODBC DSN for Windows Clients
        Creating an ODBC DSN for Macintosh OS X Clients
        DSN Parameters
        Setting DSN Parameters
    Additional ODBC Driver Configuration Settings
        Location of the Additional Driver Settings
        Additional Parameter Settings
    Programming ODBC Client Applications
        ODBC Architecture
        ODBC Feature Support
        Updating ODBC Client Code from Previous Driver Versions
        HP Vertica-specific ODBC Header File
        Connecting to HP Vertica
        Prompting Windows Users for Missing Connection Parameters
        Setting the Locale for ODBC Sessions
        AUTOCOMMIT and ODBC Transactions
        Retrieving Data through ODBC
        Handling NULL Values in ODBC
        Loading Data through ODBC
    Programming JDBC Client Applications
        JDBC Feature Support
        Updating JDBC Client Code from Previous Driver Versions
        Creating and Configuring a Connection
        JDBC Data Types
        Executing Queries Through JDBC
        Loading Data Through JDBC
    Programming ADO.NET Applications
        Updating ADO.NET Client Code from Previous Driver Versions
        Setting the Locale for ADO.NET Sessions
        Connecting to the Database
        Querying the Database using ADO.NET
        Handling Messages (ADO.NET)
        Getting Table Metadata (ADO.NET)
        ADO.NET Data Types
        Microsoft Connectivity Pack
    Programming Python Client Applications
        Configuring the ODBC Run-time Environment on Linux
        Querying the Database Using Python
    Programming Perl Client Applications
        Perl Client Prerequisites
        Connecting to HP Vertica Using Perl
        Executing Statements Using Perl
        Batch Loading Data Using Perl
        Using COPY LOCAL to Load Data in Perl
        Querying HP Vertica Using Perl
        Conversions between Perl and HP Vertica Data Types
        Perl Unicode Support
    Programming PHP Client Applications
        PHP Unicode Support
        Querying the Database Using PHP
Using vsql
    Installing the vsql Client
        vsql Notes for Windows Users
    Connecting From the Administration Tools
    Connecting from the Command Line
        Command Line Options
        Connecting From a Non-Cluster Host
    Meta-Commands
        \! [ COMMAND ]
        \?
        \a
        \b
        \c (or \connect) [ dbname [ username ] ]
        \C [ STRING ]
        \cd [ DIR ]
        The \d [ PATTERN ] meta-commands
        \e \edit [ FILE ]
        \echo [ STRING ]
        \f [ string ]
        \g
        \H
        \h \help
        \i FILE
        \l
        \locale
        \o
        \p
        \password [ USER ]
        \pset NAME [ VALUE ]
        \q
        \qecho [ STRING ]
        \r
        \s [ FILE ]
        \set [ NAME [ VALUE [ ... ] ] ]
        \t
        \T [ STRING ]
        \timing
        \unset [ NAME ]
        \w [ FILE ]
        \x
        \z
    Variables
        AUTOCOMMIT
        DBNAME
        ECHO
        ECHO_HIDDEN
        ENCODING
        HISTCONTROL
        HISTSIZE
        HOST
        IGNOREEOF
        ON_ERROR_STOP
        PORT
        PROMPT1 PROMPT2 PROMPT3
        QUIET
        SINGLELINE
        SINGLESTEP
        USER
        VERBOSITY
        VSQL_HOME
    Prompting
    Command Line Editing
    vsql Environment Variables
    Locales
    Files
    Exporting Data Using vsql
    Copying Data Using vsql
    Output Formatting Examples
Writing Queries
    Historical (Snapshot) Queries
    Temporary Tables
    SQL Queries
    Subqueries
        Subqueries Used in Search Conditions
        Subqueries in the SELECT List
        WITH Clauses in SELECT
        Noncorrelated and Correlated Subqueries
        Flattening FROM Clause Subqueries and Views
        Subqueries in UPDATE and DELETE Statements
        Subquery Examples
        Subquery Restrictions
    Joins
        The ANSI Join Syntax
        Join Conditions vs. Filter Conditions
        Inner Joins
        Outer Joins
        Range Joins
        Pre-join Projections and Join Predicates
        Join Notes and Restrictions
Using SQL Analytics
    How Analytic Functions Work
    Analytic Functions versus Aggregate Functions
    The Window OVER() Clause
        Window Partitioning
        Window Ordering
        Window Framing
    Naming Windows
    Analytic Query Examples
        Calculating a Median Value
        Getting Price Differential for Two Stocks
        Calculating the Moving Average
        Avoiding GROUPBY HASH with Projection Design
        Getting Latest bid and ask Results
    Event-based Windows
    Sessionization with Event-based Windows
Using Time Series Analytics
    Gap Filling and Interpolation (GFI)
        Constant Interpolation
        The TIMESERIES Clause and Aggregates
        Linear Interpolation
        GFI Examples
    When Time Series Data Contains Null Values
Event Series Joins
    Sample Schema for Event Series Joins Examples
    Writing Event Series Joins
Event Series Pattern Matching
Optimizing Query Performance
    First Steps for Improving Query Performance
    Optimizing Encoding to Improve Query Performance
        Improving the Compression of FLOAT Columns
        Using Run Length Encoding (RLE) to Improve Query Performance
    Optimizing Projections for Queries with Predicates
    Optimizing GROUP BY Queries
        Avoiding GROUPBY HASH with Projection Design
        Avoiding Resegmentation During GROUP BY Optimization with Projection Design
    Optimizing DISTINCT in a SELECT Query List
    Optimizing JOIN Queries
        Hash Joins vs. Merge Joins
        Optimizing for Merge Joins
        Using Equality Predicates to Optimize Joins
        Specifying INNER and OUTER Tables to Optimize Joins
        Avoiding Resegmentation During Joins
    Optimizing ORDER BY Queries
        Pre-sorting Projections to Optimize ORDER BY Clauses
    Optimizing SQL-99 Analytic Functions
        Avoiding Single-Node Execution by Avoiding Empty OVER() Clauses
        NULL Placement by Analytic Functions
        Designing Tables to Minimize Run-Time Sorting of NULL Values in Analytic Functions
    Optimizing LIMIT Queries with ROW_NUMBER Predicates
    Optimizing INSERT-SELECT Operations
        Optimizing INSERT-SELECT Queries for Tables with Pre-join Projections
        Optimizing INSERT-SELECT Queries by Matching Sort Orders
        Avoiding Resegmentation of INSERT-SELECT Queries
    Optimizing DELETE and UPDATE Queries
        Performance Considerations for DELETE and UPDATE Queries
        Optimizing DELETEs and UPDATEs for Performance
Using External Procedures
    Implementing External Procedures
        Requirements for External Procedures
        Installing External Procedure Executable Files
        Creating External Procedures
    Executing External Procedures
    Dropping External Procedures
Using User-Defined SQL Functions
    Creating User-Defined SQL Functions
    Altering and Dropping User-Defined SQL Functions
    Managing Access to SQL Functions
    Viewing Information About User-Defined SQL Functions
    Migrating Built-in SQL Functions
Developing and Using User Defined Functions
    How UDFs Work
    User Defined Extensions and User Defined Functions
    Types of UDFs
    Fenced Mode
    Setting up a UDF Development Environment
    The HP Vertica SDK
    The HP Vertica SDK API Documentation
    Developing a User Defined Function in C++
        HP Vertica C++ SDK Data Types
        Developing a User Defined Scalar Function
        Developing a User Defined Transform Function in C++
        Developing a User Defined Aggregate Function
        Developing a User Defined Analytic Function
        Developing User Defined Load (UDL) Functions
        Compiling Your UDF
        Handling Different Numbers and Types of Arguments
        UDF Parameters
        UDF Resource Use
        Handling Errors
        Handling Cancel Requests
        UDF Debugging Tips
    Developing a User Defined Function in R
        User Defined Functions in R Notes and Limitations
        Installing/Upgrading the R Language Pack for HP Vertica
        R Packages
        Using the HP Vertica SDK R Examples
        Creating R Functions
    Updating UDF Libraries
        UDF Library Compatibility with New Server Versions
        Determining if a UDF Signature Has Changed
        Deploying a New Version of Your UDF Library
    Listing the UDxs Contained in a Library
Using the Hadoop Connector
    Prerequisites
    How Hadoop and HP Vertica Work Together
    Hadoop Connector Features
    Hadoop Connector Installation Procedure
    Accessing HP Vertica Data from Hadoop
        Selecting HP VerticaInputFormat
        Setting the Query to Retrieve Data from HP Vertica
        Writing a Map Class that Processes HP Vertica Data
    Writing Data to HP Vertica from Hadoop
        Configuring Hadoop to Output to HP Vertica
        Defining the Output Table
        Writing the Reduce Class
    Passing Parameters to the Hadoop Connector at Run Time
    Example Hadoop Connector Application
    Compiling and Running the Example Application
    Using Hadoop Streaming with HP Vertica's Hadoop Connector
    Accessing HP Vertica from Pig
Using the Hadoop Distributed File System Connectors
    HDFS Connector Requirements
    Secure HDFS Connector Requirements
        Installing and Configuring Kerberos on Your HP Vertica Cluster
        Preparing Keytab Files for the Secure HDFS Connector
    Testing Your Hadoop WebHDFS Configuration
    Installing the HDFS Connector Package
    Loading the HDFS User Defined Source
    Loading Data Using the Standard HDFS Connector
    Loading Data Using the Secure HDFS Connector
    Creating an External Table Based on HDFS Files
    HDFS Connector Troubleshooting Tips
Using Informatica PowerCenter
    Installing the HP Vertica Plug-in for PowerCenter
        Registering the Plug-in's Metadata
        Preparing the PowerCenter Client
        Copying the Plug-in Library on the Server
    Using the HP Vertica Plug-in for PowerCenter
    Configuring PowerCenter to Read Data from HP Vertica
    Setting PowerCenter's Buffer Size
    Resolving a Message Involving a Missing DLL
Appendix: Error Codes
608
SQLSTATEs and Error Codes ............................................................................................................................... 609
HP Vert ica SQLSTATE List .................................................................................................................................. 609
Warning Messages Associated with SQLSTATE 01000 ..................................................................... 620
Warning Messages Associated with SQLSTATE 01006 ..................................................................... 621
Warning Messages Associated with SQLSTATE 01007 ..................................................................... 621
Warning Messages Associated with SQLSTATE 01V01..................................................................... 622
Error Messages Associated with SQLSTATE 08000............................................................................ 622
Error Messages Associated with SQLSTATE 08001............................................................................ 623
Error Messages Associated with SQLSTATE 08003............................................................................ 623
Error Messages Associated with SQLSTATE 08006............................................................................ 624
Error Messages Associated with SQLSTATE 08V01........................................................................... 624
Error Messages Associated with SQLSTATE 0A 000........................................................................... 625
Error Messages Associated with SQLSTATE 0B000 ........................................................................... 637
Error Messages Associated with SQLSTATE 0LV01 .......................................................................... 637
Error Messages Associated with SQLSTATE 22000............................................................................ 638
Error Messages Associated with SQLSTATE 22001............................................................................ 638
Error Messages Associated with SQLSTATE 22003............................................................................ 639
Error Messages Associated with SQLSTATE 22004............................................................................ 640
Error Messages Associated with SQLSTATE 22007............................................................................ 640
Error Messages Associated with SQLSTATE 22008............................................................................ 641
Error Messages Associated with SQLSTATE 22009............................................................................ 641
Error Messages Associated with SQLSTATE 2200B ........................................................................... 641
Error Messages Associated with SQLSTATE 2200D ........................................................................... 642
Error Messages Associated with SQLSTATE 22011............................................................................ 642
Error Messages Associated with SQLSTATE 22012............................................................................ 643
Error Messages Associated with SQLSTATE 22015............................................................................ 643
Error Messages Associated with SQLSTATE 22019............................................................................ 643
Error Messages Associated with SQLSTATE 2201B ........................................................................... 644
Error Messages Associated with SQLSTATE 2201G ........................................................................... 644
Error Messages Associated with SQLSTATE 22021............................................................................ 645
Error Messages Associated with SQLSTATE 22023............................................................................ 645
Error Messages Associated with SQLSTATE 22025............................................................................ 655
-viii-
Contents
Error Messages Associated with SQLSTATE 22906............................................................................ 655
Error Messages Associated with SQLSTATE 22V02........................................................................... 655
Error Messages Associated with SQLSTATE 22V03........................................................................... 656
Error Messages Associated with SQLSTATE 22V04........................................................................... 657
Error Messages Associated with SQLSTATE 22V0B .......................................................................... 658
Error Messages Associated with SQLSTATE 22V21........................................................................... 658
Error Messages Associated with SQLSTATE 22V23........................................................................... 659
Error Messages Associated with SQLSTATE 22V24........................................................................... 659
Error Messages Associated with SQLSTATE 23502............................................................................ 661
Error Messages Associated with SQLSTATE 23503............................................................................ 661
Error Messages Associated with SQLSTATE 23505............................................................................ 661
Error Messages Associated with SQLSTATE 25V01........................................................................... 662
Error Messages Associated with SQLSTATE 28000............................................................................ 662
Error Messages Associated with SQLSTATE 2BV01 .......................................................................... 663
Error Messages Associated with SQLSTATE 40V01........................................................................... 663
Error Messages Associated with SQLSTATE 42501............................................................................ 663
Error Messages Associated with SQLSTATE 42601............................................................................ 666
Error Messages Associated with SQLSTATE 42602............................................................................ 674
Error Messages Associated with SQLSTATE 42611............................................................................ 675
Error Messages Associated with SQLSTATE 42622............................................................................ 675
Error Messages Associated with SQLSTATE 42701............................................................................ 675
Error Messages Associated with SQLSTATE 42702............................................................................ 676
Error Messages Associated with SQLSTATE 42703............................................................................ 676
Error Messages Associated with SQLSTATE 42704............................................................................ 677
Error Messages Associated with SQLSTATE 42710............................................................................ 680
Error Messages Associated with SQLSTATE 42712............................................................................ 681
Error Messages Associated with SQLSTATE 42723............................................................................ 681
Error Messages Associated with SQLSTATE 42725............................................................................ 682
Error Messages Associated with SQLSTATE 42803............................................................................ 682
Error Messages Associated with SQLSTATE 42804............................................................................ 683
Error Messages Associated with SQLSTATE 42809............................................................................ 684
Error Messages Associated with SQLSTATE 42830............................................................................ 685
Error Messages Associated with SQLSTATE 42846............................................................................ 685
Error Messages Associated with SQLSTATE 42883............................................................................ 686
Error Messages Associated with SQLSTATE 42939............................................................................ 687
Error Messages Associated with SQLSTATE 42P20 ........................................................................... 687
Error Messages Associated with SQLSTATE 42V01........................................................................... 689
Error Messages Associated with SQLSTATE 42V02........................................................................... 690
Error Messages Associated with SQLSTATE 42V03........................................................................... 690
Error Messages Associated with SQLSTATE 42V04........................................................................... 691
Error Messages Associated with SQLSTATE 42V06........................................................................... 691
Error Messages Associated with SQLSTATE 42V07........................................................................... 691
Error Messages Associated with SQLSTATE 42V08........................................................................... 692
Error Messages Associated with SQLSTATE 42V09........................................................................... 692
Error Messages Associated with SQLSTATE 42V10........................................................................... 692
Error Messages Associated with SQLSTATE 42V11........................................................................... 693
Error Messages Associated with SQLSTATE 42V13........................................................................... 693
Error Messages Associated with SQLSTATE 42V15........................................................................... 695
Error Messages Associated with SQLSTATE 42V16........................................................................... 695
Error Messages Associated with SQLSTATE 42V17........................................................................... 696
Error Messages Associated with SQLSTATE 42V18........................................................................... 697
Error Messages Associated with SQLSTATE 42V21........................................................................... 698
Error Messages Associated with SQLSTATE 42V25........................................................................... 698
-ix-
Programmer's Guide
Error Messages Associated with SQLSTATE 42V26........................................................................... 699
Error Messages Associated with SQLSTATE 53000............................................................................ 700
Error Messages Associated with SQLSTATE 53100............................................................................ 700
Error Messages Associated with SQLSTATE 53200............................................................................ 701
Error Messages Associated with SQLSTATE 54000............................................................................ 702
Error Messages Associated with SQLSTATE 54001............................................................................ 702
Error Messages Associated with SQLSTATE 54011............................................................................ 703
Error Messages Associated with SQLSTATE 54023............................................................................ 703
Error Messages Associated with SQLSTATE 55000............................................................................ 704
Error Messages Associated with SQLSTATE 55006............................................................................ 709
Error Messages Associated with SQLSTATE 55V02........................................................................... 710
Error Messages Associated with SQLSTATE 55V03........................................................................... 710
Error Messages Associated with SQLSTATE 55V04........................................................................... 711
Error Messages Associated with SQLSTATE 57014............................................................................ 711
Error Messages Associated with SQLSTATE 57V01........................................................................... 712
Error Messages Associated with SQLSTATE 57V03........................................................................... 712
Error Messages Associated with SQLSTATE 58030............................................................................ 713
Error Messages Associated with SQLSTATE 58V01........................................................................... 714
Error Messages Associated with SQLSTATE V1001........................................................................... 715
Error Messages Associated with SQLSTATE V1002........................................................................... 715
Error Messages Associated with SQLSTATE V1003........................................................................... 716
Error Messages Associated with SQLSTATE V2000........................................................................... 716
Error Messages Associated with SQLSTATE V2001........................................................................... 717
Error Messages Associated with SQLSTATE VC001 .......................................................................... 717
Error Messages Associated with SQLSTATE VD001.......................................................................... 718
Error Messages Associated with SQLSTATE VP000 .......................................................................... 725
Error Messages Associated with SQLSTATE VP001 .......................................................................... 726
Error Messages Associated with SQLSTATE VX001.......................................................................... 727
Error Messages Associated with SQLSTATE VX002.......................................................................... 728
Error Messages Associated with SQLSTATE VX003.......................................................................... 729
Copyright Notice .................................................................................................................................................... 730
Programmer's Guide
Welcome to the HP Vertica Programmer's Guide. This book describes how to connect to an HP
Vertica Analytics Platform database and run SQL statements.
Prerequisites
This document assumes that you have installed and configured HP Vertica as described in the
Installation Guide. It assumes that you are familiar with the basic concepts and terminology of the
SQL language and relational database management systems.
Syntax Conventions
The following are the syntax conventions used in the HP Vertica documentation.
Syntax Convention — Description

Text without brackets/braces — Indicates content you type, as shown.

< Text inside angle brackets > — Represents a placeholder for which you must supply a value. The variable is usually shown in italics. See Placeholders below.

[ Text inside brackets ] — Indicates optional items; for example, CREATE TABLE [schema_name.]table_name. The brackets indicate that the schema_name is optional. Do not type the square brackets.

{ Text inside braces } — Indicates a set of options from which you choose one; for example: QUOTES { ON | OFF } indicates that exactly one of ON or OFF must be provided. You do not type the braces: QUOTES ON

Backslash \ — Represents a continuation character used to indicate text that is too long to fit on a single line.

Ellipses ... — Indicate a repetition of the previous parameter. For example, option[,...] means that you can enter multiple, comma-separated options. Showing ellipses in code examples might also mean that part of the text has been omitted for readability, such as in multi-row result sets.

Indentation — Is an attempt to maximize readability; SQL is a free-form language.

Placeholders — Represent items that must be replaced with appropriate identifiers or expressions and are usually shown in italics.

Vertical bar | — Is a separator for mutually exclusive items. For example: [ ASC | DESC ]. Choose one or neither. You do not type the square brackets.
HP Vertica Client Libraries
The HP Vertica client driver libraries provide interfaces for connecting your client applications (or
third-party applications such as Cognos and MicroStrategy) to your HP Vertica database. The
drivers simplify exchanging data for loading, report generation, and other common database
tasks.
There are three separate client drivers:
 Open Database Connectivity (ODBC)—the most commonly used interface for third-party applications and clients written in C, Python, PHP, Perl, and most other languages.
 Java Database Connectivity (JDBC)—used by clients written in the Java programming language.
 ActiveX Data Objects for .NET (ADO.NET)—used by clients developed using Microsoft's .NET Framework and written in C#, Visual Basic .NET, and other .NET languages.
Client Driver Standards
The HP Vertica client drivers are compatible with the following driver standards:
 The ODBC driver complies with version 3.5.1 of the ODBC standard.
 HP Vertica's JDBC driver is a type 4 driver that complies with the JDBC 3.0 standard. It is compiled using JDK version 1.5, and is compatible with client applications compiled using JDK versions 1.5 and 1.6.
 The ADO.NET drivers conform to .NET Framework 3.0 specifications.
The drivers do not support some of the optional features in the standards. See ODBC Feature Support (page 55), JDBC Feature Support (page 107), and Using ADO.NET (page 161) for details.
Client Driver and Server Version Compatibility
Usually, each version of the HP Vertica server is compatible with the previous version of the client
drivers. This compatibility lets you upgrade your HP Vertica server without having to immediately
upgrade your client software. However, some new features of the new server version may not be
available through the old drivers.
The following table summarizes the compatibility of each recent version of the client drivers with
the HP Vertica server versions.
Client Driver Version   Compatible Server Versions   Notes
4.0                     4.0, 4.1
4.1                     4.1, 5.0, 5.1, 6.0, 6.1      There was no 5.0 client driver release, so the 5.0 server shipped with the 4.1 drivers.
5.1                     5.1, 6.0, 6.1
6.0                     6.0, 6.1
6.1                     6.1
Version 4.1 to 5.1 Client Driver Transition
The client driver libraries were completely rewritten for HP Vertica 5.1 to improve standards
compatibility and support more platforms. As a result, some of the classes, functions, properties,
and other elements of the driver APIs have been renamed or deprecated in favor of standard
ones. See Updating ODBC Client Code from Previous Driver Versions (page 55), Updating
ADO.NET Client Code from Previous Driver Versions (page 161), and Updating JDBC Client
Code from Previous Driver Versions (page 108) for details on updating your pre-5.1 client code
to work with the new client libraries.
HP Vertica ODBC/JDBC Client Installers
The ODBC/JDBC client drivers are a separate installation from the ADO.NET drivers. (ADO.NET
support is not available in the Community Edition.) As noted in the compatibility table, the 6.x ODBC/JDBC client drivers only support access to HP Vertica databases running version 6.x and above. For example, you cannot use the new 6.x ODBC/JDBC client drivers to access an HP Vertica 5.x database. If you plan on having a mixed HP Vertica environment supporting both 5.x and 6.x HP Vertica databases, consider keeping the 5.x drivers installed.
ODBC/JDBC Multiple Version Installations
The following ODBC/JDBC driver combinations are supported on a single machine:
 4.x and 5.x ODBC/JDBC drivers can be installed on the same machine.
 4.x and 6.x ODBC/JDBC drivers can be installed on the same machine.
It is not possible to have both 5.x and 6.x ODBC drivers on a single machine. If you install the 6.x version, it automatically overlays the existing 5.x installation, and any DSN defined against a 5.x HP Vertica database is no longer supported.
HP Vertica ADO.NET Client Installers
For versions prior to 6.x, ADO.NET drivers must be uninstalled before installing a later version of the driver. The 6.x ADO.NET drivers require the HP Vertica database to be version 6.0.0 or above. The ADO.NET 6.x driver only supports access to an HP Vertica 6.x server; the ADO.NET 4.x plug-in does not work with an HP Vertica 6.x server. If you plan on also using the ODBC bridge and you need to access both HP Vertica 5.x and 6.x databases, consider keeping the 5.x versions of the ODBC/JDBC drivers for the reasons stated previously.
Installing the HP Vertica Client Drivers
You must install the HP Vertica client drivers to access HP Vertica from your client application.
The drivers create and maintain connections to the database and provide APIs that your
applications use to access your data. The client drivers support connections using JDBC, ODBC,
and ADO.NET (page 161).
Client Driver Standards
The client drivers support the following standards:
 ODBC drivers conform to ODBC 3.5.1 specifications.
 JDBC drivers conform to JDK 5 specifications.
 ADO.NET drivers conform to .NET Framework 3.0 specifications.
The remainder of this section explains the requirements for the HP Vertica client drivers, and the procedure for downloading, installing, and configuring them.
Driver Prerequisites
The following topics explain the system requirements for the client drivers. You need to ensure
that your client system meets these requirements before installing and using the client drivers.
ODBC Prerequisites
There are several requirements your client systems must meet before you can install the HP
Vertica ODBC drivers.
Operating System
The HP Vertica ODBC driver requires a supported platform. The list of currently supported platforms can be found on the myVertica portal http://my.vertica.com/.
ODBC Driver Manager
The HP Vertica ODBC driver requires that the client system have a supported driver manager.
See the myVertica portal http://my.vertica.com/ for a list of supported driver managers.
UTF-8, UTF-16 and UTF-32 Support
The HP Vertica ODBC driver is a universal driver that supports UTF-8, UTF-16, and UTF-32
encoding. The default setting depends on the client platform (see Additional ODBC Driver Configuration Settings (page 49) for more information).
When using the driver with the DataDirect Connect driver manager, DataDirect Connect adapts to
the ODBC driver's text encoding settings. You should configure the ODBC driver to use the
encoding method that your application requires. This allows strings to be passed between the
driver and the application without intermediate conversion.
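For illustration, a minimal sketch of how the encoding might be set in the vertica.ini file; the parameter is described in Additional ODBC Driver Configuration Settings (page 49), and the UTF-16 value shown is an assumption for a unixODBC-based client, not a recommendation:

[Driver]
DriverManagerEncoding = UTF-16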
See Also
Client Driver Install Procedures (page 19)
Programming ODBC Client Applications (page 54)
Creating an ODBC Data Source Name (DSN) (page 29)
Microsoft Connectivity Pack/ADO.NET Prerequisites
The HP Vertica driver for the Microsoft Connectivity Pack and ADO.NET requires the following
software and hardware components:
Operating System
The HP Vertica Microsoft Connectivity Pack and ADO.NET driver requires a supported Windows
operating system. The list of supported platforms can be found in the Supported Platforms
document at http://www.vertica.com/documentation.
Memory
HP Vertica suggests a minimum of 512MB of RAM.
.NET Framework
The requirements for the .NET framework for the Microsoft Connectivity Pack and ADO.NET in HP
Vertica can be found in the Supported Platforms document at http://www.vertica.com/documentation.
If you are using Windows 2008 R2, then the .NET framework is included as part of the operating
system. However, the .NET framework is not enabled by default. You must enable the .NET
framework on Windows 2008 R2. See the following page for details on enabling the .NET
framework in Windows 2008 R2:
http://blogs.msdn.com/b/sqlblog/archive/2010/01/08/how-to-install-net-framework-3-5-sp1on-windows-server-2008-r2-environments.aspx
Visual Studio (optional)
The Microsoft Connectivity Pack and ADO.NET installer provides a Visual Studio 2008 plugin
called Visual Studio Shell Integration. The plugin allows you to use HP Vertica as a Visual Studio
Data Source. The connection properties for the plugin are the same as ADO.NET connection
properties (page 166). You must have Visual Studio 2008 and the Visual Studio 2008 SDK
installed prior to installing the HP Vertica Microsoft Connectivity Pack/ADO.NET client
driver. Once the plugin is installed, you can access it from within Visual Studio 2008 at Data > Add
New Data Source... > Database.
The Microsoft Connectivity Pack/ADO.NET installer also provides Microsoft Business Intelligence Development Studio (BIDS) components. These components require that Visual Studio be installed and that SQL Server be installed with the BIDS shared feature. See below for additional details about BIDS.
Microsoft SQL Server (optional)
Version 6.0.0.1+ of the Microsoft Connectivity Pack/ADO.NET installer provides an enhanced HP Vertica .NET data provider that enables support for the following:
 Microsoft Analysis Services (SSAS)
 Integration Services (SSIS) - which provides SQL Server Type Mappings to map data types between HP Vertica and SQL Server.
 Reporting Services (SSRS)
 Microsoft Business Intelligence Development Studio (BIDS)
To use the enhanced HP Vertica .NET data provider, SQL Server must be installed prior to installing the Microsoft Connectivity Pack/ADO.NET client driver, and the following optional components must be installed on the SQL Server:
 For SSAS functionality - install the Analysis Services Instance Feature
 For SSRS - install the Reporting Services Instance Feature
 For SSIS (Data Type Mappings) - install the SQL Server Integration Services Shared Feature
 For BIDS - install the Business Intelligence Development Studio Shared Feature (also requires that Microsoft Visual Studio 2008 be installed prior to installing the Microsoft Connectivity Pack/ADO.NET client driver)
See Also
Installing ADO.NET Driver on Windows
Microsoft Connectivity Pack
Using ADO.NET (page 161)
Python Prerequisites
Python is a free, agile, object-oriented, cross-platform programming language designed to
emphasize rapid development and code readability. Python has been released under several
different open source licenses.
HP Vertica's ODBC driver is tested with Python version 2.4. It should work with versions 2.4
through 2.7. It may also work with Python version 3.0, but this is untested.
Python Driver
HP Vertica requires the pyodbc driver module. See your system's Python documentation for
installation and configuration information.
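As a quick orientation, here is a minimal pyodbc sketch; the DSN name VMart and the dbadmin user are placeholder values borrowed from the DSN examples later in this guide, not requirements:

import pyodbc

# Open a connection through an ODBC DSN and check the server version.
connection = pyodbc.connect("DSN=VMart;UID=dbadmin;PWD=")
cursor = connection.cursor()
cursor.execute("SELECT version();")
print cursor.fetchone()[0]   # Python 2 print statement, matching versions 2.4-2.7
connection.close()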
Supported Operating Systems
The HP Vertica ODBC driver requires one of the operating systems listed in ODBC Prerequisites
(page 15).
For usage and examples, see Using Python (page 211).
Perl Prerequisites
Perl is a free, stable, open source, cross-platform programming language licensed under its
Artistic License, or the GNU General Public License (GPL).
Your Perl scripts access HP Vertica through its ODBC driver, using the Perl Database Interface
(DBI) module with the ODBC Database Driver (DBD::ODBC). The HP Vertica ODBC driver is
known to be compatible with these versions of Perl:
 5.8
 5.10
Later Perl versions may also work.
Perl Drivers
The following Perl driver modules have been tested with the HP Vertica ODBC driver:
 The DBI driver module, version 1.609
 The DBD::ODBC driver module, version 1.22
Other versions may also work.
Supported Client Systems
The HP Vertica ODBC driver requires one of the operating systems and driver managers listed in
ODBC Prerequisites (page 15).
PHP Prerequisites
PHP is a widely-used general-purpose scripting language that is especially suited for Web
development and can be embedded into HTML. PHP is licensed under the PHP License, an
open-source BSD-style license certified by the Open Source Initiative.
PHP Modules
The following PHP modules are required:
 php
 php-odbc
 php-pdo
 UnixODBC (if you are using the UnixODBC driver manager)
 libiodbc (if you are using the iODBC driver manager)
Supported Client Systems
The HP Vertica ODBC driver requires one of the operating systems and driver managers listed in
ODBC Prerequisites (page 15).
Installing the Client Drivers
How you install client drivers depends on the client's operating system:
 For Linux and UNIX clients, you must first install a Linux driver manager (page 19). After you have installed the driver manager, there are two different ways to install the client drivers:
  On Red Hat Enterprise Linux 5 64-bit and SUSE Linux Enterprise Server 10/11 64-bit, you can use the HP Vertica client RPM package to install the ODBC and JDBC drivers as well as the vsql client.
  On other Linux platforms and UNIX-like platforms, you can download the ODBC and JDBC drivers and install them individually.
 Note: The ODBC and JDBC client drivers are installed by the server .rpm files. If you have installed HP Vertica Analytics Platform on your Linux system for development or testing purposes, you do not need to download and install the client drivers on it—you just need to configure the drivers. To use ODBC, you need to create a DSN (see Creating an ODBC DSN for Linux, Solaris, AIX, and HP-UX (page 29)). To use JDBC, you need to add the JDBC client driver to the Java CLASSPATH (see Modifying the Java CLASSPATH (page 26)).
 On Windows clients, download the 32-bit or 64-bit client installer. The installer provides the ODBC and JDBC drivers.
 There is an additional Windows installer for the Microsoft Connectivity Pack/ADO.NET client driver. 32-bit and 64-bit versions of the installer are available. ADO.NET is only available for the Enterprise Edition of HP Vertica.
The remainder of this section describes how to install client drivers on different operating systems.
Installing Driver Managers on Linux and other UNIX-like Platforms
If your client platform does not already have an ODBC driver manager, you need to install one before you can use the HP Vertica ODBC client driver. The driver manager provides an interface between your client operating system and the ODBC drivers. See the Supported Platforms documentation at http://www.vertica.com/documentation for a list of driver managers that are supported on each of the client platforms.
HP Vertica provides reference implementations of supported driver managers in binary format.
These binaries are built by HP Vertica and used when testing the product. The driver managers
are not modified by HP Vertica and are built with the default settings.
The driver managers can be downloaded from the Download tab of the myVertica portal
http://my.vertica.com/.
HP Vertica provides these binaries as a convenience for users in the event that the supported versions of the driver managers cannot be easily obtained from the original developers. There is no benefit to using the HP Vertica-provided binaries if you installed the same or a compatible version of the supported driver manager as part of your operating system.
The binaries provided by HP Vertica must be installed and configured manually. For example, the
.so files must be in a directory that is in the operating system’s "lib" search path, for example,
/usr/lib, or other location specified by the shell's library search path variable
(LD_LIBRARY_PATH). HP Vertica does not provide instructions for installing and configuring
these third-party binaries. See the respective websites for the driver managers for installation and configuration information:
 UnixODBC: http://www.unixodbc.org/
 iODBC: http://www.iodbc.org/
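For example, a minimal sketch of making manually installed driver-manager libraries visible to the dynamic linker, assuming they were copied to /opt/vertica/lib64 (an illustrative location, not a requirement):

# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/vertica/lib64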
Installing ODBC Drivers on Linux, Solaris, AIX, and HP-UX
Note: For additional details about supported platforms, see the Vertica Supported Platforms
document.
Read Driver Prerequisites (page 15) before you proceed.
For Red Hat Enterprise Linux and SUSE Linux Enterprise Server, you can download and install a
client RPM that installs both the ODBC and JDBC driver as well as the vsql client. See Installing
the Client RPM on Red Hat and SUSE (page 21).
Note: The ODBC and JDBC client drivers are installed by the server .rpm files. If you have
installed HP Vertica Analytics Platform on your Linux system for development or testing
purposes, you do not need to download and install the client drivers on it—you just need to
configure the drivers. To use ODBC, you need to create a DSN (see Creating an ODBC DSN
for Linux, Solaris, AIX, and HP-UX (page 29)). To use JDBC, you need to add the JDBC
client driver to the Java CLASSPATH (see Modifying the Java CLASSPATH (page 26)).
On the myVertica portal http://my.vertica.com/, the ODBC driver installation packages are
broken down by client platform. Each package is named based on its operating system and architecture (for example, vertica_6.1.x.xx_odbc_x86_64_linux.tar.gz).
Installation Procedure
1 Open a Web browser and log in to myVertica portal http://my.vertica.com/.
2 Click the Download tab and locate and download the driver package that corresponds to your
client system.
3 If you did not directly download to the client system, transfer the downloaded file to it.
4 Log in to the client system as root.
5 If the directory /opt/vertica/ does not exist, create it:
# mkdir -p /opt/vertica/
6 Copy the downloaded file to the /opt/vertica/ directory. For example:
# cp vertica_6.1.x.xx_odbc_x86_64_linux.tar.gz /opt/vertica/
7 Change to the /opt/vertica/ directory:
# cd /opt/vertica/
8 Uncompress the file you downloaded. For example:
# tar vzxf vertica_6.1.x.xx_odbc_x86_64_linux.tar.gz
Two folders are created: one for the include file, and one for the library files. The path of the
library file depends on the processor architecture: lib for 32-bit libraries, and lib64 for 64-bit
libraries. So, a 64-bit driver client download creates the directories:
 /opt/vertica/include, which contains the header file
 /opt/vertica/lib64, which contains the library file
Post Driver Installation Configuration
You must configure the ODBC driver before you can use it. There are two required configuration files:
 The odbc.ini configuration file defines the Data Source Names (DSNs) that tell the ODBC driver how to access your HP Vertica databases. See Creating an ODBC Data Source Name (page 29) for instructions to create this file.
 The vertica.ini configuration file defines some HP Vertica-specific settings required by the drivers. See Additional ODBC Driver Configuration Settings (page 49) for instructions to create this file.
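For reference, a minimal vertica.ini sketch; the parameters are documented in Additional ODBC Driver Configuration Settings (page 49), and the paths shown are illustrative assumptions for a 64-bit Linux client using unixODBC:

[Driver]
ODBCInstLib = /usr/lib64/libodbcinst.so
ErrorMessagesPath = /opt/vertica/lib64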
Note: If you are upgrading your ODBC driver, you must either update your DSNs to point to the
newly-installed driver or create new DSNs. If your odbc.ini file references drivers defined in
an odbcinst.ini file, you just need to update the odbcinst.ini file. See Creating an
ODBC Data Source Name (DSN) (page 29) for details.
Installing the Client RPM on Red Hat and SUSE
For Red Hat Enterprise Linux and SUSE Linux Enterprise Server, you can download and install a
client driver RPM that installs both the ODBC and JDBC driver libraries and the vsql client.
To install the client driver RPM package:
1 Open a Web browser and log in to the myVertica portal http://my.vertica.com/.
2 Click the Download tab and download the client RPM file that matches your client platform's
architecture.
Note: The 64-bit client driver RPM installs both the 64-bit and 32-bit ODBC driver libraries, so
you do not need to install both on your 64-bit client system.
3 If you did not directly download the RPM on the client system, transfer the file to the client.
4 Log in to the client system as root.
5 Install the RPM package you downloaded:
# rpm -Uvh package_name.rpm
Note: You receive one or more conflict error messages if there are existing HP Vertica client
driver files on your system. This can happen if you are trying to install the client driver package
on a system that has the server package installed, since the server package also includes the
client drivers. In this case, you don't need to install the client drivers, and can instead use the
drivers installed by the server package. If the conflict arises from an old driver installation or
from a server installation for an older version, you can use the rpm command's --force switch
to force it to overwrite the existing files with the files in the client driver package.
Once you have installed the client package, you need to create a DSN (see Creating an ODBC DSN for Linux, Solaris, AIX, and HP-UX (page 29)) and set some additional configuration parameters (see Additional ODBC Driver Configuration Settings (page 49)) to use ODBC. To use JDBC, you need to modify your class path (see Modifying the Java CLASSPATH (page 26)).
You may also want to add the vsql client to your PATH environment variable so that you do not
need to enter its full path to run it. You add it to your path by adding the following to the .profile
file in your home directory or the global /etc/profile file:
export PATH=$PATH:/opt/vertica/bin
Installing the JDBC Driver on Linux, Solaris, AIX, and HP-UX
Note: The ODBC and JDBC client drivers are installed by the server .rpm files. If you have
installed HP Vertica Analytics Platform on your Linux system for development or testing
purposes, you do not need to download and install the client drivers on it—you just need to
configure the drivers. To use ODBC, you need to create a DSN (see Creating an ODBC DSN
for Linux, Solaris, AIX, and HP-UX (page 29)). To use JDBC, you need to add the JDBC
client driver to the Java CLASSPATH (see Modifying the Java CLASSPATH (page 26)).
Note: For additional details about supported platforms, see the Vertica Supported Platforms
document.
The JDBC driver is available for download from myVertica portal http://my.vertica.com/. There
is a single .jar file that works on all platforms and architectures. To download and install the file:
1 Open a Web browser and log in to myVertica portal http://my.vertica.com/.
2 Click the Download tab and locate and download the JDBC driver.
3 Copy the .jar file you downloaded to a directory in your Java CLASSPATH (http://en.wikipedia.org/wiki/Classpath_%28Java%29) on every client system from which you want to access HP Vertica. You can either:
  Copy the .jar file to its own directory (such as /opt/vertica/java/lib) and then add that directory to your CLASSPATH (recommended). See Modifying the Java CLASSPATH (page 26) for details.
  Copy the .jar file to a directory that is already in your CLASSPATH (for example, a directory where you have placed other .jar files on which your application depends).
Note: In the directory where you copied the .jar file, you should create a symbolic link named vertica_jdk_5.jar to the .jar file. You can reference this symbolic link anywhere you need to use the name of the JDBC library without having to worry about a future upgrade invalidating the file name. This symbolic link is automatically created on server installs. On clients, you need to create and manually maintain this symbolic link yourself if you installed the driver manually. The client RPM packages (see Installing the Client RPM on Red Hat and SUSE (page 21)) create this link when they install the JDBC library.
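For example, assuming the downloaded driver file is named vertica-jdbc-6.1.x.jar (an illustrative name; substitute the actual filename you downloaded), you could create the link with:

# cd /opt/vertica/java/lib
# ln -s vertica-jdbc-6.1.x.jar vertica_jdk_5.jar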
Installing ODBC/JDBC Client Drivers and vsql Client on Windows
This topic details how to download and install the HP Vertica ODBC/JDBC client drivers and vsql
client for Windows systems. The installer can be run as a regular Windows installer or silently. The
Windows client drivers (ODBC, JDBC, etc.) are all installed using a single installer. There are
32-bit and 64-bit installers available. The 32-bit installer provides a 32-bit driver. The 64-bit
installer provides both 32-bit and 64-bit drivers.
Read Driver Prerequisites (page 15) before you proceed.
Note: If you are uninstalling a previous release of the HP Vertica ODBC/JDBC drivers, delete
any DSNs associated with those drivers before you uninstall. Windows requires that the driver
files be present when a DSN is removed. If you uninstall the driver first, then the DSN cannot be
removed. For releases after 5.1 you do not need to uninstall the drivers, as the installation
program upgrades the existing 5.1+ drivers in place.
To download the Windows client-drivers:
1 Open a Web browser and log in to myVertica portal http://my.vertica.com/.
2 Click the Download tab and select the ODBC/JDBC Windows installer that you want to install
(32-bit or 64-bit) and follow the on-screen prompts to download the installer.
To install the Windows client-drivers and vsql client:
1 As a Windows Administrator, double-click the installer to start the install process.
2 The introduction screen appears. Click Next to begin the installation.
3 Read the license agreement and select the appropriate radio button. Click Next to continue.
4 Optionally change the installation directory and click Next. The default directory is C:\Program
Files (x86)\Vertica Systems\.
5 Select the components to install and click Next. By default all components are selected.
6 Click Install to install the options you selected.
7 Click Finish.
To silent-install the Windows client-drivers and vsql client:
1 As a Windows Administrator, open a command-line session and change directory to the folder
that contains the installer.
2 Run the command vertica_client_drivers_[VERSION].exe /S /v/qn
3 The drivers are silently installed in C:\Program Files\Vertica Systems\. If you install the 32-bit
drivers on a 64-bit system, then those drivers are installed to C:\Program Files (x86)\Vertica
Systems\. The driver appears in the list of installed programs and is now available in the ODBC
control panel.
After you install:
 The client drivers are available in the ODBC and JDBC folders of the installation directory.
 There is no shortcut for the vsql client. vsql is added to the Windows PATH environment variable. Start a command window and type vsql -? at the command prompt to start vsql and show the help list. See vsql Notes for Windows Users for important details about using vsql in a Windows console.
You must perform an additional step for some of the client drivers before you use them:
 For ODBC, create a new Data Source Name (page 29) (DSN).
 For JDBC, modify the Java CLASSPATH (see Modifying the Java CLASSPATH (page 26)).
Installing the Microsoft Connectivity Pack and ADO.NET Driver on Windows
This topic details how to download and install the HP Vertica Microsoft Connectivity Pack and ADO.NET client driver for Windows systems. The installer can be run as a regular Windows installer or silently. There are 32-bit and 64-bit installers available. Use the correct installer for your operating system; for example, 32-bit systems must use the 32-bit installer.
Read Microsoft Connectivity Pack/ADO.NET Prerequisites (page 16) before you proceed.
For more details about the Microsoft Connectivity Pack, see Microsoft Connectivity Pack in the
Programming ADO.NET Applications section of this document.
NOTE: If you are upgrading the driver from a version prior to 6.0, then you must first uninstall
the current driver before installing the new driver. The installer does not upgrade old versions
of the driver. For example, if you have the 5.1 ADO.NET client driver installed, and you want to
install the 5.1.5 Microsoft Connectivity Pack/ADO.NET driver, then you must first use the
Windows uninstall utility in control panel to uninstall the existing 5.1 ADO.NET driver.
NOTE: If you are installing the SSAS, SSIS, or SSRS components and the server contains
more than one installation of SQL Server, then the installer installs the components on the first
installation of SQL Server based on the order provided in the Windows registry.
To download the Microsoft Connectivity Pack/ADO.NET client-driver:
1 Open a Web browser and log in to the myVertica portal http://my.vertica.com/.
2 Click the Download tab and select the ADO.NET Windows installer that you want to install
(32-bit or 64-bit) and follow the on-screen prompts to download the installer. Use the 64-bit
version on 64-bit Windows machines and the 32-bit installer on 32-bit Windows machines.
To install the Microsoft Connectivity Pack/ADO.NET client-driver:
1 As a Windows Administrator, double-click the installer to start the install process.
2 The introduction screen appears. Click Next to begin the installation.
3 Optionally change the installation directory and click Next. The default directory is C:\Program
Files\Vertica Systems\.
Note: The driver is also installed into the Windows Global Assembly Cache (GAC).
4 Select the components to install and click Next. By default all components are selected. In Version 6.0.0.1+, additional components can be selected for installation depending on which Microsoft components are already installed on the server. If the prerequisite software is not installed, then the additional components are not shown. See Microsoft Connectivity Pack/ADO.NET Prerequisites (page 16) for details on optional prerequisites for the ADO.NET client driver.
5 Click Install to install the options you selected.
6 Click Finish.
To silent-install the Windows client-drivers:
1 As a Windows Administrator, open a command-line session and change directory to the folder
that contains the installer.
2 Run the command [ADO.NET_INSTALLER_FILENAME].exe /S /v/qn
3 The drivers are silently installed in C:\Program Files\Vertica Systems\ and also in the Global
Assembly Cache.
After you install:
The drivers are available in the ADO.NET folder of the installation directory.
See Also
HP Vertica - Microsoft Integration
Modifying the Java CLASSPATH
The CLASSPATH environment variable contains the list of directories where the Java run time
looks for library class files. For your Java client code to access HP Vertica, you need to add the
directory where the HP Vertica JDBC .jar file is located.
Note: In your CLASSPATH, use the symbolic link vertica_jdk_5.jar that points to the
JDBC library .jar file, rather than the .jar file itself. Using the symbolic link ensures that any
updates to the JDBC library .jar file (which will use a different filename) will not invalidate
your CLASSPATH setting, since the symbolic link's filename will remain the same. You just
need to update the symbolic link to point at the new .jar file.
Linux, Solaris, AIX, HP-UX, and OS X
If you are using the Bash shell, use the export command to define the CLASSPATH variable:
# export CLASSPATH=/opt/vertica/java/lib/vertica_jdk_5.jar
If environment variable CLASSPATH is already defined, use the following command to prevent it
from being overwritten:
# export CLASSPATH=$CLASSPATH:/opt/vertica/java/lib/vertica_jdk_5.jar
If you are using a shell other than Bash, consult its documentation to learn how to set environment
variables.
You need to either set the CLASSPATH environment variable for every login session, or insert the
command to set the variable into one of your startup scripts (such as ~/.profile or
/etc/profile).
Windows
Provide the class paths to the .jar, .zip, or .class files.
C:> SET CLASSPATH=classpath1;classpath2...
For example:
C:> SET CLASSPATH=C:\java\MyClasses\vertica_jdk_5.jar
As with the Linux/UNIX settings, this setting only lasts for the current session. To set the
CLASSPATH permanently, set an environment variable:
1 On the Windows Control Panel, click System.
2 Click Advanced or Advanced Systems Settings.
3 Click Environment Variables.
4 Under User variables, click New.
5 In the Variable name box, type CLASSPATH.
6 In the Variable value box, type the path to the HP Vertica JDBC .jar file on your system (for example, C:\Program Files (x86)\HP Systems\JDBC\vertica_jdk_5.jar).
Specifying the Library Directory in the Java Command
There is an alternative way to tell the Java run time where to find the HP Vertica JDBC driver other
than changing the CLASSPATH environment variable: explicitly add the directory containing the
.jar file to the java command line using either the -cp or -classpath argument. For example,
on Linux, start your client application using:
# java -classpath /opt/vertica/java/lib/vertica_jdk_5.jar myapplication.class
Your Java IDE may also let you add directories to your CLASSPATH, or let you import the HP
Vertica JDBC driver into your project. See your IDE documentation for details.
Installing the JDBC Driver on Macintosh OS X
To install the HP Vertica JDBC driver on your Macintosh OS X client system, download the
cross-platform JDBC driver .jar file to your system and ensure OS X's Java installation can find
it.
Downloading the JDBC Driver
To download the HP Vertica JDBC driver on Macintosh OS X:
1 On your Macintosh client system, open a browser and log into the myVertica portal
http://my.vertica.com/.
2 Navigate to the Downloads page, scroll to the Client Software download section, and click the
download link for the JDBC driver.
3 Accept the license agreement and wait for the download to complete.
Ensuring Java Can Find the JDBC Driver
In order for your Java client application to use the HP Vertica JDBC driver, the Java interpreter
needs to be able to find its library file. Choose one of these methods to tell the Java interpreter
where to look for the library:
 Copy the JDBC .jar file you downloaded to either the system-wide Java Extensions folder (/Library/Java/Extensions) or your user Java Extensions folder (/Users/username/Library/Java/Extensions).
 Add the directory containing the JDBC .jar file to the CLASSPATH environment variable (see Modifying the Java CLASSPATH (page 26)).
 Specify the directory containing the JDBC .jar file using the -cp argument in the Java command line you use to start your application.
Installing the ODBC Driver on Macintosh OS X
The HP Vertica ODBC driver for Macintosh OS X is packaged as a gzipped tar archive (.tar.gz).
This driver works with both 32-bit and 64-bit applications. You need to extract the library files from
this archive onto your system.
Download the Driver
Follow these steps to download the HP Vertica ODBC driver for Macintosh:
1 On your Macintosh client system, open a browser and log into the myVertica portal http://my.vertica.com/.
2 Navigate to the Downloads tab, scroll to the Client Software section, then click the download
link for the Macintosh OS X ODBC driver package.
3 Accept the license agreement, and wait for the download to complete.
Decide Where to Install the Driver
Where you install the driver depends on which users on the client Macintosh need to use the HP Vertica ODBC driver:
 If multiple users need to use the driver, install it in /Library/ODBC/vertica. You need to be logged into an administrator account to install the driver in this location.
 If just a single user needs to use the driver (or you do not have administrative privileges on the client OS X system), install the driver in /Users/username/Library/ODBC/vertica.
Unpack the Driver
To unpack the driver:
1 Log into the client Macintosh either with an administrator account (if installing the driver for
system-wide use) or as the user who needs to use the HP Vertica ODBC driver.
2 Open a Terminal window (in the Finder, click Applications > Utilities > Terminal).
3 Enter one of the following commands to create the target directory for the driver files:
 To install system-wide: mkdir -p /Library/ODBC/vertica
 To install for the current user: mkdir -p ~/Library/ODBC/vertica
4 Change to the directory you just created:
 For system-wide installs: cd /Library/ODBC/vertica
 For current user installs: cd ~/Library/ODBC/vertica
5 Unpack the .tar.gz file containing the ODBC driver using the command:
tar -xzf ~/Downloads/vertica_6.1.x.x_odbc_mac_tar.gz
Note: If you downloaded the driver .tar.gz file to a directory other than your Downloads
directory, or you downloaded it using another user account, change the path in the above tar
command to the path of the downloaded file.
After installing the ODBC driver, you must create a DSN to be able to connect to your HP Vertica database. See Creating an ODBC DSN for Macintosh OS X Clients (page 41).
Upgrading the HP Vertica Client Drivers
The HP Vertica client drivers are usually updated for each new release of the HP Vertica server. The client driver installation packages include the version number of the corresponding HP Vertica server release. Usually, the drivers are forward compatible with the next release, so your client applications are still able to connect using the older drivers after you upgrade to the next version of the HP Vertica Analytics Platform server. See Client Driver and Server Version Compatibility (page 13) for details on which client driver versions work with each version of the HP Vertica server.
You should upgrade your clients as soon as possible after upgrading your server, to take
advantage of new features and to maintain maximum compatibility with the server.
To upgrade your drivers, follow the same procedure you used to install them in the first place. The
new installation will overwrite the old. See the specific instructions for installing the drivers on your
client platform for any special instructions regarding upgrades.
Note: Installing new ODBC drivers does not alter existing DSN settings. You may need to
change the driver settings in either the DSN or in the odbcinst.ini file, if your client system
uses one. See Creating an ODBC Data Source Name (page 29) for details.
Using Legacy Drivers
The HP Vertica server supports connections from the previous version of the client drivers. For
example, the HP Vertica version 5.1 server works with the 4.1 client drivers, since they were the
drivers distributed with the previous version of the server. This backwards compatibility lets you
upgrade your HP Vertica database first, then later upgrade your clients.
If you have not yet updated your code to work with the new version of the HP Vertica client drivers,
you can continue to use the older drivers until you do. If you need to install your client application
on a new client system, you can download and install the older drivers. See the myVertica portal http://my.vertica.com/ to download the installers; installation documentation is available at http://www.vertica.com/documentation.
For detailed information on the compatibility of different versions of the HP Vertica server and HP Vertica clients, see Client Driver and Server Version Compatibility (page 13).
Note: The support for a previous version of the drivers is usually eliminated in the next release
of HP Vertica. For example, the HP Vertica version 5.1 server does not support the version 4.0
drivers. You should update your client application to work with the new client drivers as soon as
possible.
Creating an ODBC Data Source Name (DSN)
A Data Source Name (DSN) is the logical name that is used by Open Database Connectivity
(ODBC) to refer to the driver and other information that is required to access data from a data
source. Whether you are developing your own ODBC client code or you are using a third-party tool
that needs to access HP Vertica using ODBC, you need to configure and test a DSN. The method
you use depends upon the client operating system you are using.
Creating an ODBC DSN for Linux, Solaris, AIX, and HP-UX
DSNs are defined on Linux, Solaris, and other UNIX-like platforms in a text file. Your client's driver
manager reads this file to determine how to connect to your HP Vertica database. The driver
manager usually looks for the DSN definitions in two places:
 /etc/odbc.ini
 ~/.odbc.ini (a file named .odbc.ini in the user's home directory)
The structure of these files is the same, only their location differs. If both files are present, the
~/.odbc.ini file usually overrides the system-wide /etc/odbc.ini file.
Note: See your ODBC driver manager's documentation for details on where these files should
be located and any other requirements.
odbc.ini File Structure
The odbc.ini file is a text file that contains two types of lines:
 Section definitions, which are text strings enclosed in square brackets.
 Parameter definitions, which contain a parameter name, an equal sign, and then the parameter's value.
The first section of the file is always named [ODBC Data Sources], and contains a list of all the DSNs that the odbc.ini file defines. The parameters in this section are the names of the DSNs, which appear as section definitions later in the file. The value is a text description of the DSN, and has no function. For example, an odbc.ini file that defines a single DSN named HPVerticaDSN could have this ODBC Data Sources section:
[ODBC Data Sources]
HPVerticaDSN = "vmartdb"
After the ODBC data sources section are sections that define each DSN. The name of a DSN
section must match one of the names defined in the ODBC Data Sources section.
Configuring the odbc.ini file:
To create or edit the DSN definition file:
1 Using the text editor of your choice, open odbc.ini or ~/.odbc.ini.
2 Create an ODBC Data Sources section and define a parameter whose name is the name of
the DSN you want to create and whose value is a description of the DSN. For example, to
create a DSN named VMart, you would enter:
[ODBC Data Sources]
VMart = "VMart database on HP Vertica"
3 Create a section whose name matches the DSN name you defined in step 2. In this section,
you add parameters that define the DSN's settings. The most commonly-defined parameters
are:
 Description – Additional information about the data source.
 Driver – The location and designation of the HP Vertica ODBC driver, or the name of a
driver defined in the odbcinst.ini file (see below). For future compatibility, you should
use the name of the symbolic link in the library directory (/opt/vertica/lib on 32-bit
clients, and /opt/vertica/lib64 on 64-bit clients), rather than the library file. For
example, the symbolic link for the 64-bit ODBC driver library is:
/opt/vertica/lib64/libverticaodbc.so
The symbolic link always points to the most up-to-date version of the HP Vertica client
ODBC library. Using the link ensures that you do not need to update all of your DSNs when
you update your client drivers.
 Database – The name of the database running on the server. This example uses vmartdb as the database name.
 ServerName — The name of the server where HP Vertica is installed. Use localhost if HP
Vertica is installed on the same machine.
 UID – Either the database superuser (same name as database administrator account) or a
user that the superuser has created and granted privileges. This example uses the user
name dbadmin.
 PWD – The password for the specified user name. This example leaves the password field
blank.
 Port – The port number on which HP Vertica listens for ODBC connections. For example,
5433.
 ConnSettings – Can contain SQL commands separated by a semicolon. These
commands can be run immediately after connecting to the server.
 SSLKeyFile – The file path and name of the client's private key. This file can reside
anywhere on the system.
 SSLCertFile – The file path and name of the client's public certificate. This file can reside
anywhere on the system.
 Locale – The default locale used for the session. By default, the locale for the database is
en_US@collation=binary (English as in the United States of America). Specify the locale
as an ICU Locale. See the ICU User Guide (http://userguide.icu-project.org/locale) for a
complete list of parameters that can be used to specify a locale.
For example:
[VMart]
Description = Vmart Database
Driver = /opt/vertica/lib64/libverticaodbc.so
Database = vmartdb
Servername = host01
UID = dbadmin
PWD =
Port = 5433
ConnSettings =
SSLKeyFile = /home/dbadmin/client.key
SSLCertFile = /home/dbadmin/client.crt
Locale = en_GB
See DSN parameters (page 43) for a complete list of parameters including HP Vertica-specific
ones.
Using an odbcinst.ini File
Instead of giving the path of the ODBC driver library in your DSN definitions, you can use the name
of a driver defined in the odbcinst.ini file. This is a useful method if you have many DSNs and
often need to update them to point to new driver libraries. It also allows you to set some additional
ODBC parameters, such as the threading model.
Just as in the odbc.ini file, odbcinst.ini has sections. Each section defines an ODBC driver
that can be referenced in the odbc.ini files.
In a section, you can define the following parameters:
 Description — Additional information about the data source.
 Driver — The location and designation of the HP Vertica ODBC driver. For example:
/opt/vertica/lib64/libverticaodbc.so
For example:
[HPVertica]
Description = HP Vertica ODBC Driver
Driver = /opt/vertica/lib64/libverticaodbc.so
Then in your odbc.ini file, you use the name of the section you created in the odbcinst.ini
file that describes the driver you want to use. For example:
[VMart]
Description = HP Vertica Vmart database
Driver = HPVertica
If you are using the unixODBC driver manager, you should also add an ODBC section to override
its standard threading settings. By default, unixODBC will serialize all SQL calls through ODBC,
which prevents multiple parallel loads. To change this default behavior, add the following to your
odbcinst.ini file:
[ODBC]
Threading = 1
Configuring Additional ODBC Settings
On Linux and UNIX systems, you need to configure some additional driver settings before you can
use your DSN. See Additional ODBC Driver Configuration Settings (page 49) for details.
Testing a DSN Using isql
The unixODBC driver manager includes a utility named isql, which is a simple ODBC command-line client. It lets you connect to a DSN to send commands and receive results, similarly to vsql.
To use isql to test a DSN connection:
1 Run the following command:
$ isql -v DSNname
SQL>
Where DSNname is the name of the DSN you created.
A connection message and a SQL prompt display. If they do not, you could have a
configuration problem or you could be using the wrong user name or password.
2 Try a simple SQL statement. For example:
SQL> SELECT table_name FROM tables;
The isql tool returns the results of your SQL statement.
Note: If you have not set the ErrorMessagesPath in the additional driver configuration settings,
any errors during testing will trigger a missing error message file ("The error message
NoSQLGetPrivateProfileString could not be found in the en-US locale"). See Additional
ODBC Driver Configuration Settings (page 49) for more information.
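Once isql connects successfully, the same DSN can be exercised from application code. A minimal pyodbc sketch, assuming the DSN from the example above is named VMart (see Python Prerequisites (page 17)):

import pyodbc

# Repeat the isql sanity check from Python.
connection = pyodbc.connect("DSN=VMart")
cursor = connection.cursor()
cursor.execute("SELECT table_name FROM tables;")
for row in cursor.fetchall():
    print row[0]
connection.close()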
Creating an ODBC DSN for Windows Clients
Creating a DSN for Microsoft Windows clients consists of:
 Setting up a DSN (page 33)
 Testing the DSN using Excel 2007 (page 38)
Setting Up a DSN
A Data Source Name (DSN) is the logical name that is used by Open Database Connectivity (ODBC) to refer to the driver and other information that is required to access data. The name is used by Internet Information Services (IIS) for a connection to an ODBC data source.
This section describes how to use the HP Vertica ODBC Driver to set up an ODBC DSN. This topic
assumes that the driver is already installed, as described in Installing Client Drivers on
Windows (page 23).
To set up a DSN:
1 Open the ODBC Administrator (For example, Start > Control Panel > Administrative Tools >
Data Sources (ODBC)).
Note: The method you use depends on the version of Windows you are using. Differences between Windows versions and Start Menu customizations could require a different action to open the ODBC Administrator.
2 If you want all users on your client system to be able to access to the DSN for the HP Vertica
database, click the System DSN tab.
Otherwise, click the User DSN tab to create a DSN that is only usable by your Windows user
account.
3 Click Add to create a new DSN to connect to the HP Vertica database.
4 Scroll through the list of drivers in the Create a New Data Source dialog to locate the HP
Vertica driver. Select the driver, and then click Finish.
Note: If you have installed more than one version of the HP Vertica client drivers on your
Windows client system, you may see multiple versions of the driver in this list. Choose the
version that you know is compatible with your client application. When in doubt, use the latest
version.
5 Enter your data source information in the HP Vertica ODBC Driver Setup dialog.
The following list describes all the fields in the HP Vertica ODBC Driver Setup dialog:
 Data Source — The name for the DSN. Clients will use this name to identify the DSN to
which they want to connect.
 Description — Additional information that helps identify the data source.
 Server — The hostname or IP address of any active node within an HP Vertica database.
 Port — The port number on which HP Vertica listens for ODBC connections (5433, by
default).
 Database — The name of the HP Vertica database.
 User Name — The name of the user account to use when connecting to the database.
This account name is used to log into the database if the application does not supply its
own user name when connecting to the DSN.
 Password — The password for the account entered in the User Name field.
Note: Storing a password in the DSN is very insecure. You should only use this field to
temporarily enter a password to test the DSN.
 Session Label — Sets a label for the connection on the server. This value appears in the session_id column of the V_MONITOR.SESSIONS system table.
 Log Path — The absolute path to the log file.
 Result Buffer Size (KB) — Size of memory buffer for the large result sets in streaming
mode.
 Connection Settings — A string containing SQL commands that the driver should
execute immediately after connecting to the server. This parameter is often used to
configure the connection in some manner, such as setting a schema search path.
Note: In the connection string ';' is a reserved symbol. If you need to set multiple
parameters as part of the Connection Settings parameter, then use '%3B' in place of ';' and
use '+' for spaces.
 Read Only — When selected, prevents users of this data source from writing to the
database.
 Use Three-Part Naming — This setting controls how the database metadata APIs reports
the catalog name. When checked, the database name is returned as the catalog name in
the database metadata. When not checked, NULL is returned as the catalog name. Enable
this option if your client software expects to be able to get the catalog name from the
database metadata and use it as part of a three-part name reference.
 Log Level — Controls which messages passed between the client application and HP Vertica are saved in a log. The default value is Off, meaning no messages are ever logged. The options are listed in decreasing order of severity: Fatal (messages that indicate the connection has failed) through Trace (all messages passed between the client and server are logged).
 SSLMode — Can be set to prefer (try to connect using SSL first, then try to connect without
SSL), disable (do not use SSL), accept (try to connect using a non-SSL connection first,
then try to connect with SSL), or require (only use SSL).
 Transaction Isolation Level — Sets the transaction isolation level for the connection. See
Transactions in the Concepts Guide for an explanation of the different transaction
isolations.
The DSN and Database fields are required to be able to save the DSN.
6 If you want to test the connection, be sure to enter a valid User Name and Password, then
click Test Connection.
7 When you have finished editing and testing the DSN, click Save. The Vertica ODBC Driver
Setup window closes, and your new DSN is listed in the ODBC Data Source Administrator
window.
8 Click OK to close the ODBC Data Source Administrator.
After creating the DSN, you can test it using Microsoft Excel 2007 (page 38).
Setting up a 32-Bit DSN on 64-Bit Versions of Windows
On 64-bit versions of Windows, the default ODBC Data Source Administrator creates and edits
DSNs that are tied to the 64-bit HP Vertica ODBC library. Attempting to use these 64-bit DSNs
with a 32-bit client application results in an architecture mismatch error. You need to create a
specific 32-bit DSN for 32-bit clients by running the 32-bit ODBC Administrator usually located at:
c:\Windows\SysWOW64\odbcad32.exe
This administrator window edits a set of DSNs that are tied to the 32-bit ODBC library. You can
then use your 32-bit client applications with the DSNs you create with this version of the ODBC
administrator.
Testing a DSN Using Excel 2007
This section uses Microsoft Excel 2007 to verify that an application can connect to an ODBC data
source. You can accomplish the same thing with any ODBC application.
1 Open Excel.
2 From the menu, select Data > Get External Data > From Other Sources > From Microsoft
Query.
3 Select VMart_Schema*, make sure the "Use the Query Wizard" check box is deselected, and click OK.
4 When the Add Tables window loads, click Close.
5 The Microsoft Query window opens; click the SQL button.
6 In the SQL window write any simple query to test your connection. This example uses the
following query:
SELECT DISTINCT calendar_year FROM date_dimension;
7 If you see the caution, "SQL Query can't be represented graphically. Continue anyway?" click
OK.
The data values 2003, 2004, 2005, 2006, 2007 indicate that you successfully connected to and
ran a query through ODBC.
8 Click File > Return Data to Microsoft Office Excel.
9 In the Import Data dialog, click OK.
The data is now available for use in an Excel worksheet.
Editing DSN Entries
The HP Vertica ODBC user interface lets you set the most important parameters in your DSN. To
view and change DSN parameters:
1 Open the ODBC Data Source Administrator (Control Panel > Administrative Tools > Data
Sources (ODBC)).
2 Select the DSN you want to view or edit on the User DSN or System DSN tab.
3 Click Configure.
4 Make the changes you want to the DSN, then click OK.
Editing DSN Parameters Directly in the Windows Registry
There are some less-common DSN parameters that are not available through the user interface of
the HP Vertica ODBC client driver for Windows, such as Locale. You can add entries for these
parameters directly to the DSN's definition within the Windows registry.
To add a new entry to a DSN:
1 Click the Windows Start menu, click Run.
2 In the Run window, type regedit in the Open box.
3 Navigate to the DSN you want to edit:
 System DSNs are located in HKEY_LOCAL_MACHINE\Software\ODBC\ODBC.INI
 User DSNs are located in HKEY_CURRENT_USER\Software\ODBC\ODBC.INI
4 Right-click in the right pane of the Registry Window, and select New > String Value.
5 Enter the name of the DSN parameter you want to enter.
6 Double-click the entry you just created, and enter a value for the parameter.
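For example, adding a Locale parameter to a hypothetical system DSN named VerticaDSN
results in a registry value like this (the DSN name and locale are illustrative):
Key:   HKEY_LOCAL_MACHINE\Software\ODBC\ODBC.INI\VerticaDSN
Name:  Locale
Type:  REG_SZ
Data:  en_GB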
Creating an ODBC DSN for Macintosh OS X Clients
DSNs are defined on OS X in a text file named odbc.ini. The ODBC driver manager in OS X
reads this file to determine how to connect to your HP Vertica database. The driver manager looks
for the odbc.ini file in two locations:
 System DSNs (those available for all users on the OS X client) are defined in
/Library/ODBC/odbc.ini
 User DSNs (defined for just a single user) are defined in
/Users/username/ODBC/odbc.ini
The structure of these files is the same; only their location differs. If both files are present, the
entries in the user DSN usually override those in the system DSN.
odbc.ini file Structure
The odbc.ini is a text file that contains two types of lines:
 Section definitions, which are text strings enclosed in square brackets.
 Parameter definitions, which contain a parameter name, an equal sign, and then the value for
the parameter.
The first section of the file is always named [ODBC Data Sources], and contains a list of all the
DSNs that the odbc.ini file defines. The parameters in this section are the names of the DSNs,
which appear as section definitions later in the file. The value is a text description of the DSN, and
has no function. For example, an odbc.ini file that defines a single DSN named HPVerticaDSN
could have this ODBC Data Sources section:
[ODBC Data Sources]
HPVerticaDSN = VMart Database
Following the ODBC Data Sources section are sections that define each DSN. The name of a
DSN section must match one of the parameter names defined in the ODBC Data Sources section.
Configuring the odbc.ini file:
To create or edit the DSN definition file:
1 Using the text editor of your choice, open /Library/ODBC/odbc.ini (to create a system
DSN entry) or /Users/username/ODBC/odbc.ini (to create a user DSN entry).
Note: You must be a system administrator in order to define a system DSN.
2 Create an ODBC Data Sources section containing an entry for the DSN you want to define (for
example, HPVerticaDSN). This entry establishes the name by which the new data source is
referenced. The value you assign to the DSN entry is just a comment that describes the DSN.
For example:
[ODBC Data Sources]
HPVerticaDSN = "vmartdb connection"
3 Create a section for the DSN, and add values that define the settings needed to connect to
your database. The following are the most commonly-defined parameters:
 Description – Additional information about the data source.
 Driver – The path to the HP Vertica ODBC driver library, or the name of a driver defined in
the odbcinst.ini file (see below). The location of the driver depends on whether it was
installed for system-wide use or just for an individual user:
If you installed the ODBC driver for system-wide use, its path should be
/Library/ODBC/vertica/lib/libverticaodbc.dylib
If you installed the ODBC driver for just the current user, its path should be
/Users/username/Library/ODBC/vertica/lib/libverticaodbc.dylib
The file name libverticaodbc.dylib is a symbolic link that always points to the most
up-to-date version of the HP Vertica client ODBC library. Using this link ensures that you do
not need to update all of your DSNs when you update your client drivers.
 Database – The name of the database running on the server.
 ServerName — The name of the server where HP Vertica is installed.
 UID – The HP Vertica user account to use when connecting.
 PWD – The password for the user given in the UID.
 Port – The port number on which HP Vertica listens for connections. This is usually 5433.
 Locale – The default locale used for the session. By default, the locale for the database is
en_US@collation=binary (English as in the United States of America). Specify the locale
as an ICU Locale. See the ICU User Guide http://userguide.icu-project.org/locale for a
complete list of parameters that can be used to specify a locale.
For example:
[VMart]
Description = Vmart Database
Driver = /Library/ODBC/vertica/lib/libverticaodbc.dylib
Database = vmartdb
Servername = host01
UID = dbadmin
PWD = password
Port = 5433
Locale = en_GB
See DSN parameters (page 43) for a complete list of parameters including HP Vertica-specific
ones.
Note: Instead of editing the odbc.ini and odbcinst.ini files in a text editor, you can install
and use Apple's ODBC Administrator Tool http://support.apple.com/kb/DL895.
Using an odbcinst.ini File
Instead of giving the path of the ODBC driver library in your DSN definitions, you can use the name
of a driver defined in the odbcinst.ini file. This is a useful method if you have many DSNs, and
often need to update them to point to new driver libraries. It also allows you to set some additional
ODBC parameters.
Just as in the odbc.ini file, odbcinst.ini has sections. Each section defines an ODBC driver
that can be referenced in the odbc.ini files.
In a section, you can define the following parameters:
 Description — Additional information about the database driver.
 Driver — The location of the HP Vertica ODBC driver library. For example:
/Library/ODBC/vertica/lib/libverticaodbc.dylib
For example:
[HPVerticaDriver]
Description = HP Vertica ODBC Driver
Driver = /Library/ODBC/vertica/lib/libverticaodbc.dylib
Then, in your odbc.ini file, reference the driver using the name of the section you created in the
odbcinst.ini file. For example:
[VMart]
Description = HP Vertica Vmart database
Driver = HPVerticaDriver
Configuring Additional ODBC Settings
In addition to configuring the odbc.ini file, you need to configure some additional driver settings
before you can use your DSN. See Additional ODBC Driver Configuration Settings (page 49)
for details.
DSN Parameters
The following tables list the connection properties you can set in the DSNs for use with HP
Vertica's ODBC driver.
Required Connection Parameters
These connection parameters are the minimum required to create a connection to the database. If
any parameters are missing from the DSN, they must be supplied by the function call or
connection string in order for the connection to HP Vertica to succeed.
Driver (Standard; default: none)
   The file path and name of the driver used.

Database (Standard; default: none)
   The name of the database running on the server.

Servername (Standard; default: none)
   The hostname or IP address of any active node within an HP Vertica
   database; for example, host01.
   Note: You can also use the aliases "server" and "host" for this parameter.

Port (Standard; default: 5433)
   The port number on which HP Vertica listens for ODBC connections.

Username (Standard; default: none)
   Either the database superuser (same name as the database administrator
   account) or a user that the superuser has created and granted privileges.

Password (Standard; default: none)
   The password for the specified user name. You may insert an empty string
   to leave this parameter blank.
Advanced Settings
AutoCommit (Standard; default: true)
   A Boolean value that controls whether the driver automatically commits
   transactions after executing a DML statement.

ConnSettings (HP Vertica; default: none)
   A string containing SQL commands that the driver should execute
   immediately after connecting to the server. This parameter is often used
   to configure the connection in some manner, such as setting a schema
   search path.
   Note: In the connection string, ';' is a reserved symbol. If you need to
   set multiple parameters as part of the ConnSettings parameter, use '%3B'
   in place of ';'. Also use '+' for spaces.

DirectBatchInsert (HP Vertica; default: false)
   A Boolean that controls where data inserted through the connection is
   stored. When set to true, HP Vertica directly inserts data into ROS
   containers. Otherwise, it stores data using AUTO mode.

DriverStringConversions (HP Vertica; default: OUTPUT)
   Controls whether the ODBC driver performs type conversions on strings
   sent between the ODBC driver and the database. Possible values are:
    NONE—no conversion in either direction. This results in the highest
   performance.
    INPUT—strings sent from the client to the server are converted, but
   strings sent from the server to the client are not.
    OUTPUT—strings sent by the server to the client are converted, but
   strings sent from the client to the server are not.
    BOTH—strings are converted in both directions.

Locale (HP Vertica; default: en_US@collation=binary, English as in the
United States of America)
   The locale used for the session. Specify the locale as an ICU Locale.
   See the ICU User Guide (http://userguide.icu-project.org/locale) for a
   complete list of parameters that can be used to specify a locale.

ReadOnly (HP Vertica; default: false)
   A true/false value that controls whether the connection can only read
   data from HP Vertica.

ResultBufferSize (HP Vertica; default: 131072 (128KB))
   Size of the memory buffer used for large result sets in streaming mode.
   Note: This parameter was previously called MaxMemoryCache.

TransactionIsolation (HP Vertica; default: Server Default)
   Sets the transaction isolation for the connection. Valid values are:
    Read Committed
    Serializable
    Server Default
   See Changing Transaction Isolation Levels in the Administrator's Guide
   for an explanation of transaction isolation.
Identification
Description (Standard; default: none)
   An optional description for the DSN entry. Insert an empty string to
   leave the description empty.

Label / SessionLabel (HP Vertica; default: none)
   Sets a label for the connection on the server. This value appears in the
   session_id column of the V_MONITOR.SESSIONS system table.
   Note: Label and SessionLabel are synonyms. They can be used
   interchangeably.
Encryption
SSLMode (HP Vertica; default: prefer)
   Controls whether the connection to the database uses SSL encryption.
   Valid values are:
    require—Requires the server to use SSL. If the server cannot provide
   an encrypted channel, the connection fails.
    prefer—Prefers that the server use SSL. If the server does not offer
   an encrypted channel, the client requests one. Note that the first
   connection attempt to the database tries to use SSL. If that fails, a
   second connection is attempted over a clear channel.
    allow—Makes a connection to the server whether the server uses SSL or
   not. Note that the first connection attempt to the database is made over
   a clear channel. If that fails, a second connection is attempted over
   SSL.
    disable—Never connects to the server using SSL. This setting is
   typically used for troubleshooting.

SSLKeyFile (HP Vertica; default: none)
   The absolute path to the client's private key file. This file can reside
   anywhere on the system.

SSLCertFile (HP Vertica; default: none)
   The absolute path of the client's public certificate file. This file can
   reside anywhere on the system.
Third-Party Compatibility
ColumnsAsChar (HP Vertica; default: false)
   How character column types are reported when the driver is in Unicode
   mode. When set to false, the ODBC driver reports the data type of
   character columns as WCHAR. If you set ColumnsAsChar to true, the driver
   identifies character columns as CHAR. This setting is normally used for
   compatibility with some third-party clients such as Informatica.

ThreePartNaming (HP Vertica; default: false on UNIX clients, true on
Windows clients)
   A Boolean that controls how the database metadata APIs report the
   catalog name. When set to true, the database name is returned as the
   catalog name in the database metadata. When set to false, NULL is
   returned as the catalog name. Enable this option if your client software
   expects to be able to get the catalog name from the database metadata
   and use it as part of a three-part name reference.
See Also
 Additional ODBC Driver Configuration Settings (page 49)
Setting DSN Parameters
The parameters in the following tables are common for all user and system DSN entries. The
examples provided are for Windows clients.
To edit DSN parameters:
 On UNIX and Linux client platforms, you can edit the odbc.ini file. (See Creating an ODBC
DSN for Linux, Solaris, AIX, and HP-UX (page 29).) The location of this file is specific to the
driver manager.
 On Windows client platforms, you can edit some DSN parameters using the HP Vertica ODBC
client driver interface. See Creating an ODBC DSN for Windows Clients (page 33).
You can also edit the DSN parameters directly by opening the DSN entry in the Windows
registry (for example, at HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\DSNname).
Directly editing the registry can be risky, so you should only use this method for parameters
that cannot be set through the ODBC driver's user interface or via your client code.
 You can set parameters in the connection string when opening a connection using the
SQLDriverConnect() function:
sqlRet = SQLDriverConnect(sql_hDBC, 0, (SQLCHAR*)"DSN=DSNName;Locale=en_GB",
    SQL_NTS, szDNS, 1024, &nSize, SQL_DRIVER_NOPROMPT);
Note: In the connection string, ';' is a reserved symbol. If you need to set multiple parameters as
part of the ConnSettings parameter, use '%3B' in place of ';'. Also use '+' instead of spaces.
For example:
sqlRet = SQLDriverConnect(sql_hDBC, 0, (SQLCHAR*)"DSN=HP VerticaSQL;ConnSettings=
set+search_path+to+a,b,c%3Bset+locale=ch;SSLMode=prefer", SQL_NTS,
szDNS, 1024,&nSize, SQL_DRIVER_NOPROMPT);
 Your client code can retrieve DSN parameter values after a connection has been made to HP
Vertica using the SQLGetConnectAttr() and SQLGetStmtAttr() API calls. Some
parameters can also be set using SQLSetConnectAttr() and SQLSetStmtAttr().
For the list of HP Vertica-specific parameters, see HP Vertica-specific ODBC
Header File (page 58).
Additional ODBC Driver Configuration Settings
On Linux and UNIX platforms, in addition to the DSN settings, you need to provide some additional
settings to configure the HP Vertica ODBC client driver. These settings control the following:
 The text encoding used by the driver manager (for example, UTF-8 or UTF-16).
 The location of the directory containing the HP Vertica ODBC driver's error message files.
 Whether and how the ODBC driver logs messages.
Note: The additional driver configuration settings are available on Windows platforms, but are
not necessary. You must supply the additional configuration settings on Linux/UNIX systems
before the ODBC drivers can function properly.
The topics in this section describe these settings in greater detail.
Location of the Additional Driver Settings
Where the additional driver settings are stored depends on your client platform:
 On Linux and UNIX platforms, the settings are contained in a text file named vertica.ini
(although you can choose a different filename). You tell the HP Vertica ODBC driver where to
find this file using an environment variable named VERTICAINI.
 On Windows platforms, the additional settings are set using the ODBC Data Source
Configuration window. The values for the settings are stored in the Windows registry under the
path HKEY_LOCAL_MACHINE\SOFTWARE\Vertica\Driver.
Creating a vertica.ini File
There is no standard location for the vertica.ini file—you can store it anywhere convenient on
your client system. One possible location is the /etc directory, if multiple users on your client
system need to access it. Alternatively, you can place a vertica.ini file in each user's home
directory so that users can alter their own settings. Wherever you store it, be sure users have
read access to the file.
The format of the vertica.ini file is similar to the odbc.ini file, with a section followed by
parameter definitions. Unlike the odbc.ini file, vertica.ini contains a single section named
Driver:
[Driver]
Following the section definition, you add setting definitions, one per line. A setting definition
consists of the setting name, followed by an equal sign (=), followed by the value. The value does
not need quotes. For example, to set the ODBCInstLib setting, you add a line like this:
ODBCInstLib=/usr/lib64/libodbcinst.so
See Additional Parameter Settings (page 51) for a list of the additional settings.
Required Settings
You must configure two settings in order for the ODBC driver to work correctly:
 ErrorMessagesPath
 ODBCInstLib (unless the driver manager's installation library is in a directory listed in the
LD_LIBRARY_PATH or LIB_PATH environment variables).
Also, if your driver manager does not use UTF-8 encoding, you need to set
DriverManagerEncoding to the proper encoding.
Setting the VERTICAINI Environment Variable
You must set an environment variable named VERTICAINI to the absolute path of the
vertica.ini file. The HP Vertica ODBC driver uses this variable to find the settings.
Where you set this variable depends on whether users on your client system need to have
separate vertica.ini files. If you want to have a single, system-wide vertica.ini file, you
can add a command to set VERTICAINI in /etc/profile or some other system-wide
environment file. For example:
export VERTICAINI=/etc/vertica.ini
If users need individual vertica.ini files, set VERTICAINI in their ~/.profile or similar
configuration file. For example:
export VERTICAINI=~/.vertica.ini
On Macintosh OS X client systems, you can set the VERTICAINI environment variable in each
user's ~/.MacOSX/environment.plist file. See the Environment Variables entry
(http://developer.apple.com/library/mac/#documentation/MacOSX/Conceptual/BPRuntimeConfig/Articles/EnvironmentVars.html)
in Apple's Developer Library for more information.
Example vertica.ini File
The following example vertica.ini file configures the ODBC driver to:
 Use the 64-bit UnixODBC driver manager.
 Get its error messages from the standard HP Vertica 64-bit ODBC driver installation directory.
 Log all warnings and more severe messages to log files stored in the temporary directory.
[Driver]
DriverManagerEncoding=UTF-16
ODBCInstLib=/usr/lib64/libodbcinst.so
ErrorMessagesPath=/opt/vertica/lib64
LogLevel=4
LogPath=/tmp
Additional Parameter Settings
The following parameters can be set for the HP Vertica client drivers.
Logging Settings
These parameters control how messages between the client and server are logged. None of these
settings are required. If they are not set, then the client library does not log any messages. They
apply to both ADO.NET and ODBC.
 LogLevel—The severity of messages that are logged between the client and the server. The
valid values are:
 0—No logging
 1—Fatal errors
 2—Errors
 3—Warnings
 4—Info
 5—Debug
 6—Trace (all messages)
The value you specify for this setting sets the minimum severity for a message to be logged.
For example, setting LogLevel to 3 means that the client driver logs all warnings, errors, and
fatal errors.
 LogPath—The absolute path of a directory to store log files. For example:
/var/log/verticaodbc
 LogNamespace—Limits logging to messages generated by certain objects in the client driver.
Note: These settings are also available for the HP Vertica JDBC driver through connection
properties. See Connection Properties (page 117) for details.
ODBC-specific Settings
The following settings are used only by the HP Vertica ODBC client driver.
 DriverManagerEncoding—The UTF encoding standard that the driver manager uses. This
setting needs to match the encoding the driver manager expects. The available values for this
setting are:
 UTF-8
 UTF-16 (usually used by unixODBC)
 UTF-32 (usually used by iODBC)
See the documentation for your driver manager to find the correct value for this setting.
Note: While both UTF-16 and UTF-8 are valid settings for DataDirect, Vertica recommends
that you set the DataDirect driver manager encoding to UTF-16.
If you do not set this parameter, the ODBC driver defaults to the value shown in the following
table. If your driver manager uses a different encoding, you must set this value for the ODBC
driver to be able to work.
Client Platform          Default Encoding
AIX 32-bit               UTF-16
AIX 64-bit               UTF-32
HP-UX 32-bit             UTF-32
HP-UX 64-bit             UTF-32
Linux x86 32-bit         UTF-32
Linux x86 64-bit         UTF-32
Linux Itanium 64-bit     UTF-32
OS X                     UTF-32
Solaris x86 32-bit       UTF-32
Solaris x86 64-bit       UTF-32
Solaris SPARC 32-bit     UTF-32
Solaris SPARC 64-bit     UTF-32
Windows 32-bit           UTF-16
Windows 64-bit           UTF-16
 ErrorMessagesPath—The absolute path to the parent directory that contains the HP Vertica
client driver's localized error message files. These files are usually stored in the same directory
as the HP Vertica ODBC driver files.
Note: This setting is required. If you do not set it, then any error the ODBC driver encounters
will result in an error message about a missing ODBCMessages.xml file.
 ODBCInstLib—The absolute path to the file containing the ODBC installer library (ODBCInst).
This setting is required if the directory containing this library is not set in the
LD_LIBRARY_PATH or LIB_PATH environment variables. The library files for the major driver
managers are:
 UnixODBC: libodbcinst.so
 iODBC: libiodbcinst.so (libiodbcinst.2.dylib on OS X)
 DataDirect: libodbcinst.so
Note: On AIX platforms, you need to give the path to the library archive, followed by the name of
the library enclosed in parentheses. For example:
ODBCInstLib=/usr/lib64/libodbcinst.a(libodbcinst.so.1)
ADO.NET-specific Settings
This setting applies only to the ADO.NET client driver:
C#PreloadLogging—Tells the HP Vertica ADO.NET driver to begin logging as soon as possible,
before the driver has fully loaded itself. Normally, logging only starts after the driver has fully
loaded. Valid values for this setting are:
 0—Do not start logging before the driver has loaded.
 1—Start logging as soon as possible.
Programming ODBC Client Applications
HP Vertica provides an Open Database Connectivity (ODBC) driver that allows applications to
connect to the HP Vertica database. This driver can be used by custom-written client applications
that use the ODBC API to interact with HP Vertica. ODBC is also used by many third-party
applications to connect to HP Vertica, including business intelligence applications and extract,
transform, and load (ETL) applications.
This section details the process for configuring the HP Vertica ODBC driver. It also explains how to
use the ODBC API to connect to HP Vertica in your own client applications.
This section assumes that you have already installed the ODBC libraries on your client system. If
you have not, see Client Driver Install Procedures (page 19).
ODBC Architecture
The ODBC architecture has four layers:
Client Application
An application that opens a data source through a Data Source Name (DSN). It then sends
requests to the data source, and receives the results of those requests. Requests are made in
the form of calls to ODBC functions.
Driver Manager
A library on the client system that acts as an intermediary between a client application and
one or more drivers. The driver manager:
 Resolves the DSN provided by the client application.
 Loads the driver required to access the specific database defined within the DSN.
 Processes ODBC function calls from the client, or passes them to the driver.
 Retrieves results from the driver.
 Unloads drivers when they are no longer needed.
On Windows and Mac client systems, the driver manager is provided by the operating system.
On Linux and UNIX systems, you usually need to install a driver manager. See ODBC
Prerequisites (page 15) for a list of driver managers that can be used with HP Vertica on your
client platform.
Driver
A library on the client system that provides access to a specific database. It translates requests
into the format expected by the database, and translates results back into the format required
by the client application.
Database
The database processes requests initiated at the client application and returns results.
ODBC Feature Support
The ODBC driver for HP Vertica supports most of the features defined in the Microsoft ODBC
3.5 specification. The following features are not supported:
 Updatable result sets
 Backwards scrolling cursors
 Cursor attributes
 More than one open statement per connection. For example, you cannot execute a new
statement while another statement has a result set open. If you need to execute multiple
statements at once, open multiple database connections.
 Keysets
 Bookmarks
The HP Vertica ODBC driver accurately reports its capabilities. If you need to determine whether
the driver supports a specific feature, query its capabilities directly using the
SQLGetInfo() function.
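For example, the following sketch (assuming hdlDbc is an open connection handle and ret is a
SQLRETURN variable) retrieves the driver's file name and checks one cursor capability:
// A minimal sketch, assuming hdlDbc is an open connection handle.
SQLCHAR driverName[256];
SQLSMALLINT nameLen;
SQLUINTEGER scrollOptions = 0;
ret = SQLGetInfo(hdlDbc, SQL_DRIVER_NAME, driverName,
    sizeof(driverName), &nameLen);
if (SQL_SUCCEEDED(ret)) {
    printf("Driver library: %s\n", driverName);
}
// Check which cursor scrolling options the driver reports.
ret = SQLGetInfo(hdlDbc, SQL_SCROLL_OPTIONS, (SQLPOINTER)&scrollOptions,
    sizeof(scrollOptions), NULL);
if (SQL_SUCCEEDED(ret) && (scrollOptions & SQL_SO_FORWARD_ONLY)) {
    printf("Forward-only cursors are supported.\n");
}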
Updating ODBC Client Code from Previous Driver Versions
In HP Vertica Version 5.1, the client drivers were rewritten to improve standards compliance and
reliability. As a result, some HP Vertica-specific features and past incompatibilities have been
eliminated. You must update any client code written for versions of the HP Vertica ODBC driver
earlier than 5.1 to work with the newer drivers. The following topics give you an overview of the
necessary code changes.
DSN Parameter Changes
A number of HP Vertica-specific DSN parameters have been eliminated or changed.
Removed DSN Parameters
The following parameters are no longer available in the 5.1 ODBC driver.
BatchInsertEnforceLength
   Batch inserts now behave the same way all other inserts behave. If a
   piece of data is too wide for its column, the row will be rejected. To
   avoid having rows rejected, either truncate the data yourself or use the
   COPY statement directly, which defaults to truncating data.

BinaryDataTransfer
   All data is now transferred using NATIVE VARCHAR.

BoolAsChar
   Boolean columns can no longer be bound to SQL_CHAR values. Use SQL_BIT
   values instead.

Debug
   Use the LogLevel parameter instead.

LRSPath and LRSStreaming
   The ODBC driver now always streams data. It does not cache data on the
   local disk.

SupressWarnings
   Removed to prevent clients from ignoring warnings.

WideCharSizeIn and WideCharSizeOut
   These options are now set using the DriverManagerEncoding option in the
   vertica.ini file.
Changed DSN Parameters
The following DSN parameters have changed since the previous version of the ODBC driver.
MaxMemoryCache
   This parameter is now named ResultBufferSize to make its name consistent
   across all HP Vertica client drivers.
New DSN Parameter
The following DSN parameters are new in the version 5.1 ODBC driver.
DriverStringConversions
   Determines whether strings sent between the server and the ODBC client
   are converted.

ThreePartNaming
   Controls whether the database metadata uses the database name as the
   catalog name. This parameter is only used for backwards compatibility
   with some client software that expects a non-null catalog name.
New DSN Parameter Aliases
Password
   An alias for PWD.

SessionLabel
   A new alias for the existing Label parameter that lets you assign a
   label to an ODBC connection.

UserName
   An alias for UID.
Function Changes
The following functions have changed their behavior in the version 5.1 drivers.
 SQLGetInfo() now returns the file name of the ODBC library file when queried for
SQL_DRIVER_NAME. Previous versions returned the brand name of the driver, which is
not part of the ODBC specification.
 In previous versions of the ODBC driver, passing the SQLGetInfo function
SQL_PARAM_ARRAY_SELECTS returned SQL_PAS_BATCH, indicating that the driver
returns a batch for each set of parameters in a SELECT statement. This return value was
incorrect. The driver now correctly returns SQL_PAS_NO_SELECT, which indicates that the
driver does not support SELECT queries using arrays of parameters.
 For better compatibility with the ODBC standards, the time data type in the ODBC driver no
longer contains fractions of a second. In the new driver, functions that convert time values to
strings (for example, a SQLBindCol() call to bind a SQL_TYPE_TIME to a SQL_C_CHAR
value) no longer add fractional second values to the string. Earlier versions of the driver would
return the fractions of a second that the HP Vertica database stores.
 SQLBindParameter() now requires that its column-size argument for variable-width
columns (such as SQL_CHAR, SQL_VARCHAR, and SQL_VARBINARY) be non-zero, as
specified in the ODBC standards. The implementation of this function in previous versions of
the driver allowed a non-standard zero value to indicate that the column width was a default
width value.
 The driver is now more strict about converting from character data types to numerical data
types. In previous drivers, any portion of the character string that could not be converted to a
numeric value was ignored. The new driver returns an error if any portion of the string cannot
be converted. For example, asking the previous driver to insert the character string "3 days"
into an integer column resulted in the value 3 being stored in the column. Now, the driver
returns an error when attempting to store this character value in an integer column.
 Functions that return result sets containing catalog metadata now return SQL_WVARCHAR
columns instead of SQL_VARCHAR columns. This is standard behavior for ODBC 3.52
drivers that are Unicode capable.
Removed Functions
The LCOPY function has been removed from the version 5.1 driver. You should instead use the
LOCAL option of the COPY SQL statement to copy data from the client system to the server. See
Streaming Data from the Client Using COPY LOCAL (page 103).
Interval and TimeStamp Changes
When you call the SQLBindParameter() function to bind a SQL_C_INTERVAL value (for
example, SQL_INTERVAL_STRUCT) to an INTERVAL column after calling SQLPrepare(),
the interval leading precision is now reset to the default value of 2. Earlier versions of the driver did
not reset the leading precision as called for in the ODBC standards. If you want your interval value
to have a greater leading precision, your client application can take either of the following steps:
 Call SQLBindParameter() to bind the value to the INTERVAL column before calling
SQLPrepare(). The interval precision is not reset unless you call SQLBindParameter()
after you have called SQLPrepare().
 Reset the leading precision by using the SQLSetDescField() function to set
SQL_DESC_DATETIME_INTERVAL_PRECISION to whatever value you want.
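A minimal sketch of the second approach, assuming hstmt is an allocated statement handle and
using an illustrative leading precision of 4:
// Get the implementation parameter descriptor for the statement, then set
// the leading precision on parameter record 1. (hstmt and the value 4 are
// illustrative.)
SQLHDESC hIpd = SQL_NULL_HDESC;
ret = SQLGetStmtAttr(hstmt, SQL_ATTR_IMP_PARAM_DESC, &hIpd, 0, NULL);
if (SQL_SUCCEEDED(ret)) {
    ret = SQLSetDescField(hIpd, 1, SQL_DESC_DATETIME_INTERVAL_PRECISION,
        (SQLPOINTER)4, 0);
}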
SQLDescribeCol() now returns a new width for interval data types. For example, when passed
SQL_INTERVAL_SECOND, the previous driver would return 21 bytes. The new driver returns 16.
In general, the new drivers are more strict regarding interval values, and throw errors when the old
drivers silently truncated values. If your application throws an exception when dealing with an
interval, your first debugging step is to make sure the values you are inserting are the correct width
and type.
Note: The units interval style is not supported. Do not use the SET INTERVALSTYLE
statement to change the interval style in your client applications.
SQLBindParameter() is more strict about the number of digits you supply in fractions of a
second in a timestamp. For example, the following call generates an error:
SQL_TIMESTAMP_STRUCT ts;
ts.fraction = 123456; //represents the fraction 0.000123456
SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_TIMESTAMP, SQL_TYPE_TIMESTAMP,
0, 6, (SQLPOINTER)&ts, 0, NULL);
SQLExecute(hstmt);
The error occurs because the fractional value represents more than six digits. Instead you would
need to change the SQLBindParameter call to allow 9 digits in the fraction:
SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT, SQL_C_TIMESTAMP, SQL_TYPE_TIMESTAMP,
0, 9, (SQLPOINTER)&ts, 0, NULL);
Note that the fraction will be truncated to 0.000123, since timestamps do not have nanosecond
precision. But you still need to allow for all of the fractional digits to be inserted into the timestamp.
New Additional Driver Information
The HP Vertica version 5.1 ODBC driver has some additional configuration settings not
covered by the standard ODBC.INI file. It uses a configuration file named vertica.ini on Linux,
AIX, Solaris, and HP-UX, which controls several features of the ODBC driver. For more
information, see Additional ODBC Driver Configuration Settings (page 49).
HP Vertica-specific ODBC Header File
The HP Vertica ODBC driver provides a C header file named verticaodbc.h that defines
several useful constants that you can use in your applications. These constants let you access
and alter HP Vertica-specific settings.
This file's location depends on your client operating system:
 /opt/vertica/include on Linux and UNIX systems.
 C:\Program Files (x86)\HP Systems\ODBC\include on Windows systems.
The constants defined in this file are listed below.
SQL_ATTR_VERTICA_RESULT_BUFFER_SIZE
   Sets the size of the buffer used when retrieving results from the server.
   Associated functions: SQLSetConnectAttr(), SQLGetConnectAttr()

SQL_ATTR_VERTICA_DIRECT_BATCH_INSERT
   Determines whether a batch is inserted directly into the ROS (1) or
   using AUTO mode (0). By default, batches are inserted using AUTO.
   Associated functions: SQLSetConnectAttr(), SQLGetConnectAttr(),
   SQLSetStmtAttr(), SQLGetStmtAttr()

SQL_ATTR_VERTICA_LOCALE
   Changes the locale from en_US@collation=binary to the ICU locale
   specified. See Setting the Locale for ODBC Sessions (page 64) for an
   example of using this parameter.
   Associated functions: SQLSetConnectAttr(), SQLGetConnectAttr()
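As a short sketch of using these constants (assuming hdlDbc is an open connection handle,
verticaodbc.h is included, and the 262144-byte size is an arbitrary example value):
// Set the result buffer to 256KB, then read the value back.
// hdlDbc and the 262144-byte size are illustrative.
SQLINTEGER bufSize = 0;
ret = SQLSetConnectAttr(hdlDbc, SQL_ATTR_VERTICA_RESULT_BUFFER_SIZE,
    (SQLPOINTER)262144, 0);
ret = SQLGetConnectAttr(hdlDbc, SQL_ATTR_VERTICA_RESULT_BUFFER_SIZE,
    &bufSize, sizeof(bufSize), NULL);
printf("Result buffer size: %d\n", (int)bufSize);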
Connecting to HP Vertica
The first step in any ODBC application is to connect to the database. When you create the
connection to a data source using ODBC, you use the name of the DSN that contains the details of
the driver to use, the database host, and other basic information about connecting to the data
source.
There are four steps your application needs to take to connect to a database:
1 Call SQLAllocHandle() to allocate a handle for the ODBC environment. This handle is used
to create connection objects and to set application-wide settings.
2 Use the environment handle to set the version of ODBC that your application wants to use.
This ensures that the data source knows which API your application will use to interact with it.
3 Allocate a database connection handle by calling SQLAllocHandle(). This handle
represents a connection to a specific data source.
4 Use the SQLConnect() or SQLDriverConnect() functions to open the connection to the
database.
Note: If you specify a locale either in the connection string or in the DSN, the call to the
connection function returns SQL_SUCCESS_WITH_INFO on a successful connection, with
messages about the state of the locale.
When creating the connection to the database, use SQLConnect() when the only options you
need to set at connection time are the username and password. Use SQLDriverConnect() when
you want to change connection options, such as the locale.
The following example demonstrates connecting to a database using a DSN named ExampleDB.
After it creates the connection successfully, this example simply closes it.
// Demonstrate connecting to Vertica using ODBC.
// Standard i/o library
#include <stdio.h>
#include <stdlib.h>
// Only needed for Windows clients
// #include <windows.h>
// SQL include files that define data types and ODBC API
// functions
#include <sql.h>
#include <sqlext.h>
#include <sqltypes.h>

int main()
{
    SQLRETURN ret;   // Stores return value from ODBC API calls
    SQLHENV hdlEnv;  // Handle for the SQL environment object
    // Allocate a SQL environment object
    ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not allocate a handle.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Allocated an environment handle.\n");
    }
    // Set the ODBC version we are going to use to 3.
    ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION,
        (SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not set application version to ODBC 3.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Set application version to ODBC 3.\n");
    }
    // Allocate a database handle.
    SQLHDBC hdlDbc;
    ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not allocate database handle.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Allocated Database handle.\n");
    }
    // Connect to the database using SQLConnect
    printf("Connecting to database.\n");
    const char *dsnName = "ExampleDB";
    const char* userID = "ExampleUser";
    const char* passwd = "password123";
    ret = SQLConnect(hdlDbc, (SQLCHAR*)dsnName,
        SQL_NTS, (SQLCHAR*)userID, SQL_NTS,
        (SQLCHAR*)passwd, SQL_NTS);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not connect to database.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Connected to database.\n");
    }

    // We're connected. You can do real work here.

    // When done, free all of the handles to close them
    // in an orderly fashion.
    printf("Disconnecting and freeing handles.\n");
    ret = SQLDisconnect( hdlDbc );
    if(!SQL_SUCCEEDED(ret)) {
        printf("Error disconnecting from database. Transaction still open?\n");
        exit(EXIT_FAILURE);
    }
    SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
    SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
    exit(EXIT_SUCCESS);
}
Running the above code prints the following:
Allocated an environment handle.
Set application version to ODBC 3.
Allocated Database handle.
Connecting to database.
Connected to database.
Disconnecting and freeing handles.
See Setting the Locale for ODBC Sessions (page 64) for an example of using
SQLDriverConnect to connect to the database.
Notes
 If you use the DataDirect® driver manager, you should always use the
SQL_DRIVER_NOPROMPT value for the SQLDriverConnect function's DriverCompletion
parameter (the final parameter in the function call) when connecting to HP Vertica. HP
Vertica's ODBC driver on Linux and UNIX platforms does not contain a UI, and therefore
cannot prompt users for a password.
 On Windows client platforms, if your connection string doesn't include a DSN and also lacks
either or both the database or server parameters, the HP Vertica ODBC driver opens a dialog
box to prompt the user for the missing information. See Prompting Windows Users for
Missing Connection Parameters (page 63) for more information.
 If your database is not in compliance with the terms of your license agreement (for example, it
is larger than the data allowance in your license), HP Vertica sends a message about the
non-compliance to your client application in the return value of the SQLConnect() function.
Your application should always test this return value to see if it is
SQL_SUCCESS_WITH_INFO. If it is, your application should extract and display the message
to the user.
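A minimal sketch of extracting such messages after SQLConnect() (ret and hdlDbc are the
return value and connection handle from the connection attempt):
// Print any informational messages attached to the connection handle.
if (ret == SQL_SUCCESS_WITH_INFO) {
    SQLCHAR state[6], msg[SQL_MAX_MESSAGE_LENGTH];
    SQLINTEGER nativeErr;
    SQLSMALLINT msgLen, recNum = 1;
    while (SQL_SUCCEEDED(SQLGetDiagRec(SQL_HANDLE_DBC, hdlDbc, recNum++,
            state, &nativeErr, msg, sizeof(msg), &msgLen))) {
        printf("[%s] %s\n", state, msg);
    }
}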
Prompting Windows Users for Missing Connection Parameters
The HP Vertica Windows ODBC driver can prompt the user for connection information if required
information is missing. The driver displays the HP Vertica Connection Dialog if the client
application calls SQLDriverConnect to connect to HP Vertica and either of the following is true:
 The DriverCompletion parameter is set to SQL_DRIVER_PROMPT.
 The DriverCompletion parameter is set to SQL_DRIVER_COMPLETE or
SQL_DRIVER_COMPLETE_REQUIRED, and the connection string or DSN being used to
connect is missing the server, database, or port information.
If either of the above conditions is true, the driver displays the HP Vertica Connection Dialog to
prompt the user for connection information. Any parameters supplied in the connection string or
DSN are filled in on the dialog.
Note: Your connection string needs to specify at least Vertica as the driver; otherwise, Windows
will not know to use the Vertica ODBC driver to open the connection.
The required fields on the connection dialog are Database, UID, Server, and Port. Once these are
filled in, the form enables the OK button.
If the user clicks Cancel on the dialog, the SQLDriverConnect function call returns
SQL_NO_DATA immediately, without attempting to connect to HP Vertica. If the user supplies
incomplete or incorrect information for the connection, the connection function returns
SQL_ERROR after the connection attempt fails.
Note: If the DriverCompletion parameter of the SQLDriverConnect function call is
SQL_DRIVER_NOPROMPT, the ODBC driver immediately returns a SQL_ERROR indicating
that it cannot connect because not enough information has been supplied and the driver is not
allowed to prompt the user for the missing information.
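For example, a sketch of a SQLDriverConnect() call that allows prompting on Windows (hWnd is
an assumed application window handle, and the connection string supplies only the driver name):
// Windows only: pass the application's window handle so the driver can
// display the connection dialog. SQL_DRIVER_COMPLETE prompts only if
// required information is missing. hWnd is an assumed window handle.
SQLCHAR outStr[1024];
SQLSMALLINT outLen;
ret = SQLDriverConnect(hdlDbc, hWnd, (SQLCHAR*)"DRIVER=Vertica;", SQL_NTS,
    outStr, sizeof(outStr), &outLen, SQL_DRIVER_COMPLETE);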
Setting the Locale for ODBC Sessions
HP Vertica provides three ways to set the locale for an ODBC session:
 Specify the locale for all connections made using the DSN:
 On Linux and other UNIX-like platforms, see Creating an ODBC DSN for Linux, Solaris,
AIX, and HP-UX (page 29).
 On Windows platforms, you need to directly add the Locale DSN parameter to the ODBC
definition in the Windows registry. See Editing DSN Entries (page 40).
 Set the Locale connection parameter in the connection string in the SQLDriverConnect()
function. For example:
SQLDriverConnect(conn, NULL, (SQLCHAR*)"DSN=Vertica;Locale=en_GB",
    SQL_NTS, szConnOut, sizeof(szConnOut), &iAvailable,
    SQL_DRIVER_NOPROMPT)
 Use the SQLSetConnectAttr() function with the SQL_ATTR_VERTICA_LOCALE constant
and specify the ICU string as the attribute value. See the example below.
Notes
 Using a non-Unicode locale on the client system (such as setting LANG=C on Linux
platforms) together with a Unicode locale for the connection to HP Vertica can result in errors
such as "(10170) String data right truncation on data from data source" if data received from
HP Vertica isn't in UTF-8 format. The driver allocates string memory based on the system's
locale setting, and non-UTF-8 data can trigger an overrun. You can avoid these errors by
always using a Unicode locale on the client system.
 If you specify a locale either in the connection string or in the DSN, the call to the connection
function returns SQL_SUCCESS_WITH_INFO on a successful connection, with messages
about the state of the locale.
 ODBC applications can be in either ANSI or Unicode mode:
 If Unicode, the encoding used by ODBC is UCS-2.
 If ANSI, the data must be in single-byte ASCII, which is compatible with UTF-8 on the
database server.
 The ODBC driver converts UCS-2 to UTF-8 when passing data to the HP Vertica server, and
converts data sent by the HP Vertica server from UTF-8 to UCS-2.
If the end-user application is not already in UCS-2, the application is responsible for converting
the input data to UCS-2, or unexpected results could occur. For example:
 Non-UCS-2 data passed to the ODBC APIs, when interpreted as UCS-2, could result in an
invalid UCS-2 symbol being passed to the APIs, resulting in errors.
 Alternatively, the symbol provided in the alternate encoding could be a valid UCS-2 symbol;
in this case, incorrect data is inserted into the database.
ODBC applications should set the correct server session locale using SQLSetConnectAttr
(if different from the database-wide setting) in order to get the proper collation and
string-function behavior on the server.
The following example code demonstrates setting the locale using both the connection string and
through the SQLSetConnectAttr() function.
// Standard i/o library
#include <stdio.h>
#include <stdlib.h>
// Only needed for Windows clients
// #include <windows.h>
// SQL include files that define data types and ODBC API
// functions
#include <sql.h>
#include <sqlext.h>
#include <sqltypes.h>
// Vertica-specific definitions. This include file is located in
// /opt/vertica/include on database hosts.
#include <verticaodbc.h>

int main()
{
    SQLRETURN ret;   // Stores return value from ODBC API calls
    SQLHENV hdlEnv;  // Handle for the SQL environment object
    // Allocate a SQL environment object
    ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not allocate a handle.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Allocated an environment handle.\n");
    }
    // Set the ODBC version we are going to use to 3.
    ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION,
        (SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not set application version to ODBC 3.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Set application version to ODBC 3.\n");
    }
    // Allocate a database handle.
    SQLHDBC hdlDbc;
    ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not allocate database handle.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Allocated Database handle.\n");
    }
    // Connect to the database using SQLDriverConnect
    printf("Connecting to database.\n");
    // Set the locale to English in Great Britain.
    const char *connStr = "DSN=ExampleDB;locale=en_GB;"
        "UID=dbadmin;PWD=password123";
    ret = SQLDriverConnect(hdlDbc, NULL, (SQLCHAR*)connStr, SQL_NTS,
        NULL, 0, NULL, SQL_DRIVER_NOPROMPT );
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not connect to database.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Connected to database.\n");
    }
    // Get the locale
    char locale[256];
    SQLGetConnectAttr(hdlDbc, SQL_ATTR_VERTICA_LOCALE, locale,
        sizeof(locale), 0);
    printf("Locale is set to: %s\n", locale);
    // Set the locale to a new value
    const char* newLocale = "en_GB";
    SQLSetConnectAttr(hdlDbc, SQL_ATTR_VERTICA_LOCALE, (SQLCHAR*)newLocale,
        SQL_NTS);
    // Get the locale again
    SQLGetConnectAttr(hdlDbc, SQL_ATTR_VERTICA_LOCALE, locale,
        sizeof(locale), 0);
    printf("Locale is now set to: %s\n", locale);
    // When done, free all of the handles to close them
    // in an orderly fashion.
    printf("Disconnecting and freeing handles.\n");
    ret = SQLDisconnect( hdlDbc );
    if(!SQL_SUCCEEDED(ret)) {
        printf("Error disconnecting from database. Transaction still open?\n");
        exit(EXIT_FAILURE);
    }
    SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
    SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
    exit(EXIT_SUCCESS);
}
AUTOCOMMIT and ODBC Transactions
The AUTOCOMMIT connection attribute controls whether INSERT, ALTER, COPY and other
data-manipulation statements are automatically committed after they complete. By default,
AUTOCOMMIT is enabled—all statements are committed after they execute. This is often not the
best setting to use, since committing each statement individually adds overhead. Also, you often
want to control whether a set of statements is committed as a whole, rather than have each
individual statement committed. For example, you may only want to commit a series of inserts if
all of the inserts succeed. With AUTOCOMMIT disabled, you can roll back the transaction if one
of the statements fails.
If AUTOCOMMIT is on, the results of statements are committed immediately after they are
executed. You cannot roll back a statement executed in AUTOCOMMIT mode.
For example, when AUTOCOMMIT is on, the following single INSERT statement is automatically
committed:
ret = SQLExecDirect(hdlStmt, (SQLCHAR*)"INSERT INTO customers VALUES(500,"
"'Smith, Sam', '123-456-789');", SQL_NTS);
If AUTOCOMMIT is off, you need to manually commit the transaction after executing a statement.
For example:
ret = SQLExecDirect(hdlStmt, (SQLCHAR*)"INSERT INTO customers VALUES(500,"
"'Smith, Sam', '123-456-789');", SQL_NTS);
// Other inserts and data manipulations
// Commit the statement(s)
ret = SQLEndTran(SQL_HANDLE_DBC, hdlDbc, SQL_COMMIT);
The inserted row is only committed when you call SQLEndTran(). You can roll back the
INSERT and other statements at any point before committing the transaction.
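For example, to discard the open transaction instead of committing it:
// Roll back all statements executed since the last commit.
ret = SQLEndTran(SQL_HANDLE_DBC, hdlDbc, SQL_ROLLBACK);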
Note: Prepared statements cache the AUTOCOMMIT setting when you create them using
SQLPrepare(). Later changing the connection's AUTOCOMMIT setting has no effect on the
AUTOCOMMIT settings of previously created prepared statements. See Using Prepared
Statements (page 81) for details.
The following example demonstrates turning off AUTOCOMMIT, executing an insert, then
manually committing the transaction.
// Some standard headers
#include <stdio.h>
#include <stdlib.h>
// Only needed for Windows clients
// #include <windows.h>
// Standard ODBC headers
#include <sql.h>
#include <sqltypes.h>
#include <sqlext.h>

int main()
{
    // Set up the ODBC environment
    SQLRETURN ret;
    SQLHENV hdlEnv;
    ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not allocate a handle.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Allocated an environment handle.\n");
    }
    // Tell ODBC that the application uses ODBC 3.
    ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION,
        (SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not set application version to ODBC3.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Set application to ODBC 3.\n");
    }
    // Allocate a database handle.
    SQLHDBC hdlDbc;
    ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not allocate database handle.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Allocated Database handle.\n");
    }
    // Connect to the database
    printf("Connecting to database.\n");
    const char *dsnName = "ExampleDB";
    const char* userID = "dbadmin";
    const char* passwd = "password123";
    ret = SQLConnect(hdlDbc, (SQLCHAR*)dsnName,
        SQL_NTS, (SQLCHAR*)userID, SQL_NTS,
        (SQLCHAR*)passwd, SQL_NTS);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not connect to database.\n");
        exit(EXIT_FAILURE);
    } else {
        printf("Connected to database.\n");
    }
    // Get the AUTOCOMMIT state
    SQLINTEGER autoCommitState;
    SQLGetConnectAttr(hdlDbc, SQL_ATTR_AUTOCOMMIT, &autoCommitState, 0, NULL);
    printf("Autocommit is set to: %d\n", autoCommitState);
    // Disable AUTOCOMMIT
    printf("Disabling autocommit.\n");
    ret = SQLSetConnectAttr(hdlDbc, SQL_ATTR_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF,
        SQL_NTS);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not disable autocommit.\n");
        exit(EXIT_FAILURE);
    }
    // Get the AUTOCOMMIT state again
    SQLGetConnectAttr(hdlDbc, SQL_ATTR_AUTOCOMMIT, &autoCommitState, 0, NULL);
    printf("Autocommit is set to: %d\n", autoCommitState);
    // Set up a statement handle
    SQLHSTMT hdlStmt;
    SQLAllocHandle(SQL_HANDLE_STMT, hdlDbc, &hdlStmt);
    // Create a table to hold the data
    SQLExecDirect(hdlStmt, (SQLCHAR*)"DROP TABLE IF EXISTS customers",
        SQL_NTS);
    SQLExecDirect(hdlStmt, (SQLCHAR*)"CREATE TABLE customers "
        "(CustID int, CustName varchar(100), Phone_Number char(15));",
        SQL_NTS);
    // Insert a single row.
    ret = SQLExecDirect(hdlStmt, (SQLCHAR*)"INSERT INTO customers VALUES(500,"
        "'Smith, Sam', '123-456-789');", SQL_NTS);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Could not perform single insert.\n");
    } else {
        printf("Performed single insert.\n");
    }
    // Need to commit the transaction before closing, since autocommit is
    // disabled. Otherwise SQLDisconnect returns an error.
    printf("Committing transaction.\n");
    ret = SQLEndTran(SQL_HANDLE_DBC, hdlDbc, SQL_COMMIT);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Error committing transaction.\n");
        exit(EXIT_FAILURE);
    }
    // Clean up
    printf("Free handles.\n");
    ret = SQLDisconnect(hdlDbc);
    if(!SQL_SUCCEEDED(ret)) {
        printf("Error disconnecting from database. Transaction still open?\n");
        exit(EXIT_FAILURE);
    }
    SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
    SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
    SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
    exit(EXIT_SUCCESS);
}
Running the above code results in the following output:
Allocated an environment handle.
Set application to ODBC 3.
Allocated Database handle.
Connecting to database.
Connected to database.
Autocommit is set to: 1
Disabling autocommit.
Autocommit is set to: 0
Performed single insert.
Committing transaction.
Free handles.
Note: You can also disable AUTOCOMMIT in the ODBC connection string. See Setting DSN
Parameters (page 48) for more information.
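For example, a sketch of a connection string that disables it at connect time (the DSN name and
credentials are illustrative, and the exact Boolean literal is an assumption; see the AutoCommit
entry under Advanced Settings):
// A sketch: disable AUTOCOMMIT through the DSN parameter in the
// connection string. The DSN name and credentials are illustrative.
ret = SQLDriverConnect(hdlDbc, NULL,
    (SQLCHAR*)"DSN=ExampleDB;AutoCommit=false;UID=dbadmin;PWD=password123",
    SQL_NTS, NULL, 0, NULL, SQL_DRIVER_NOPROMPT);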
Retrieving Data through ODBC
To retrieve data through ODBC, you execute a query that returns a result set (SELECT, for
example), then retrieve the results using one of two methods:
 Use the SQLFetch() function to retrieve a row of the result set, then access column values in
the row by calling SQLGetData().
 Use the SQLBindCol() function to bind a variable or array to a column in the result set,
then call SQLExtendedFetch() or SQLFetchScroll() to read a row of the result set and
insert its values into the variable or array.
In both methods you loop through the result set until you either reach the end (signaled by the
SQL_NO_DATA return status) or encounter an error.
Note: HP Vertica supports one cursor per connection. Attempting to use more than one cursor
per connection will result in an error. For example, you receive an error if you execute a
statement while another statement has a result set open.
The following code example demonstrates retrieving data from HP Vertica by:
1 Connecting to the database.
2 Executing a SELECT statement that returns the IDs and names of all tables.
3 Binding two variables to the two columns in the result set.
4 Looping through the result set, printing the IDs and name values.
// Demonstrate running a query and getting results by querying the tables
// system table for a list of all tables in the current schema.
// Some standard headers
#include <stdlib.h>
#include <sstream>
#include <iostream>
#include <assert.h>
// Standard ODBC headers
#include <sql.h>
#include <sqltypes.h>
#include <sqlext.h>
// Use std namespace to make output easier
using namespace std;
// Helper function to print SQL error messages.
template <typename HandleT>
void reportError(int handleTypeEnum, HandleT hdl)
{
// Get the status records.
SQLSMALLINT
i, MsgLen;
SQLRETURN ret2;
SQLCHAR
SqlState[6], Msg[SQL_MAX_MESSAGE_LENGTH];
SQLINTEGER
NativeError;
i = 1;
cout << endl;
while ((ret2 = SQLGetDiagRec(handleTypeEnum, hdl, i, SqlState, &NativeError,
Msg, sizeof(Msg), &MsgLen)) != SQL_NO_DATA) {
cout << "error record #" << i++ << endl;
cout << "sqlstate: " << SqlState << endl;
cout << "detailed msg: " << Msg << endl;
cout << "native error code: " << NativeError << endl;
}
}
int main()
{
// Set up the ODBC environment
SQLRETURN ret;
SQLHENV hdlEnv;
ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv);
assert(SQL_SUCCEEDED(ret));
// Tell ODBC that the application uses ODBC 3.
ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION,
-73-
Programmer's Guide
(SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER);
assert(SQL_SUCCEEDED(ret));
// Allocate a database handle.
SQLHDBC hdlDbc;
ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc);
assert(SQL_SUCCEEDED(ret));
// Connect to the database
cout << "Connecting to database." << endl;
const char* dsnName = "ExampleDB";
const char* userID = "dbadmin";
const char* passwd = "password123";
ret = SQLConnect(hdlDbc, (SQLCHAR*)dsnName,
SQL_NTS,(SQLCHAR*)userID,SQL_NTS,
(SQLCHAR*)passwd, SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
cout << "Could not connect to database" << endl;
reportError<SQLHDBC>(SQL_HANDLE_DBC, hdlDbc);
exit(EXIT_FAILURE);
} else {
cout << "Connected to database." << endl;
}
// Set up a statement handle
SQLHSTMT hdlStmt;
SQLAllocHandle(SQL_HANDLE_STMT, hdlDbc, &hdlStmt);
assert(SQL_SUCCEEDED(ret));
// Execute a query to get the names and IDs of all tables in the schema
// search p[ath (usually public).
ret = SQLExecDirect( hdlStmt, (SQLCHAR*)"SELECT table_id, table_name "
"FROM tables ORDER BY table_name", SQL_NTS );
if(!SQL_SUCCEEDED(ret)) {
// Report error an go no further if statement failed.
cout << "Error executing statement." << endl;
reportError<SQLHDBC>(SQL_HANDLE_STMT, hdlStmt);
exit(EXIT_FAILURE);
} else {
// Query succeeded, so bind two variables to the two colums in the
// result set,
cout << "Fetching results..." << endl;
SQLBIGINT table_id;
// Holds the ID of the table.
SQLTCHAR table_name[256]; // buffer to hold name of table
ret = SQLBindCol(hdlStmt, 1, SQL_C_SBIGINT, (SQLPOINTER)&table_id,
sizeof(table_id), NULL);
ret = SQLBindCol(hdlStmt, 2, SQL_C_TCHAR, (SQLPOINTER)table_name,
sizeof(table_name), NULL);
// Loop through the results.
while( SQL_SUCCEEDED(ret = SQLFetchScroll(hdlStmt, SQL_FETCH_NEXT,1))) {
// Print the bound variables, which now contain the values from the
// fetched row.
cout << table_id << " | " << table_name << endl;
}
// See if loop exited for reasons other than running out of data
if (ret != SQL_NO_DATA) {
// Exited for a reason other than no more data... report the error.
reportError<SQLHSTMT>( SQL_HANDLE_STMT, hdlStmt );
}
}
// Clean up by shutting down the connection
cout << "Free handles." << endl;
ret = SQLDisconnect( hdlDbc );
if(!SQL_SUCCEEDED(ret)) {
cout << "Error disconnecting. Transaction still open?" << endl;
exit(EXIT_FAILURE);
}
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_SUCCESS);
}
Running the example code in the vmart database produces output similar to this:
Connecting to database.
Connected to database.
Fetching results...
45035996273970908 | call_center_dimension
45035996273970836 | customer_dimension
45035996273972958 | customers
45035996273970848 | date_dimension
45035996273970856 | employee_dimension
45035996273970868 | inventory_fact
45035996273970904 | online_page_dimension
45035996273970912 | online_sales_fact
45035996273970840 | product_dimension
45035996273970844 | promotion_dimension
45035996273970860 | shipping_dimension
45035996273970876 | store_dimension
45035996273970894 | store_orders_fact
45035996273970880 | store_sales_fact
45035996273972806 | t
45035996273970852 | vendor_dimension
45035996273970864 | warehouse_dimension
Free handles.
Handling NULL Values in ODBC
When inserting data into and reading data from HP Vertica, your ODBC client program needs to
handle NULL values in a special way. There is no value that represents NULL for any of HP
Vertica's data types. Instead, a column's value is NULL when the width of the data in the column
equals the SQL_NULL_DATA constant.
When reading data from a result set returned by HP Vertica, always check the width of the
values you read if the column can contain NULLs. You do this by passing the address of an
SQLLEN variable as the last parameter of the SQLBindCol function call. After each fetch, check
whether the value of this length variable equals SQL_NULL_DATA. If it does, the value in the
variable bound to the column in the result set is not valid, and your application will produce
inaccurate results if it uses that value.
Similarly, when inserting data into HP Vertica, the last parameter of SQLBindParameter function
call is the address of a SQLLEN variable that indicates the width of the data in the parameter.
When you need to insert a null value into a column, set the parameter width variable to
SQL_NULL_DATA to mark the value as NULL. The actual value of the variable that is bound to the
parameter is irrelevant—the ODBC driver ignores it when sending data to HP Vertica, since the
value's width indicates it is a NULL value.
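Distilled to its essentials, both directions use the same indicator variable, as in this minimal
sketch (the statement handle and variable names are illustrative):
SQLLEN phoneInd; // indicator/length variable
// Reading: bind the column with an indicator, then test it after each fetch.
SQLBindCol(hdlStmt, 3, SQL_C_CHAR, phoneNum, sizeof(phoneNum), &phoneInd);
// ... after SQLFetch() or SQLFetchScroll():
if (phoneInd == SQL_NULL_DATA) {
    // The column is NULL; ignore the contents of phoneNum.
}
// Writing: bind the parameter with the same indicator, and set the
// indicator to SQL_NULL_DATA to insert a NULL. The driver ignores the
// contents of the bound buffer.
SQLBindParameter(hdlStmt, 3, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
    15, 0, phoneNum, 0, &phoneInd);
phoneInd = SQL_NULL_DATA;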
The following example code demonstrates inserting two rows of data, then reading them back.
The second row of data has a NULL value.
// Some standard headers
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Only needed for Windows clients
// #include <windows.h>
// Standard ODBC headers
#include <sql.h>
#include <sqltypes.h>
#include <sqlext.h>
int main()
{
// Set up the ODBC environment
SQLRETURN ret;
SQLHENV hdlEnv;
ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not allocate a handle.\n");
exit(EXIT_FAILURE);
}
// Tell ODBC that the application uses ODBC 3.
ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION,
(SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not set application version to ODBC3.\n");
exit(EXIT_FAILURE);
}
// Allocate a database handle.
SQLHDBC hdlDbc;
ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc);
// Connect to the database
printf("Connecting to database.\n");
const char* dsnName = "verticaDSN";
const char* userID = "ExampleUser";
const char* passwd = "password123";
ret = SQLConnect(hdlDbc, (SQLCHAR*)dsnName,
SQL_NTS,(SQLCHAR*)userID,SQL_NTS,
(SQLCHAR*)passwd, SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not connect to database.\n");
exit(EXIT_FAILURE);
} else {
printf("Connected to database.\n");
}
// Disable AUTOCOMMIT
printf("Disabling autocommit.\n");
ret = SQLSetConnectAttr(hdlDbc, SQL_ATTR_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF,
0);
// Set up a statement handle
SQLHSTMT hdlStmt;
SQLAllocHandle(SQL_HANDLE_STMT, hdlDbc, &hdlStmt);
SQLExecDirect(hdlStmt, (SQLCHAR*)"DROP TABLE IF EXISTS customers",
SQL_NTS);
SQLExecDirect(hdlStmt, (SQLCHAR*)"CREATE TABLE customers "
"(CustID int NOT NULL, CustName varchar(100) NOT NULL, "
"Phone_Number char(15));",
SQL_NTS);
// Create the prepared statement. This will insert data into the
// table we created above.
printf("Creating prepared statement\n");
ret = SQLPrepare (hdlStmt, (SQLTCHAR*)"INSERT INTO customers (CustID, "
"CustName, Phone_Number) VALUES(?,?,?)", SQL_NTS) ;
// Set up a bunch of variables to be bound to the statement
// parameters.
SQLINTEGER custID = 1234;          // Integer for the ID
SQLLEN custIDLen = sizeof(custID); // The size of the custID
SQLVARCHAR custName[100] = "Adams, Alan"; // a varchar for customer name
// Use SQL_NTS to have the driver calculate the size of the null-terminated
// string value. This is rather lazy. It is better to supply the size
// yourself to avoid forcing additional overhead on the driver.
SQLLEN custNameLen = SQL_NTS;
// CHAR value for the phone number. Use strlen to supply an actual numeric
// value for the size. Need to cast to char * for strlen to work.
SQLCHAR phoneNum[15] = "555-123-6789";
SQLLEN phoneNumLen = strlen((char *)phoneNum);
// Bind the data arrays to the parameters in the prepared SQL
// statement
SQLBindParameter(hdlStmt, 1, SQL_PARAM_INPUT, SQL_C_DEFAULT, SQL_INTEGER,
0, 0, &custID, 0 , &custIDLen);
// Bind CustNames
SQLBindParameter(hdlStmt, 2, SQL_PARAM_INPUT, SQL_C_DEFAULT, SQL_VARCHAR,
50, 0, (SQLPOINTER)custName, 0, &custNameLen);
// Bind phoneNums
SQLBindParameter(hdlStmt, 3, SQL_PARAM_INPUT, SQL_C_DEFAULT, SQL_CHAR,
15, 0, (SQLPOINTER)phoneNum, 0, &phoneNumLen);
// Execute the prepared statement.
printf("Inserting row #1...");
ret = SQLExecute(hdlStmt);
if(!SQL_SUCCEEDED(ret)) {
printf("not successful!\n");
}
else {
printf("successful.\n");
}
// Change the bound values to insert another row.
custID++;
strcpy((char *)custName,"Billings, Bob");
// Insert a NULL value for the Phone # of the second row.
// Technically, the actual value of phoneNum doesn't matter, since the
// data isn't used when the column length indicates it is NULL. This
// example sets it to a dummy value. Remember, an empty string is not the
// same as a NULL value!
strcpy((char *)phoneNum,"NULL VALUE");
// To actually set the NULL value, set the length of the value
// to the SQL_NULL_DATA constant. This marks the column as NULL.
phoneNumLen = SQL_NULL_DATA;
// Execute the prepared statement again to insert values in new row.
printf("Inserting row #2...");
ret = SQLExecute(hdlStmt);
if(!SQL_SUCCEEDED(ret)) {
printf("not successful!\n");
} else {
printf("successful.\n");
}
// Done with inserts, commit the transaction
printf("Committing transaction\n");
ret = SQLEndTran(SQL_HANDLE_DBC, hdlDbc, SQL_COMMIT);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not commit transaction\n");
} else {
printf("Committed transaction\n");
}
// Re-enable autocommit so we don't get nagged about uncommitted
// transactions after querying the database.
printf("Enabling autocommit.\n");
ret = SQLSetConnectAttr(hdlDbc, SQL_ATTR_AUTOCOMMIT, (void *)
SQL_AUTOCOMMIT_ON , 0);
// Now read the data back.
// Free and reallocate the statement handle
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
ret = SQLAllocHandle(SQL_HANDLE_STMT, hdlDbc, &hdlStmt);
// Query the Customers table
ret = SQLExecDirect( hdlStmt, (SQLCHAR*)"SELECT * FROM Customers "
"ORDER BY CustID", SQL_NTS );
if(!SQL_SUCCEEDED(ret)) {
printf("Error executing query statement.\n");
exit(EXIT_FAILURE);
}
// Query succeeded, so bind variables to the result set. Reuse the previously
// defined variables that we used to insert data.
printf("Fetching results...\n");
ret = SQLBindCol(hdlStmt, 1, SQL_C_ULONG, &custID,
sizeof(custID), &custIDLen);
ret = SQLBindCol(hdlStmt, 2, SQL_C_CHAR, custName,
sizeof(custName), &custNameLen);
ret = SQLBindCol(hdlStmt, 3, SQL_C_CHAR, phoneNum,
sizeof(phoneNum), &phoneNumLen);
// Print header for table
printf (" CustID | Name
| Phone\n");
printf (" ===========================================================\n");
// Loop through the results, printing them in a table.
while( SQL_SUCCEEDED(ret = SQLFetchScroll(hdlStmt, SQL_FETCH_NEXT,1))) {
// Print the bound variables, which now contain the values from the
// fetched row. There is no need to check custID or custName for NULLs,
// since those columns are defined as NOT NULL.
printf(" %6d | %-30s | ",custID, custName);
// Check the phoneNum column for NULL values by comparing the length of
// the value to the SQL_NULL_DATA constant. If they match, the column
// value is NULL and the value stored in phoneNum is not valid. Using
// it without checking for NULL produces incorrect results.
if (phoneNumLen == SQL_NULL_DATA) {
// NULL value. Handle by just printing the string "NULL"
printf("%-15s","NULL");
} else {
// Value is not null. You can retrieve the value safely.
printf("%-15s",phoneNum);
}
printf("\n");
}
// Clean up
printf("Free handles.\n");
ret = SQLDisconnect( hdlDbc );
if(!SQL_SUCCEEDED(ret)) {
printf("Error disconnecting. Transaction still open?\n");
// reportError<SQLHDBC>( SQL_HANDLE_DBC, hdlDbc );
exit(EXIT_FAILURE);
}
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_SUCCESS);
}
The example produces the following output when run:
Connecting to database.
Connected to database.
Disabling autocommit.
Creating prepared statement
Inserting row #1...successful.
Inserting row #2...successful.
Committing transaction
Committed transaction
Enabling autocommit.
Fetching results...
 CustID | Name                           | Phone
 ===========================================================
   1234 | Adams, Alan                    | 555-123-6789
   1235 | Billings, Bob                  | NULL
Free handles.
Loading Data through ODBC
A primary task for many client applications is loading data into the HP Vertica database. There are
several different ways to insert data using ODBC, which are covered by the topics in this section.
Using a Single Row Insert
The easiest way to load data into HP Vertica is to run an INSERT SQL statement using the
SQLExecDirect function. However, this method is limited to inserting a single row of data.
ret = SQLExecDirect(hstmt, (SQLTCHAR*)"INSERT into Customers values"
"(1,'abcda','efgh','1')", SQL_NTS);
Using Prepared Statements
HP Vertica supports using server-side prepared statements with both ODBC and JDBC. Prepared
statements let you define a statement once, and then run it many times with different parameters.
The statement you want to execute contains placeholders instead of parameters. When you
execute the statement, you supply values for each placeholder.
Placeholders are represented by question marks (?) as in the following example query:
SELECT * FROM public.inventory_fact WHERE product_key = ?
Server-side prepared statements are useful for:
 Optimizing queries. HP Vertica only needs to parse the statement once.
 Preventing SQL injection attacks. A SQL injection attack occurs when user input is either
incorrectly filtered for string literal escape characters embedded in SQL statements or not
strongly typed, and is thereby unexpectedly executed. Since a prepared statement is parsed
separately from the input data, there is no chance the data can be accidentally executed by
the database.
 Binding direct variables to return columns. By pointing to data structures, the code doesn't
have to perform extra transformations.
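Before the full insert example below, here is a minimal sketch (handle and variable names are
illustrative) that prepares the SELECT statement shown above and executes it twice with different
parameter values:
SQLINTEGER productKey = 10;
SQLLEN keyLen = 0;
ret = SQLPrepare(hdlStmt, (SQLCHAR*)"SELECT * FROM public.inventory_fact "
    "WHERE product_key = ?", SQL_NTS);
ret = SQLBindParameter(hdlStmt, 1, SQL_PARAM_INPUT, SQL_C_LONG, SQL_INTEGER,
    0, 0, &productKey, 0, &keyLen);
ret = SQLExecute(hdlStmt);   // runs with product_key = 10
// ... fetch the results, then close the cursor before re-executing ...
SQLCloseCursor(hdlStmt);
productKey = 20;             // the bound variable is read at execute time
ret = SQLExecute(hdlStmt);   // runs again with product_key = 20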
The following example demonstrates using a prepared statement for a single insert.
// Some standard headers
#include <stdio.h>
#include <stdlib.h>
// Only needed for Windows clients
// #include <windows.h>
// Standard ODBC headers
#include <sql.h>
#include <sqltypes.h>
#include <sqlext.h>
// Some constants for the size of the data to be inserted.
#define CUST_NAME_LEN 50
#define PHONE_NUM_LEN 15
#define NUM_ENTRIES 4
int main()
{
// Set up the ODBC environment
SQLRETURN ret;
SQLHENV hdlEnv;
ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not allocate a handle.\n");
exit(EXIT_FAILURE);
} else {
printf("Allocated an environment handle.\n");
}
// Tell ODBC that the application uses ODBC 3.
ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION,
(SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not set application version to ODBC3.\n");
exit(EXIT_FAILURE);
} else {
printf("Set application to ODBC 3.\n");
}
// Allocate a database handle.
SQLHDBC hdlDbc;
ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc);
// Connect to the database
printf("Connecting to database.\n");
const char *dsnName = "ExampleDB";
const char* userID = "dbadmin";
const char* passwd = "password123";
ret = SQLConnect(hdlDbc, (SQLCHAR*)dsnName,
SQL_NTS,(SQLCHAR*)userID,SQL_NTS,
(SQLCHAR*)passwd, SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not connect to database.\n");
exit(EXIT_FAILURE);
} else {
printf("Connected to database.\n");
}
// Disable AUTOCOMMIT
printf("Disabling autocommit.\n");
ret = SQLSetConnectAttr(hdlDbc, SQL_ATTR_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF,
SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not disable autocommit.\n");
exit(EXIT_FAILURE);
}
// Set up a statement handle
SQLHSTMT hdlStmt;
SQLAllocHandle(SQL_HANDLE_STMT, hdlDbc, &hdlStmt);
SQLExecDirect(hdlStmt, (SQLCHAR*)"DROP TABLE IF EXISTS customers",
SQL_NTS);
SQLExecDirect(hdlStmt, (SQLCHAR*)"CREATE TABLE customers "
"(CustID int, CustName varchar(100), Phone_Number char(15));",
SQL_NTS);
// Set up a bunch of variables to be bound to the statement
// parameters.
// Create the prepared statement. This will insert data into the
// table we created above.
printf("Creating prepared statement\n");
ret = SQLPrepare (hdlStmt, (SQLTCHAR*)"INSERT INTO customers (CustID, "
"CustName, Phone_Number) VALUES(?,?,?)", SQL_NTS) ;
if(!SQL_SUCCEEDED(ret)) {
printf("Could not create prepared statement\n");
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_FAILURE);
} else {
printf("Created prepared statement.\n");
}
SQLINTEGER custID = 1234;
SQLCHAR custName[100] = "Fein, Fredrick";
SQLVARCHAR phoneNum[15] = "555-123-6789";
SQLLEN strFieldLen = SQL_NTS;
SQLLEN custIDLen = 0;
// Bind the data arrays to the parameters in the prepared SQL
// statement
ret = SQLBindParameter(hdlStmt, 1, SQL_PARAM_INPUT, SQL_C_LONG, SQL_INTEGER,
0, 0, &custID, 0 , &custIDLen);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not bind custID array\n");
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_FAILURE);
} else {
printf("Bound custID to prepared statement\n");
}
// Bind CustNames
ret = SQLBindParameter(hdlStmt, 2, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_VARCHAR,
50, 0, (SQLPOINTER)custName, 0, &strFieldLen);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not bind custNames\n");
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_FAILURE);
} else {
printf("Bound custName to prepared statement\n");
}
// Bind phoneNums
ret = SQLBindParameter(hdlStmt, 3, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
15, 0, (SQLPOINTER)phoneNum, 0, &strFieldLen);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not bind phoneNums\n");
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_FAILURE);
} else {
printf("Bound phoneNum to prepared statement\n");
}
// Execute the prepared statement.
printf("Running prepared statement...");
ret = SQLExecute(hdlStmt);
if(!SQL_SUCCEEDED(ret)) {
printf("not successful!\n");
} else {
printf("successful.\n");
}
// Done with the insert, commit the transaction
printf("Committing transaction\n");
ret = SQLEndTran(SQL_HANDLE_DBC, hdlDbc, SQL_COMMIT);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not commit transaction\n");
} else {
printf("Committed transaction\n");
}
// Clean up
printf("Free handles.\n");
ret = SQLDisconnect( hdlDbc );
if(!SQL_SUCCEEDED(ret)) {
printf("Error disconnecting. Transaction still open?\n");
exit(EXIT_FAILURE);
}
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_SUCCESS);
}
Using Batch Inserts
You use batch inserts to insert chunks of data into the database. By breaking the data into
batches, you can monitor the progress of the load by receiving information about any rejected
rows after each batch is loaded. To perform a batch load through ODBC, you typically use a
prepared statement with the parameters bound to arrays that contain the data to be loaded. For
each batch, you load a new set of data into the arrays then execute the prepared statement.
When you perform a batch load, HP Vertica uses a COPY statement to load the data. Each
additional batch you load uses the same COPY statement. The statement remains open until you
end the transaction, close the cursor for the statement, or execute a non-INSERT statement.
Using a single COPY statement for multiple batches improves batch loading efficiency by:
 reducing the overhead of inserting individual batches
 combining individual batches into larger ROS containers
Note: If the database connection has AUTOCOMMIT enabled, the transaction is automatically
committed after each batch insert statement, which closes the COPY statement. Leaving
AUTOCOMMIT enabled makes your batch load much less efficient and can add overhead to your
database as all of the smaller loads are consolidated.
Even though HP Vertica uses a single COPY statement to insert multiple batches within a
transaction, you can locate which (if any) rows were rejected due to invalid row formats or data
type issues after each batch is loaded. See Tracking Load Status (page 92) for details.
Note: While you can find rejected rows during the batch load transaction, other types of errors
(such as running out of disk space or a node shutdown that makes the database unsafe) are
only reported when the COPY statement ends.
Since the batch loads share a COPY statement, errors in one batch can cause earlier batches in
the same transaction to be rolled back.
Batch Insert Steps
The steps your application needs to take to perform an ODBC batch insert are:
1 Connect to the database.
2 Disable autocommit for the connection.
3 Create a prepared statement that inserts the data you want to load.
4 Bind the parameters of the prepared statement to arrays that will contain the data you want to
load.
5 Populate the arrays with the data for your batches.
6 Execute the prepared statement.
7 Optionally, check the results of the batch load to find rejected rows.
8 Repeat the previous three steps until all of the data you want to load is loaded.
9 Commit the transaction.
10 Optionally, check the results of the entire batch transaction.
The following example code demonstrates a simplified version of the above steps.
// Some standard headers
#include <stdio.h>
#include <stdlib.h>
// Only needed for Windows clients
// #include <windows.h>
// Standard ODBC headers
#include <sql.h>
#include <sqltypes.h>
#include <sqlext.h>
int main()
{
// Number of data rows to insert
const int NUM_ENTRIES = 4;
// Set up the ODBC environment
SQLRETURN ret;
SQLHENV hdlEnv;
ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not allocate a handle.\n");
exit(EXIT_FAILURE);
} else {
printf("Allocated an environment handle.\n");
}
// Tell ODBC that the application uses ODBC 3.
ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION,
(SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not set application version to ODBC3.\n");
exit(EXIT_FAILURE);
} else {
printf("Set application to ODBC 3.\n");
}
// Allocate a database handle.
SQLHDBC hdlDbc;
ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not allocate database handle.\n");
exit(EXIT_FAILURE);
} else {
printf("Allocated Database handle.\n");
}
// Connect to the database
printf("Connecting to database.\n");
const char *dsnName = "ExampleDB";
const char* userID = "dbadmin";
const char* passwd = "password123";
ret = SQLConnect(hdlDbc, (SQLCHAR*)dsnName,
SQL_NTS,(SQLCHAR*)userID,SQL_NTS,
(SQLCHAR*)passwd, SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not connect to database.\n");
exit(EXIT_FAILURE);
} else {
printf("Connected to database.\n");
}
// Disable AUTOCOMMIT
printf("Disabling autocommit.\n");
ret = SQLSetConnectAttr(hdlDbc, SQL_ATTR_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF,
SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not disable autocommit.\n");
exit(EXIT_FAILURE);
}
// Set up a statement handle
SQLHSTMT hdlStmt;
SQLAllocHandle(SQL_HANDLE_STMT, hdlDbc, &hdlStmt);
// Create a table to hold the data
SQLExecDirect(hdlStmt, (SQLCHAR*)"DROP TABLE IF EXISTS customers",
SQL_NTS);
SQLExecDirect(hdlStmt, (SQLCHAR*)"CREATE TABLE customers "
"(CustID int, CustName varchar(100), Phone_Number char(15));",
SQL_NTS);
// Create the prepared statement. This will insert data into the
// table we created above.
printf("Creating prepared statement\n");
ret = SQLPrepare (hdlStmt, (SQLTCHAR*)"INSERT INTO customers (CustID, "
"CustName, Phone_Number) VALUES(?,?,?)", SQL_NTS) ;
if(!SQL_SUCCEEDED(ret)) {
printf("Could not create prepared statement\n");
exit(EXIT_FAILURE);
} else {
printf("Created prepared statement.\n");
}
// This is the data to be inserted into the database.
SQLCHAR custNames[][50] = { "Allen, Anna", "Brown, Bill", "Chu, Cindy",
"Dodd, Don" };
SQLINTEGER custIDs[] = { 100, 101, 102, 103};
SQLCHAR phoneNums[][15] = {"1-617-555-1234", "1-781-555-1212",
"1-508-555-4321", "1-617-555-4444"};
// Bind the data arrays to the parameters in the prepared SQL
// statement. First is the custID.
ret = SQLBindParameter(hdlStmt, 1, SQL_PARAM_INPUT, SQL_C_LONG, SQL_INTEGER,
0, 0, (SQLPOINTER)custIDs, sizeof(SQLINTEGER) , NULL);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not bind custID array\n");
exit(EXIT_FAILURE);
} else {
printf("Bound CustIDs array to prepared statement\n");
}
// Bind CustNames
ret = SQLBindParameter(hdlStmt, 2, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_VARCHAR,
50, 0, (SQLPOINTER)custNames, 50, NULL);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not bind custNames\n");
exit(EXIT_FAILURE);
} else {
printf("Bound CustNames array to prepared statement\n");
}
// Bind phoneNums
ret = SQLBindParameter(hdlStmt, 3, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
15, 0, (SQLPOINTER)phoneNums, 15, NULL);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not bind phoneNums\n");
exit(EXIT_FAILURE);
} else {
printf("Bound phoneNums array to prepared statement\n");
}
// Tell the ODBC driver how many rows we have in the
// array.
ret = SQLSetStmtAttr( hdlStmt, SQL_ATTR_PARAMSET_SIZE,
(SQLPOINTER)NUM_ENTRIES, 0 );
if(!SQL_SUCCEEDED(ret)) {
printf("Could not set parameter set size\n");
exit(EXIT_FAILURE);
} else {
printf("Set parameter set size on prepared statement\n");
}
// Add multiple batches to the database. This just adds the same
// batch of data five times for simplicity's sake. Each call adds
// the 4 rows in the arrays to the database.
for (int batchLoop=1; batchLoop<=5; batchLoop++) {
// Execute the prepared statement, loading all of the data
// in the arrays.
printf("Adding Batch #%d...", batchLoop);
ret = SQLExecute(hdlStmt);
if(!SQL_SUCCEEDED(ret)) {
printf("not successful!\n");
} else {
printf("successful.\n");
}
}
// Done with batches, commit the transaction
printf("Committing transaction\n");
ret = SQLEndTran(SQL_HANDLE_DBC, hdlDbc, SQL_COMMIT);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not commit transaction\n");
} else {
printf("Committed transaction\n");
}
// Clean up
printf("Free handles.\n");
ret = SQLDisconnect( hdlDbc );
if(!SQL_SUCCEEDED(ret)) {
printf("Error disconnecting. Transaction still open?\n");
exit(EXIT_FAILURE);
}
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_SUCCESS);
}
The result of running the above code is shown below.
Allocated an environment handle.
Set application to ODBC 3.
Allocated Database handle.
Connecting to database.
Connected to database.
Disabling autocommit.
Creating prepared statement
Created prepared statement.
Bound CustIDs array to prepared statement
Bound CustNames array to prepared statement
Bound phoneNums array to prepared statement
Set parameter set size on prepared statement
Adding Batch #1...successful.
Adding Batch #2...successful.
Adding Batch #3...successful.
Adding Batch #4...successful.
Adding Batch #5...successful.
Committing transaction
Committed transaction
Free handles.
The resulting table looks like this:
=> SELECT * FROM customers;
 CustID |  CustName   |  Phone_Number
--------+-------------+----------------
    100 | Allen, Anna | 1-617-555-1234
    101 | Brown, Bill | 1-781-555-1212
    102 | Chu, Cindy  | 1-508-555-4321
    103 | Dodd, Don   | 1-617-555-4444
    100 | Allen, Anna | 1-617-555-1234
    101 | Brown, Bill | 1-781-555-1212
    102 | Chu, Cindy  | 1-508-555-4321
    103 | Dodd, Don   | 1-617-555-4444
    100 | Allen, Anna | 1-617-555-1234
    101 | Brown, Bill | 1-781-555-1212
    102 | Chu, Cindy  | 1-508-555-4321
    103 | Dodd, Don   | 1-617-555-4444
    100 | Allen, Anna | 1-617-555-1234
    101 | Brown, Bill | 1-781-555-1212
    102 | Chu, Cindy  | 1-508-555-4321
    103 | Dodd, Don   | 1-617-555-4444
    100 | Allen, Anna | 1-617-555-1234
    101 | Brown, Bill | 1-781-555-1212
    102 | Chu, Cindy  | 1-508-555-4321
    103 | Dodd, Don   | 1-617-555-4444
(20 rows)
Note: An input parameter bound with the SQL_C_NUMERIC data type uses the default
numeric precision (37) and the default scale (0) instead of the precision and scale set by the
SQL_NUMERIC_STRUCT input value. This behavior adheres to the ODBC standard. If you do
not want to use the default precision and scale, use SQLSetDescField() or
SQLSetDescRec() to change them in the statement's attributes.
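For illustration, a minimal sketch of those descriptor calls, assuming a SQL_NUMERIC_STRUCT
value numVal is already bound as parameter 1 of hdlStmt (names are illustrative):
SQLHDESC hdlApd; // application parameter descriptor of the statement
SQLGetStmtAttr(hdlStmt, SQL_ATTR_APP_PARAM_DESC, &hdlApd, 0, NULL);
// Override the default precision (37) and scale (0) for parameter 1.
SQLSetDescField(hdlApd, 1, SQL_DESC_TYPE, (SQLPOINTER)SQL_C_NUMERIC, 0);
SQLSetDescField(hdlApd, 1, SQL_DESC_PRECISION, (SQLPOINTER)15, 0);
SQLSetDescField(hdlApd, 1, SQL_DESC_SCALE, (SQLPOINTER)2, 0);
// Changing descriptor fields unbinds the buffer, so re-point the
// descriptor at the bound value.
SQLSetDescField(hdlApd, 1, SQL_DESC_DATA_PTR, &numVal, 0);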
Tracking Load Status
After loading a batch of data, your client application can get the number of rows that were
processed and find out whether each row was accepted or rejected.
Finding the Number of Accepted Rows
To get the number of rows processed by a batch, set a statement attribute named
SQL_ATTR_PARAMS_PROCESSED_PTR that points to a variable to receive the number of rows:
SQLULEN rowsProcessed;
SQLSetStmtAttr(hdlStmt, SQL_ATTR_PARAMS_PROCESSED_PTR, &rowsProcessed, 0);
When your application calls SQLExecute() to insert the batch, the HP Vertica ODBC driver
saves the number of rows that it processed (which is not necessarily the number of rows that were
successfully inserted) in the variable you specified in the
SQL_ATTR_PARAMS_PROCESSED_PTR statement attribute.
Finding the Accepted and Rejected Rows
Your application can also set a statement attribute named SQL_ATTR_PARAM_STATUS_PTR
that points to an array where the ODBC driver can store the result of inserting each row:
SQLUSMALLINT rowResults[NUM_ENTRIES];
SQLSetStmtAttr(hdlStmt, SQL_ATTR_PARAM_STATUS_PTR, rowResults, 0);
This array must be at least as large as the number of rows being inserted in each batch.
When your application calls SQLExecute to insert a batch, the ODBC driver populates the array
with values indicating whether each row was successfully inserted (SQL_PARAM_SUCCESS or
SQL_PARAM_SUCCESS_WITH_INFO) or encountered an error (SQL_PARAM_ERROR).
The following example expands on the example shown in Using Batch Inserts (page 86) to
include reporting the number of rows processed and the status of each row inserted.
// Some standard headers
#include <stdio.h>
#include <stdlib.h>
// Only needed for Windows clients
// #include <windows.h>
// Standard ODBC headers
#include <sql.h>
#include <sqltypes.h>
#include <sqlext.h>
// Helper function to print SQL error messages.
template <typename HandleT>
void reportError(int handleTypeEnum, HandleT hdl)
{
// Get the status records.
SQLSMALLINT i, MsgLen;
SQLRETURN ret2;
SQLCHAR SqlState[6], Msg[SQL_MAX_MESSAGE_LENGTH];
SQLINTEGER NativeError;
i = 1;
printf("\n");
while ((ret2 = SQLGetDiagRec(handleTypeEnum, hdl, i, SqlState, &NativeError,
Msg, sizeof(Msg), &MsgLen)) != SQL_NO_DATA) {
printf("error record %d\n", i);
printf("sqlstate: %s\n", SqlState);
printf("detailed msg: %s\n", Msg);
printf("native error code: %d\n\n", NativeError);
i++;
}
}
int main()
{
// Number of data rows to insert
const int NUM_ENTRIES = 4;
SQLRETURN ret;
SQLHENV hdlEnv;
ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not allocate a handle.\n");
exit(EXIT_FAILURE);
} else {
printf("Allocated an environment handle.\n");
}
ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION,
(SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not set application version to ODBC3.\n");
exit(EXIT_FAILURE);
} else {
printf("Set application to ODBC 3.\n");
}
SQLHDBC hdlDbc;
ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not allocate database handle.\n");
exit(EXIT_FAILURE);
} else {
printf("Allocated Database handle.\n");
}
// Connect to the database
printf("Connecting to database.\n");
const char *dsnName = "ExampleDB";
const char* userID = "dbadmin";
const char* passwd = "password123";
ret = SQLConnect(hdlDbc, (SQLCHAR*)dsnName,
SQL_NTS,(SQLCHAR*)userID,SQL_NTS,
(SQLCHAR*)passwd, SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not connect to database.\n");
reportError<SQLHDBC>(SQL_HANDLE_DBC, hdlDbc);
exit(EXIT_FAILURE);
} else {
printf("Connected to database.\n");
}
// Set up a statement handle
SQLHSTMT hdlStmt;
SQLAllocHandle(SQL_HANDLE_STMT, hdlDbc, &hdlStmt);
SQLExecDirect(hdlStmt, (SQLCHAR*)"DROP TABLE IF EXISTS customers",
SQL_NTS);
// Create a table into which we can store data
printf("Creating table.\n");
ret = SQLExecDirect(hdlStmt, (SQLCHAR*)"CREATE TABLE customers "
"(CustID int, CustName varchar(50), Phone_Number char(15));",
SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
reportError<SQLHSTMT>( SQL_HANDLE_STMT, hdlStmt );
exit(EXIT_FAILURE);
} else {
printf("Created table.\n");
}
// Create the prepared statement. This will insert data into the
// table we created above.
printf("Creating prepared statement\n");
ret = SQLPrepare (hdlStmt, (SQLTCHAR*)"INSERT INTO customers (CustID, "
"CustName, Phone_Number) VALUES(?,?,?)", SQL_NTS) ;
if(!SQL_SUCCEEDED(ret)) {
reportError<SQLHSTMT>( SQL_HANDLE_STMT, hdlStmt );
exit(EXIT_FAILURE);
} else {
printf("Created prepared statement.\n");
}
// This is the data to be inserted into the database.
char custNames[][50] = { "Allen, Anna", "Brown, Bill", "Chu, Cindy",
"Dodd, Don" };
SQLINTEGER custIDs[] = { 100, 101, 102, 103};
char phoneNums[][15] = {"1-617-555-1234", "1-781-555-1212",
"1-508-555-4321", "1-617-555-4444"};
// Bind the data arrays to the parameters in the prepared SQL
// statement
ret = SQLBindParameter(hdlStmt, 1, SQL_PARAM_INPUT, SQL_C_LONG, SQL_INTEGER,
0, 0, (SQLPOINTER)custIDs, sizeof(SQLINTEGER) , NULL);
if(!SQL_SUCCEEDED(ret)) {
reportError<SQLHSTMT>( SQL_HANDLE_STMT, hdlStmt );
exit(EXIT_FAILURE);
} else {
printf("Bound CustIDs array to prepared statement\n");
}
// Bind CustNames
SQLBindParameter(hdlStmt, 2, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_VARCHAR,
50, 0, (SQLPOINTER)custNames, 50, NULL);
if(!SQL_SUCCEEDED(ret)) {
reportError<SQLHSTMT>( SQL_HANDLE_STMT, hdlStmt );
exit(EXIT_FAILURE);
} else {
printf("Bound CustNames array to prepared statement\n");
}
// Bind phoneNums
SQLBindParameter(hdlStmt, 3, SQL_PARAM_INPUT, SQL_C_CHAR, SQL_CHAR,
15, 0, (SQLPOINTER)phoneNums, 15, NULL);
if(!SQL_SUCCEEDED(ret)) {
reportError<SQLHSTMT>( SQL_HANDLE_STMT, hdlStmt );
exit(EXIT_FAILURE);
} else {
printf("Bound phoneNums array to prepared statement\n");
}
// Set up a variable to receive the number of parameters processed.
SQLULEN rowsProcessed;
// Set a statement attribute to point to the variable
SQLSetStmtAttr(hdlStmt, SQL_ATTR_PARAMS_PROCESSED_PTR, &rowsProcessed, 0);
// Set up an array to hold the result of each row insert
SQLUSMALLINT rowResults[NUM_ENTRIES];
// Set a statement attribute to point to the array
SQLSetStmtAttr(hdlStmt, SQL_ATTR_PARAM_STATUS_PTR, rowResults, 0);
// Tell the ODBC driver how many rows we have in the
// array.
SQLSetStmtAttr(hdlStmt, SQL_ATTR_PARAMSET_SIZE, (SQLPOINTER)NUM_ENTRIES,
0);
// Add multiple batches to the database. This just adds the same
// batch of data over and over again for simplicity's sake.
for (int batchLoop=1; batchLoop<=5; batchLoop++) {
// Execute the prepared statement, loading all of the data
// in the arrays.
printf("Adding Batch #%d...", batchLoop);
ret = SQLExecute(hdlStmt);
if(!SQL_SUCCEEDED(ret)) {
reportError<SQLHSTMT>( SQL_HANDLE_STMT, hdlStmt );
exit(EXIT_FAILURE);
}
// Number of rows processed is in rowsProcessed
printf("Params processed: %d\n", rowsProcessed);
printf("Results of inserting each row:\n");
int i;
for (i = 0; i<NUM_ENTRIES; i++) {
SQLUSMALLINT result = rowResults[i];
switch(rowResults[i]) {
case SQL_PARAM_SUCCESS:
case SQL_PARAM_SUCCESS_WITH_INFO:
printf(" Row %d inserted successsfully\n", i+1);
break;
case SQL_PARAM_ERROR:
printf(" Row %d was not inserted due to an error.", i+1);
break;
default:
printf(" Row %d had some issue with it: %d\n", i+1, result);
}
}
}
// Done with batches, commit the transaction
printf("Commit Transaction\n");
ret = SQLEndTran(SQL_HANDLE_DBC, hdlDbc, SQL_COMMIT);
if(!SQL_SUCCEEDED(ret)) {
reportError<SQLHDBC>( SQL_HANDLE_DBC, hdlDbc );
}
// Clean up
printf("Free handles.\n");
ret = SQLDisconnect( hdlDbc );
if(!SQL_SUCCEEDED(ret)) {
printf("Error disconnecting. Transaction still open?\n");
exit(EXIT_FAILURE);
}
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_SUCCESS);
}
Running the example code produces the following output:
Allocated an environment handle.
Set application to ODBC 3.
Allocated Database handle.
Connecting to database.
Connected to database.
Creating table.
Created table.
Creating prepared statement
Created prepared statement.
Bound CustIDs array to prepared statement
Bound CustNames array to prepared statement
Bound phoneNums array to prepared statement
Adding Batch #1...Params processed: 4
Results of inserting each row:
Row 1 inserted successfully
Row 2 inserted successfully
Row 3 inserted successfully
Row 4 inserted successfully
Adding Batch #2...Params processed: 4
Results of inserting each row:
Row 1 inserted successfully
Row 2 inserted successfully
Row 3 inserted successfully
Row 4 inserted successfully
Adding Batch #3...Params processed: 4
Results of inserting each row:
Row 1 inserted successfully
Row 2 inserted successfully
Row 3 inserted successfully
Row 4 inserted successfully
Adding Batch #4...Params processed: 4
Results of inserting each row:
Row 1 inserted successfully
Row 2 inserted successfully
Row 3 inserted successfully
Row 4 inserted successfully
Adding Batch #5...Params processed: 4
Results of inserting each row:
Row 1 inserted successfully
Row 2 inserted successfully
Row 3 inserted successfully
Row 4 inserted successfully
Commit Transaction
Free handles.
Error Handling During Batch Loads
When loading individual batches, you can find information on how many rows were accepted and
which rows were rejected (see Tracking Load Status (page 92) for details). Other errors, such as
disk space errors, are not reported while individual batches are inserted. This behavior results
from a single COPY statement performing the load of multiple consecutive batches, which makes
the batch load process much faster. Only when the COPY statement closes is the batched data
committed, and only then does HP Vertica report other types of errors.
Your bulk loading application should check for errors when the COPY statement closes. Normally,
you force the COPY statement to close by calling the SQLEndTran() function to end the
transaction. You can also force the COPY statement to close by closing the cursor using the
SQLCloseCursor() function, or by setting the database connection's AutoCommit property to
true before inserting the last batch in the load.
Note: The COPY statement also closes if you execute any non-insert statement. However,
having to deal with errors from the COPY statement in what might be an otherwise-unrelated
query is not intuitive, and can lead to confusion and a harder-to-maintain application. You
should explicitly end the COPY statement at the end of your batch load and handle any errors
at that time.
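For example, a minimal sketch of that explicit check, reusing the reportError() helper from the
earlier examples in this section (handle names are illustrative):
// End the transaction, which closes the COPY statement, and report any
// errors that were deferred until the load finalized (disk space,
// unsafe cluster state, and so on).
ret = SQLEndTran(SQL_HANDLE_DBC, hdlDbc, SQL_COMMIT);
if (!SQL_SUCCEEDED(ret)) {
    reportError<SQLHDBC>(SQL_HANDLE_DBC, hdlDbc);
}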
Using the COPY Statement
The COPY statement lets you bulk load data from a file stored on a database node into the
HP Vertica database. This method is the most efficient way to load data into HP Vertica because
the file resides on the database server. One drawback is that only a database superuser can use
COPY, since it requires privileges to access the filesystem of the database node.
One drawback of using COPY instead of performing batch loads is that you only get the results of
the load (the number of accepted and rejected rows) after the COPY statement has finished. With
batch loads, you can monitor the progress as batches are inserted, which is useful if you want to
stop loading when a large portion of the data is being rejected.
A primary concern when bulk loading data using COPY is deciding whether the data should be
loaded directly into ROS using the DIRECT option, or using the AUTO method (loading into
WOS until it fills, then loading into ROS). Load directly into ROS when your transaction loads a
large amount of data (more than roughly 100MB).
Note: The exceptions/rejections files are created on the client machine when the EXCEPTIONS
and REJECTED DATA modifiers are specified on the COPY statement. Specify a local path and
filename for these modifiers when executing a COPY query from the driver.
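For example, a COPY statement that writes exceptions and rejected rows to client-side files might
look like the following sketch (the file paths are illustrative):
ret = SQLExecDirect(hdlStmt, (SQLCHAR*)"COPY customers "
    "FROM '/data/customers.txt' "
    "EXCEPTIONS '/home/dbadmin/customers.exceptions' "
    "REJECTED DATA '/home/dbadmin/customers.rejected'", SQL_NTS);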
The following example loads data into the WOS (Write Optimized Store) until it fills, then stores
additional data directly in ROS (Read Optimized Store).
ret=SQLExecDirect(hdlStmt, (SQLCHAR*)"COPY customers "
"FROM '/data/customers.txt' AUTO",SQL_NTS);
The following example loads data into the ROS (Read Optimized Store).
ret=SQLExecDirect(hdlStmt, (SQLCHAR*)"COPY customers "
"FROM '/data/customers.txt' DIRECT",SQL_NTS);
See the COPY statement in the SQL Reference Manual for more information about its syntax and
use.
The following example demonstrates using the COPY command.
// Some standard headers
#include <stdio.h>
#include <stdlib.h>
// Only needed for Windows clients
// #include <windows.h>
// Standard ODBC headers
#include <sql.h>
#include <sqltypes.h>
#include <sqlext.h>
// Helper function to determine if an ODBC function call returned
// successfully.
bool notSuccess(SQLRETURN ret) {
return (ret != SQL_SUCCESS && ret != SQL_SUCCESS_WITH_INFO);
}
int main()
{
// Set up the ODBC environment
SQLRETURN ret;
SQLHENV hdlEnv;
ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv);
if(notSuccess(ret)) {
printf("Could not allocate a handle.\n");
exit(EXIT_FAILURE);
} else {
printf("Allocated an environment handle.\n");
}
// Tell ODBC that the application uses ODBC 3.
ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION,
(SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER);
if(notSuccess(ret)) {
printf("Could not set application version to ODBC3.\n");
exit(EXIT_FAILURE);
} else {
printf("Set application to ODBC 3.\n");
}
// Allocate a database handle.
SQLHDBC hdlDbc;
ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc);
// Connect to the database
printf("Connecting to database.\n");
const char *dsnName = "ExampleDB";
// Note: User MUST be a database superuser to be able to access files on the
// filesystem of the node.
const char* userID = "dbadmin";
const char* passwd = "password123";
ret = SQLConnect(hdlDbc, (SQLCHAR*)dsnName,
SQL_NTS,(SQLCHAR*)userID,SQL_NTS,
(SQLCHAR*)passwd, SQL_NTS);
if(notSuccess(ret)) {
printf("Could not connect to database.\n");
exit(EXIT_FAILURE);
} else {
printf("Connected to database.\n");
}
// Disable AUTOCOMMIT
printf("Disabling autocommit.\n");
ret = SQLSetConnectAttr(hdlDbc, SQL_ATTR_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF,
SQL_NTS);
if(notSuccess(ret)) {
printf("Could not disable autocommit.\n");
exit(EXIT_FAILURE);
}
// Set up a statement handle
SQLHSTMT hdlStmt;
SQLAllocHandle(SQL_HANDLE_STMT, hdlDbc, &hdlStmt);
// Create table to hold the data
SQLExecDirect(hdlStmt, (SQLCHAR*)"DROP TABLE IF EXISTS customers",
SQL_NTS);
SQLExecDirect(hdlStmt, (SQLCHAR*)"CREATE TABLE customers"
"(Last_Name char(50) NOT NULL, First_Name char(50),Email char(50), "
"Phone_Number char(15));",
SQL_NTS);
// Run the copy command to load data into ROS.
ret=SQLExecDirect(hdlStmt, (SQLCHAR*)"COPY customers "
"FROM '/data/customers.txt' DIRECT",
SQL_NTS);
if(notSuccess(ret)) {
printf("Data was not successfully loaded.\n");
exit(EXIT_FAILURE);
} else {
// Get number of rows added.
SQLLEN numRows;
ret=SQLRowCount(hdlStmt, &numRows);
printf("Successfully inserted %d rows.\n", numRows);
}
// Done with the load, commit the transaction
printf("Committing transaction\n");
ret = SQLEndTran(SQL_HANDLE_DBC, hdlDbc, SQL_COMMIT);
if(notSuccess(ret)) {
printf("Could not commit transaction\n");
} else {
printf("Committed transaction\n");
}
// Clean up
printf("Free handles.\n");
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_SUCCESS);
}
The example prints the following when run:
Allocated an environment handle.
Set application to ODBC 3.
Connecting to database.
Connected to database.
Disabling autocommit.
Successfully inserted 10001 rows.
Committing transaction
Committed transaction
Free handles.
Streaming Data from the Client Using COPY LOCAL
The LOCAL option of the SQL COPY statement lets you stream data from a file on a client system
to your HP Vertica database. This statement works through the ODBC driver, making the task of
transferring data files from the client to the server much easier.
The LOCAL option of COPY works transparently through the ODBC driver. Just have your client
application execute a statement containing a COPY LOCAL statement, and the ODBC driver will
read and stream the data file from the client to the server. For example:
// Some standard headers
#include <stdio.h>
#include <stdlib.h>
// Only needed for Windows clients
// #include <windows.h>
// Standard ODBC headers
#include <sql.h>
#include <sqltypes.h>
#include <sqlext.h>
int main()
{
// Set up the ODBC environment
SQLRETURN ret;
SQLHENV hdlEnv;
ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hdlEnv);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not allocate a handle.\n");
exit(EXIT_FAILURE);
} else {
printf("Allocated an environment handle.\n");
}
// Tell ODBC that the application uses ODBC 3.
ret = SQLSetEnvAttr(hdlEnv, SQL_ATTR_ODBC_VERSION,
(SQLPOINTER) SQL_OV_ODBC3, SQL_IS_UINTEGER);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not set application version to ODBC3.\n");
exit(EXIT_FAILURE);
} else {
printf("Set application to ODBC 3.\n");
}
// Allocate a database handle.
SQLHDBC hdlDbc;
ret = SQLAllocHandle(SQL_HANDLE_DBC, hdlEnv, &hdlDbc);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not allocate a database handle.\n");
exit(EXIT_FAILURE);
} else {
printf("Allocated a database handle.\n");
}
// Connect to the database
printf("Connecting to database.\n");
const char *dsnName = "ExampleDB";
const char* userID = "dbadmin";
const char* passwd = "password123";
ret = SQLConnect(hdlDbc, (SQLCHAR*)dsnName,
SQL_NTS,(SQLCHAR*)userID,SQL_NTS,
(SQLCHAR*)passwd, SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
printf("Could not connect to database.\n");
exit(EXIT_FAILURE);
} else {
printf("Connected to database.\n");
}
// Set up a statement handle
SQLHSTMT hdlStmt;
SQLAllocHandle(SQL_HANDLE_STMT, hdlDbc, &hdlStmt);
// Create table to hold the data
SQLExecDirect(hdlStmt, (SQLCHAR*)"DROP TABLE IF EXISTS customers",
SQL_NTS);
SQLExecDirect(hdlStmt, (SQLCHAR*)"CREATE TABLE customers"
"(Last_Name char(50) NOT NULL, First_Name char(50),Email char(50), "
"Phone_Number char(15));",
SQL_NTS);
// Run the copy command to load data into ROS.
ret=SQLExecDirect(hdlStmt, (SQLCHAR*)"COPY customers "
"FROM LOCAL '/home/dbadmin/customers.txt' DIRECT",
SQL_NTS);
if(!SQL_SUCCEEDED(ret)) {
printf("Data was not successfully loaded.\n");
exit(EXIT_FAILURE);
} else {
// Get number of rows added.
SQLLEN numRows;
ret=SQLRowCount(hdlStmt, &numRows);
printf("Successfully inserted %d rows.\n", numRows);
}
// COPY commits automatically, unless it is told not to, so
// there is no need to commit the transaction.
// Clean up
printf("Free handles.\n");
ret = SQLDisconnect( hdlDbc );
if(!SQL_SUCCEEDED(ret)) {
printf("Error disconnecting. Transaction still open?\n");
exit(EXIT_FAILURE);
}
SQLFreeHandle(SQL_HANDLE_STMT, hdlStmt);
SQLFreeHandle(SQL_HANDLE_DBC, hdlDbc);
SQLFreeHandle(SQL_HANDLE_ENV, hdlEnv);
exit(EXIT_SUCCESS);
}
This example is essentially the same as the example shown in Using the COPY Statement (page
99), except it uses the COPY statement's LOCAL option to load data from the client system rather
than from the filesystem of the database node.
Note: On Windows clients, the path you supply for the COPY LOCAL file is limited to 216
characters due to limitations in the Windows API.
See Also
 COPY LOCAL (SQL Reference Manual).
Programming JDBC Client Applications
The HP Vertica JDBC driver provides you with a standard JDBC API. If you have accessed other
databases using JDBC, you should find accessing HP Vertica familiar. This section explains how
to use JDBC to connect your Java application to HP Vertica.
You must first install the JDBC client driver on all client systems that will be accessing the HP
Vertica database. For installation instructions, see Installing the HP Vertica Client Drivers
(page 15).
For more information about JDBC:
 JDBC Driver JavaDoc (HP Vertica extensions)
 An Introduction to JDBC, Part 1
(http://www.onjava.com/pub/a/onjava/excerpt/javaentnut_2/index1.html)
JDBC Feature Support
The HP Vertica JDBC driver complies with the JDBC 3 standard, and supports most of its
features. You can use the DatabaseMetaData class to determine whether a particular feature you
want to use is supported. Also, note the following about the features supported by the HP Vertica
JDBC driver:
 Cursors are forward only and are not scrollable. Result sets cannot be updated.
 A connection supports a single statement executing at any time. If you want to execute
multiple statements simultaneously, open multiple connections.
 Because HP Vertica does not have stored procedures, CallableStatement is not
supported. The DatabaseMetaData.getProcedures() and
.getProcedureColumns() methods return information about SQL functions (including
UDFs) instead of stored procedures.
Multiple SQL Statement Support
The HP Vertica JDBC driver supports executing strings containing multiple statements. For
example:
stmt.executeUpdate("CREATE TABLE t(a INT);INSERT INTO t VALUES(10);");
Only the Statement interface supports executing strings containing multiple SQL statements.
You cannot use multiple statement strings with PreparedStatement. COPY statements that
copy a file from a host file system work in a multiple statement string. However, client COPY
statements (COPY FROM STDIN) do not work.
Multiple Batch Conversion to COPY Statements
When you turn off your connection's AutoCommit property, the HP Vertica JDBC driver converts
all sequential batch insertions into a single COPY statement. This conversion can improve load
performance.
Updating JDBC Client Code from Previous Driver Versions
The HP Vertica version 5.1 client drivers have been rewritten to improve standards compliance
and reliability. As a result, some HP Vertica-specific features and past incompatibilities have been
eliminated. You must update any client code written for the prior versions of the JDBC driver to
work with the version 5.1 driver and beyond. The following topics explain these updates.
Driver Package and Interface Name Changes
The name of the HP Vertica client driver package has changed. Previously, the HP Vertica JDBC
driver classes were located in com.vertica. They are now in com.vertica.jdbc.
Loading the 4.1 Driver: Class.forName("com.vertica.Driver");
Loading the 5.1 Driver: Class.forName("com.vertica.jdbc.Driver");
Interface Name Changes
The following table lists the interfaces in the HP Vertica JDBC client library whose names have
changed.
4.1 Interface Name   5.1 Interface Name
PGConnection         VerticaConnection
PGStatement          VerticaStatement
Normally, you reference these interfaces only when casting a Connection or Statement object to access
HP Vertica-specific methods or properties. Many of the HP Vertica-specific methods and properties have
been removed from the version 5.1 client drivers in favor of JDBC-standard methods and properties.
Therefore, you should not need to cast to these interfaces as often when using the new client drivers. See
Converting from PGConnection to VerticaConnection (page 109) and Converting from PGStatement
to VerticaStatement (page 110) for more information.
Removed Classes
The HP Vertica version 5.1 JDBC driver has removed several legacy classes.
Previously, many HP Vertica-specific methods would throw a PSQLException when encountering an
error. This class has been removed. Instead, all methods now throw a standard SQLException.
The PGInterval class has been replaced with a pair of HP Vertica interval classes:
VerticaDayTimeInterval (which represents all ten types of day/time intervals) and
VerticaYearMonthInterval (which represents all three types of year/month intervals). See Using
Intervals with JDBC (page 131) for details.
The PGMoney class has been removed from the driver.
Converting from PGConnection to VerticaConnection
The VerticaConnection interface replaces the PGConnection interface in the version 5.1
JDBC driver. The PGConnection interface contained HP Vertica-specific extensions to the
standard JDBC Connection interface. VerticaConnection does not implement some of
PGConnection's methods to make it more compliant with the JDBC standards.
Property Setters and Getters
The PGConnection interface had several specific setters and getters for some connection
properties. These have been removed from VerticaConnection and replaced with properties,
as described in the following table:
4.1 Methods              5.1 Property Name   Description
getBatchDirectInsert()   DirectBatchInsert   Controls whether data is inserted
setBatchDirectInsert()                       directly into ROS or into WOS.
getMaxLRSMemory()        ResultBufferSize    Controls the size of the memory
setMaxLRSMemory()                            buffer the client uses when
                                             retrieving a result set stream.
Instead of these non-standard methods, use the VerticaConnection.getProperty() and
VerticaConnection.setProperty() methods to access these properties, or set them when
creating the database connection. See Connection Properties (page 117) and Setting and
Getting Connection Property Values (page 120) for more information.
Deprecated Methods
The following methods of the PGConnection class were deprecated in the version 4.1 drivers,
and are not implemented by VerticaConnection:
Removed Methods                  Reason
getBatchInsertEnforceLength()    In the HP Vertica version 5.1 drivers, batch
setBatchInsertEnforceLength()    inserts always enforce column-width limitations.
                                 This makes batch inserts consistent with
                                 non-batch inserts, which always enforce width
                                 limitations. If you do not want batch load errors
                                 to occur due to data being too wide for the
                                 column, you can either:
                                  Truncate the data to the column's width
                                   before attempting to insert it
                                  Use the VerticaCopyStream class to execute
                                   a COPY FROM STDIN statement without an
                                   ENFORCE LENGTH parameter.
Removed Methods               Reason
getBinaryBatchInsert()        The new JDBC driver now uses a single batch insert
setBinaryBatchInsert()        protocol rather than having separate binary and
                              text modes to insert data.
getLocale()                   These methods are not implemented by
setLocale()                   VerticaConnection. Instead, you set the
                              connection's locale using the SET LOCALE SQL
                              statement. See Setting the Locale for JDBC
                              Sessions (page 123) for details.
getStreamingLRS()             The new JDBC driver always uses streaming result
setStreamingLRS()             sets rather than optionally caching the results on
                              the client.
getEncoding()                 All of these methods were previously deprecated
setEncoding()                 and have been removed from the version 5.1 JDBC
getFastPathAPI()              driver.
getLargeObjectAPI()
getManagedBatchInsert()
setManagedBatchInsert()
getObject()
getPGType()
getPrepareThreshold()
setPrepareThreshold()
getSQLType()
getUse35CopyFormat()
setUse35CopyFormat()
getUse35CopyParameters()
setUser35CopyParameters()
addDataType()
Savepoint Support
The version 5.1 JDBC driver implements the Savepoint interface, allowing you to use
savepoints to segment your transactions and later roll back a portion of the transaction.
Updatable Result Set Changes
The createStatement() method in the 4.1 and earlier JDBC drivers accepted the
ResultSet.CONCUR_UPDATABLE flag to create an updatable result set. However, the driver's
support of this feature was limited. The 5.1 JDBC driver does not support updatable result sets.
The createStatement() method now accepts just ResultSet.TYPE_FORWARD_ONLY and
ResultSet.CONCUR_READ_ONLY flags. It throws an exception if you pass it other flags.
Converting from PGStatement to VerticaStatement
The VerticaStatement interface contains the HP Vertica-specific extensions of the standard
JDBC Statement interface. In previous versions of the HP Vertica JDBC driver, this interface
was named PGStatement. In addition to the name change, some methods have been
removed from this interface.
Deprecated Methods
VerticaStatement does not implement the following PGStatement methods because they
are obsolete:
 executeCopyIn()
 executeCopyOut()
 finishLoad()
 getLastOID()
Bulk Loading Method Changes
VerticaStatement does not implement the following PGStatement bulk loading methods:
 addStreamtoCopyIn()
 startCopyIn()
 finishCopyIn()
Instead, the new VerticaCopyStream class implements a stream copying feature. See
Streaming Data via JDBC (page 152).
Connection Property Setters and Getters
In previous versions of the drivers, many of the connection property setters and getters on the
PGConnection interface were also implemented in the PGStatement interface. These methods
allowed you to change some of the connection parameters for the statement, letting you override
their settings on the PGConnection object used to create the statement. VerticaStatement
does not implement these setters and getters, which helps prevent the confusion and
difficult-to-debug issues that can arise when different statements on the same connection have
their own connection properties.
VerticaStatement objects cache two properties independently: DirectBatchInsert and
ResultBufferSize. Once a VerticaStatement object is instantiated, it stores the values of these
properties that were set on the VerticaConnection object you used to create it. The object
retains these values even if you later change the property on the VerticaConnection object.
For example, in the following code:
// Set the DirectBatchInsert property
((VerticaConnection) conn).setProperty("DirectBatchInsert", true);
// Create a statement object
Statement stmtA = conn.createStatement();
// Change the direct batch insert setting and create another
// statement.
((VerticaConnection) conn).setProperty("DirectBatchInsert", false);
Statement stmtB = conn.createStatement();
The stmtA object has its DirectBatchInsert property set to true, since that was the property's value
on the VerticaConnection used to instantiate it. Since this property is specific to the
statement, stmtA's DirectBatchInsert property remains unchanged when the connection's
DirectBatchInsert property changes to false later.
Note: The VerticaStatement interface does not cache any of the other connection
properties. If you change another property (such as Locale) on a VerticaConnection
object, the change affects all of the VerticaStatement objects instantiated using that
connection object.
Multiple Statement Support
The previous JDBC driver's implementation of the Statement interface did not support executing
SQL strings containing multiple statements. The new driver's implementation does support
multiple statements. For example:
stmt.executeUpdate("CREATE TABLE t(a INT);INSERT INTO t VALUES(10);");
Only the Statement interface supports executing strings containing multiple SQL statements.
You cannot use multiple statement strings with PreparedStatement. COPY statements that
copy a file from a host file system work in a multiple statement string. However, client COPY
statements (COPY FROM STDIN) do not work.
Connection Property Changes
JDBC connection properties let you control the behavior of your application's connection to the
database. The HP Vertica version 5.1 JDBC driver has removed some old properties, renamed
several others, and added some new properties.
New Connection Properties
The following properties have been added to the HP Vertica version 5.1 JDBC driver.
ConnSettings
    Contains SQL commands to be executed when the database connection is
    established.
LogLevel
    Sets the types of messages that the client writes to a log file.
LogNameSpace
    Limits the messages written to the log to only those generated by
    classes in the given namespace.
LogPath
    Sets the path where the log file is written.
TransactionIsolation
    Previously could only be accessed using a getter and setter method on
    the PGConnection object. It can now be set using the same methods as
    other connection properties.
ThreePartNaming
    Controls whether the database metadata uses the database name as the
    catalog name. This property is only used for backwards compatibility
    with some client software.
See Connection Properties (page 117) for more information about these new properties.
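For example, a minimal sketch of supplying some of these new properties when opening a connection (the property values shown are illustrative):
Properties myProp = new Properties();
myProp.put("user", "ExampleUser");
myProp.put("password", "password123");
// Run a configuration statement as soon as the connection is established.
myProp.put("ConnSettings", "SET SEARCH_PATH TO myschema");
// Report the database name as the catalog name in the metadata.
myProp.put("ThreePartNaming", "true");
Connection conn = DriverManager.getConnection(
        "jdbc:vertica://VerticaHost:5433/ExampleDB", myProp);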
Renamed Properties
Several properties have been renamed to make connection properties more consistent across the
different client libraries:
4.1 Property Name    5.1 Name            Description

defaultAutoCommit    AutoCommit          Sets whether statements
                                         automatically commit themselves.
SessionLabel         Label               Identifies the connection on the
                                         server.
maxLRSMemory         ResultBufferSize    Sets the buffer size the driver
                                         uses when retrieving a result set
                                         from the server. In addition, the
                                         default value has been changed
                                         from 8MB to 8KB.
Removed Connection Properties
The following table lists the connection properties that have been removed from the
HP Vertica version 5.1 JDBC driver:

batchInsertEnforceLength
    Controlled whether inserting data that is too wide for its column
    would cause an error. Removed to make batch inserts consistent with
    other types of data inserts. Attempting to insert data that is too
    wide for a column always results in an error.
batchInsertRecordTerminator
    Set the record terminator for a batch insert. This property was
    deprecated and was only available for backwards compatibility.
binaryBatchInsert
    Controlled whether batched data inserts were transmitted as binary
    data or converted to strings. Removed because the driver now uses
    NATIVE VARCHAR transfer for all data types.
BinaryDataTransfer
    Controlled whether data was transmitted between the server and the
    client in binary format. Removed because the driver now uses NATIVE
    VARCHAR to transfer data.
KeepAlive
    Caused the network connection to send keepalive packets to ensure the
    connection remained open during idle periods. The new JDBC driver
    always sends keepalive packets.
Locale
    Set the locale for the connection. Instead of a property, use a query
    to set the locale. You can include a SQL statement in the ConnSettings
    property to set the locale as soon as the JDBC driver connects to the
    database. See Setting the Locale for JDBC Sessions (page 123).
prepareThreshold
    Controlled how many times a statement would be executed before the
    server would automatically convert it to a server-side prepared
    statement. Removed since this functionality was deprecated.
streamingLRS
    Controlled whether results were streamed to the client as it requested
    data, or whether all data in a result set was sent to the client,
    which cached it in a local file. Removed since the new driver always
    uses streaming mode.
Creating and Configuring a Connection
Before your Java application can interact with HP Vertica, it must create a connection. Connecting
to HP Vertica via JDBC is similar to connecting to most other databases.
Importing SQL Packages and Loading the Driver
Before creating a connection, you must import the Java SQL packages. The easiest way to do this
is to import the entire package using a wildcard:
import java.sql.*;
You may also want to import the Properties class. You can use an instance of this class to pass
connection properties when instantiating a connection, rather than encoding everything within the
connection string:
import java.util.Properties;
Finally, you'll need to load the HP Vertica JDBC driver using the Class.forName() method:
try {
    Class.forName("com.vertica.jdbc.Driver");
} catch (ClassNotFoundException e) {
    // Could not find the driver class. Likely an issue
    // with finding the .jar file.
    System.err.println("Could not find the JDBC driver class.");
    e.printStackTrace();
    return; // Exit. Cannot do anything further.
}
Opening the Connection
With SQL packages imported and the driver loaded, you are ready to create your connection by
calling the DriverManager.getConnection() method. You supply this method with at least
the following information:
 The name of a host in the database cluster
 The port number for the database
 The name of the database
 The username of a user who has access to the database
 The password of the user
The first three parameters are always supplied as part of the connection string (a URL that tells the
JDBC driver where to find the database). The format of the connection string is:
"jdbc:vertica://VerticaHost:portNumber/databaseName"
The first portion of the connection string selects the specific JDBC driver to use, followed by the
location of the database.
The last two parameters, username and password, can be given to the JDBC driver in one of three
ways:
 as part of the connection string. The parameters are encoded similarly
   to URL parameters:
   "jdbc:vertica://VerticaHost:portNumber/databaseName?user=username&password=password"
 passed as separate parameters to DriverManager.getConnection():
   Connection conn = DriverManager.getConnection(
           "jdbc:vertica://VerticaHost:portNumber/databaseName",
           "username", "password");
 passed in a Properties object:
   Properties myProp = new Properties();
   myProp.put("user", "username");
   myProp.put("password", "password");
   Connection conn = DriverManager.getConnection(
           "jdbc:vertica://VerticaHost:portNumber/databaseName", myProp);
You usually want to use the Properties object, because it makes passing additional connection
properties to the getConnection() method easy. See Connection Properties (page 117) and
Setting and Getting Connection Property Values (page 120) for more information about the
additional connection properties.
The getConnection() method throws a SQLException if there is any problem establishing a
connection to the database, so you want to enclose it within a try-catch block, as shown in the
following complete example of establishing a connection:
import java.sql.*;
import java.util.Properties;

public class ConnectionExample {
    public static void main(String[] args) {
        // Load JDBC driver
        try {
            Class.forName("com.vertica.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            // Could not find the driver class. Likely an issue
            // with finding the .jar file.
            System.err.println("Could not find the JDBC driver class.");
            e.printStackTrace();
            return; // Bail out. We cannot do anything further.
        }
        // Create property object to hold username & password
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");
        Connection conn;
        try {
            conn = DriverManager.getConnection(
                    "jdbc:vertica://VerticaHost:5433/ExampleDB", myProp);
        } catch (SQLException e) {
            // Could not connect to database.
            System.err.println("Could not connect to database.");
            e.printStackTrace();
            return;
        }
        // Connection is established, do something with it here or
        // return it to a calling method
    }
}
Notes:
 When you disconnect a user session, any uncommitted transactions are automatically rolled
back.
 If your database is not compliant with your HP Vertica license terms, HP Vertica issues a
SQLWarning when you establish the connection to the database. You can retrieve this
warning using the Connection.getWarnings() method.
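For example, a minimal sketch of walking the warning chain after connecting:
Connection conn = DriverManager.getConnection(
        "jdbc:vertica://VerticaHost:5433/ExampleDB", myProp);
// Check for any warnings (such as a license-compliance warning)
// attached to the new connection.
SQLWarning warning = conn.getWarnings();
while (warning != null) {
    System.out.println("Connection warning: " + warning.getMessage());
    warning = warning.getNextWarning();
}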
JDBC Connection Properties
You use connection properties to configure the connection between your JDBC client application
and your HP Vertica database. The properties provide the basic information about the
connections, such as the server name and port number to use to connect to your database. They
also let you tune the performance of your connection and enable logging.
There are three ways to set a connection property:
 Including the property name and value as part of the connection string
   you pass to the DriverManager.getConnection() method.
 Setting the properties in a Properties object, then passing it to the
   DriverManager.getConnection() method.
 Using the VerticaConnection.setProperty() method. Only the connection
   properties that can be changed after the connection has been
   established can be changed using the setProperty() method.
In addition, some of the standard JDBC connection properties have getters and setters on the
Connection interface (such as Connection.setAutocommit()).
The following tables list the properties supported by the HP Vertica JDBC driver, and explain
which are required in order for the connection to be established.
Connection Properties
The properties in the following table can only be set before you open the connection to the
database. Two of them are required for every connection.
ConnSettings
    A string containing SQL statements that the JDBC driver automatically
    runs after it connects to the database. This property is useful to set
    the locale, set the schema search path, or perform other configuration
    that the connection requires.
    Required: No. Default: none.
Label
    Sets a label for the connection on the server. This value appears in
    the session_id column of the V_MONITOR.SESSIONS system table.
    Required: No. Default: jdbc-driver_version-random_number.
LoginTimeout
    The number of seconds HP Vertica waits for a connection to be
    established to the database before throwing a SQLException. When set
    to 0 (the default) there is no TCP timeout.
    Required: No. Default: 0.
SSL
    When set to true, uses SSL to encrypt the connection to the server.
    HP Vertica needs to be configured to handle SSL connections before you
    can establish an SSL-encrypted connection to it. See Implementing SSL
    in the Administrator's Guide.
    Required: No. Default: false.
Password
    The password to use to log into the database.
    Required: Yes. Default: none.
User
    The database username to use to connect to the database.
    Required: Yes. Default: none.
General Properties
The following properties can be set after the connection is established. None of these properties
are required.
AutoCommit
    Controls whether the connection automatically commits transactions.
    Set this parameter to false to prevent the connection from
    automatically committing its transactions. You often want to do this
    when you are bulk loading multiple batches of data and you want the
    ability to roll back all of the loads if an error occurs.
    Note: This property was called defaultAutoCommit in previous versions
    of the HP Vertica JDBC driver.
    Set after connecting: Connection.setAutoCommit(). Default: true.
DirectBatchInsert
    Determines whether a batch insert stores data directly into ROS (true)
    or uses AUTO mode (false).
    Set after connecting: VerticaConnection.setProperty(). Default: false.
ReadOnly
    When set to true, makes the connection read-only. Any queries
    attempting to update the database using a read-only connection cause
    a SQLException.
    Set after connecting: Connection.setReadOnly(). Default: false.
ResultBufferSize
    Sets the size of the buffer the HP Vertica JDBC driver uses to
    temporarily store result sets.
    Note: This property was named maxLRSMemory in previous versions of the
    HP Vertica JDBC driver.
    Set after connecting: VerticaConnection.setProperty().
    Default: 8192 (8KB).
SearchPath
    Sets the schema search path for the connection. The value for this
    property is a string containing a comma-separated list of schema
    names. See Setting Search Paths in the Administrator's Guide for more
    information on the schema search path.
    Set after connecting: VerticaConnection.setProperty().
    Default: "$user", public, v_catalog, v_monitor, v_internal.
ThreePartNaming
    A Boolean that controls how DatabaseMetaData reports the catalog name.
    When set to true, the database name is returned as the catalog name in
    the database metadata. When set to false, NULL is returned as the
    catalog name. Enable this option if your client software expects to be
    able to get the catalog name from the database metadata and use it as
    part of a three-part name reference.
    Set after connecting: VerticaConnection.setProperty(). Default: true.
TransactionIsolation
    Sets the isolation level of the transactions that use the connection.
    See Changing the Transaction Isolation Level (page 125) for details.
    Note: In previous versions of the HP Vertica JDBC driver, this
    property was only available using a getter and setter on the
    PGConnection object. It can now be set like other connection
    properties.
    Set after connecting: Connection.setTransactionIsolation().
    Default: TRANSACTION_READ_COMMITTED.
Programmer's Guide
Logging Properties
The properties that control client logging must be set before the connection is opened. None of
these properties are required, and none can be changed after the Connection object has been
instantiated.
LogLevel
    Sets the type of information logged by the JDBC driver. The value is
    set to one of the following values:
     "DEBUG"
     "ERROR"
     "TRACE"
     "WARNING"
     "INFO"
     "OFF"
    Default: "OFF".
LogNameSpace
    Restricts logging to just messages generated by a specific package.
    Valid values are:
     com.vertica (all messages generated by the JDBC driver)
     com.vertica.jdbc (messages generated by the top-level JDBC API
       layer)
     com.vertica.jdbc.core (connection and statement settings)
     com.vertica.jdbc.io (client/server protocol messages)
     com.vertica.jdbc.util (miscellaneous utilities)
     com.vertica.jdbc.dataengine (query execution and result set
       iteration)
    Default: none.
LogPath
    Sets the path where the log file is written.
    Default: the current working directory.
Note: The VerticaConnection.setProperty() method can also be used to set
properties that have standard JDBC Connection setters, such as AutoCommit.
For information about manipulating these attributes, see Setting and Getting Connection
Property Values (page 120).
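For example, a minimal sketch of enabling client-side logging when connecting (the log path shown is illustrative):
Properties myProp = new Properties();
myProp.put("user", "ExampleUser");
myProp.put("password", "password123");
// Log driver activity at DEBUG level to a file in /tmp/vertica-logs.
myProp.put("LogLevel", "DEBUG");
myProp.put("LogPath", "/tmp/vertica-logs");
// Optionally limit logging to the client/server protocol layer.
myProp.put("LogNameSpace", "com.vertica.jdbc.io");
Connection conn = DriverManager.getConnection(
        "jdbc:vertica://VerticaHost:5433/ExampleDB", myProp);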
Setting and Getting Connection Property Values
There are three ways to set a connection property:
 Including the property name and value as part of the connection string
   you pass to the DriverManager.getConnection() method.
 Setting the properties in a Properties object, then passing it to the
   DriverManager.getConnection() method.
 Using the VerticaConnection.setProperty() method. Only the connection
   properties that can be changed after the connection has been
   established can be changed using the setProperty() method.
In addition, some of the standard JDBC connection properties have getters and setters on the
Connection interface (such as Connection.setAutocommit()).
Setting Properties when Connecting
There are two ways you can set connection properties when creating a connection to HP Vertica:
 In the connection string, using the same URL parameter format that you
   can use to set the username and password. The following example sets
   the SSL connection parameter to true:
   "jdbc:vertica://VerticaHost:5433/db?user=UserName&password=Password&ssl=true"
 In a Properties object that you pass to the getConnection() call. You
   will need to import the java.util.Properties class in order to
   instantiate a Properties object. Then you use the put() method to add
   the property name and value to the object:
   Properties myProp = new Properties();
   myProp.put("user", "ExampleUser");
   myProp.put("password", "password123");
   myProp.put("LoginTimeout", "35");
   Connection conn;
   try {
       conn = DriverManager.getConnection(
               "jdbc:vertica://VerticaHost:5433/ExampleDB", myProp);
   } catch (SQLException e) {
       e.printStackTrace();
   }
Note: All of the values you set in the Properties object are strings, even if the
property value's data type is integer or Boolean.
Getting and Setting Properties after Connecting
The VerticaConnection.getProperty() method lets you get the value of some connection
properties. You can use the VerticaConnection.setProperty() method to change the value
for properties that can be set after the database connection has been established. Since these
methods are HP Vertica-specific, you must cast your Connection object to the
VerticaConnection interface to be able to use them. To cast to VerticaConnection, you
must either import it into your client application or use a fully-qualified reference
(com.vertica.jdbc.VerticaConnection). The following example demonstrates getting
and setting the value of the DirectBatchInsert property.
import java.sql.*;
import java.util.Properties;
import com.vertica.jdbc.*;

public class SetConnectionProperties {
    public static void main(String[] args) {
        // Note: If your application needs to run under Java 5, you need to
        // load the JDBC driver using Class.forName() here.
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");
        // Set DirectBatchInsert to true initially
        myProp.put("DirectBatchInsert", "true");
        Connection conn;
        try {
            conn = DriverManager.getConnection(
                    "jdbc:vertica://VerticaHost:5433/ExampleDB",
                    myProp);
            // Show state of the DirectBatchInsert property. This was set
            // at the time the connection was created.
            System.out.println("DirectBatchInsert state: "
                    + ((VerticaConnection) conn).getProperty(
                            "DirectBatchInsert"));
            // Change it and show it again
            ((VerticaConnection) conn).setProperty("DirectBatchInsert",
                    false);
            System.out.println("DirectBatchInsert state is now: "
                    + ((VerticaConnection) conn).getProperty(
                            "DirectBatchInsert"));
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
When run, the example prints the following on the standard output:
DirectBatchInsert state: true
DirectBatchInsert state is now: false
Setting the Locale for JDBC Sessions
You set the locale for a connection while opening it by including a SET LOCALE statement in the
ConnSettings property, or by executing a SET LOCALE statement at any time after opening the
connection. Changing the locale of a Connection object affects all of the Statement objects
you instantiated using it.
You can get the locale by executing a SHOW LOCALE query. The following example
demonstrates setting the locale using ConnSettings and executing a statement, as well as getting
the locale:
import java.sql.*;
import java.util.Properties;

public class GetAndSetLocale {
    public static void main(String[] args) {
        try {
            Class.forName("com.vertica.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            System.err.println("Could not find the JDBC driver class.");
            e.printStackTrace();
            return;
        }
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");
        // Set the locale to en_GB on the connection. After the connection
        // is established, the JDBC driver runs the statements in the
        // ConnSettings property.
        myProp.put("ConnSettings", "SET LOCALE TO en_GB");
        Connection conn;
        try {
            conn = DriverManager.getConnection(
                    "jdbc:vertica://VerticaHost:5433/ExampleDB",
                    myProp);
            // Execute a query to get the locale. The results should
            // show "en_GB" as the locale, since it was set by the
            // ConnSettings property.
            Statement stmt = conn.createStatement();
            ResultSet rs = null;
            rs = stmt.executeQuery("SHOW LOCALE");
            System.out.print("Query reports that Locale is set to: ");
            while (rs.next()) {
                System.out.println(rs.getString(2).trim());
            }
            // Now execute a query to set the locale.
            stmt.execute("SET LOCALE TO en_US");
            // Run the query again to get the locale.
            rs = stmt.executeQuery("SHOW LOCALE");
            System.out.print("Query now reports that Locale is set to: ");
            while (rs.next()) {
                System.out.println(rs.getString(2).trim());
            }
            // Clean up
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
Running the above example displays the following on the system console:
Query reports that Locale is set to: en_GB (LEN)
Query now reports that Locale is set to: en_US (LEN)
Notes:
 JDBC applications use a UTF-16 character set encoding and are responsible for converting
any non-UTF-16 encoded data to UTF-16. Failing to convert the data can result in errors or the
data being stored incorrectly.
 The JDBC driver converts UTF-16 data to UTF-8 when passing it to the HP Vertica server,
and converts data sent by the HP Vertica server from UTF-8 to UTF-16.
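For example, a minimal sketch of decoding ISO-8859-1 (Latin-1) bytes into a Java string before binding it to a prepared statement (a sketch using Java 7 file APIs; the file name and source encoding are assumptions about your data):
byte[] rawBytes = java.nio.file.Files.readAllBytes(
        java.nio.file.Paths.get("latin1-data.txt")); // illustrative file
// Java strings are UTF-16 internally, so decoding with the source
// charset here lets the driver handle the UTF-8 conversion later.
String value = new String(rawBytes,
        java.nio.charset.Charset.forName("ISO-8859-1"));
pstmt.setString(1, value);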
Changing the Transaction Isolation Level
Changing the transaction isolation level lets you choose how transactions prevent interference
from other transactions. By default, the JDBC driver matches the transaction isolation level of the
HP Vertica server. The HP Vertica default transaction isolation level is READ_COMMITTED, which
means any changes made by a transaction cannot be read by any other transaction until after they
are committed. This prevents a transaction from reading data inserted by another transaction that
is later rolled back.
HP Vertica also supports the SERIALIZABLE transaction isolation level. This level locks tables to
prevent queries from having the results of their WHERE clauses changed by other transactions.
Locking tables can have a performance impact, since only one transaction is able to access the
table at a time.
A transaction retains its isolation level until it completes, even if the session's transaction isolation
level changes mid-transaction. HP Vertica internal processes (such as the Tuple Mover and
refresh operations) and DDL operations are always run at SERIALIZABLE isolation level to
ensure consistency.
You can change the transaction isolation level connection property after the connection has been
established using the Connection object's setter (setTransactionIsolation()) and getter
(getTransactionIsolation()). The value for the transaction isolation property is an integer. The
Connection interface defines constants to help you set the value in a more intuitive manner:
Constant                                     Value
Connection.TRANSACTION_READ_COMMITTED        2
Connection.TRANSACTION_SERIALIZABLE          8
Note: The Connection interface also defines several other transaction isolation constants
(READ_UNCOMMITTED and REPEATABLE_READ). Since HP Vertica does not support these
isolation levels, they are converted to READ_COMMITTED and SERIALIZABLE, respectively.
The following example demonstrates setting the transaction isolation level to SERIALIZABLE.
import java.sql.*;
import java.util.Properties;

public class SetTransactionIsolation {
    public static void main(String[] args) {
        try {
            Class.forName("com.vertica.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            System.err.println("Could not find the JDBC driver class.");
            e.printStackTrace();
            return;
        }
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");
        Connection conn;
        try {
            conn = DriverManager.getConnection(
                    "jdbc:vertica://VerticaHost:5433/ExampleDB",
                    myProp);
            // Get default transaction isolation
            System.out.println("Transaction Isolation Level: "
                    + conn.getTransactionIsolation());
            // Set transaction isolation to SERIALIZABLE
            conn.setTransactionIsolation(
                    Connection.TRANSACTION_SERIALIZABLE);
            // Get the transaction isolation again
            System.out.println("Transaction Isolation Level: "
                    + conn.getTransactionIsolation());
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
Running the example results in the following being printed out to the console:
Transaction Isolation Level: 2
Transaction Isolation Level: 8
Using a Pooling Data Source
A pooling data source uses a pool of connections in order to reduce the overhead of network
connections between the client and server. Opening a new connection for each request is more
costly to both the server and the client than keeping a small pool of connections open constantly,
ready to be used by new requests. When a request comes in, one of the pre-existing connections
in the pool is assigned to it. Only if there are no free connections in the pool is a new connection
created. Once the request is complete, the connection returns to the pool and waits to service
another request.
The HP Vertica JDBC driver supports connection pooling as defined in the JDBC 3.0 standard. If
you are using a J2EE-based application server in conjunction with HP Vertica, it should already
have a built-in data pooling feature. All that is required is that the application server work with the
PooledConnection interface implemented by HP Vertica's JDBC driver. An application server's
pooling feature is usually well-tuned for the workloads that the server is designed to handle. See
your application server's documentation for details on how to work with pooled connections.
Normally, using pooled connections should be transparent in your code—you will just open
connections and the application server will worry about the details of pooling them.
If you are not using an application server, or your application server does not offer connection
pooling that is compatible with HP Vertica, you can use a third-party pooling library, such as the
open-source c3p0 library, to implement connection pooling.
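For example, a minimal sketch using c3p0's ComboPooledDataSource (the connection details are illustrative, and the c3p0 .jar must be on your classpath):
import com.mchange.v2.c3p0.ComboPooledDataSource;
import java.sql.Connection;

public class PoolingExample {
    public static void main(String[] args) throws Exception {
        ComboPooledDataSource pool = new ComboPooledDataSource();
        pool.setDriverClass("com.vertica.jdbc.Driver");
        pool.setJdbcUrl("jdbc:vertica://VerticaHost:5433/ExampleDB");
        pool.setUser("ExampleUser");
        pool.setPassword("password123");
        // Checking out a connection reuses an idle pooled connection,
        // or opens a new one if none is free.
        Connection conn = pool.getConnection();
        // ... use the connection as you would any other Connection ...
        // Closing the connection returns it to the pool rather than
        // closing the underlying network connection.
        conn.close();
        pool.close();
    }
}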
JDBC Data Types
The JDBC driver transparently converts most HP Vertica data types to the appropriate Java data
type. In a few cases, an HP Vertica data type cannot be directly translated to a Java data type;
these are explained in this section.
HP Vertica Numeric Data Alias Conversion
The HP Vertica server supports data type aliases for INTEGER, FLOAT, and NUMERIC types
(see SQL Data Types in the SQL Reference Manual). The JDBC driver reports these as its basic
data types (Integer, Float, and Numeric), as follows:
HP Vertica Server Types and Aliases         HP Vertica JDBC Type
INTEGER, INT, INT8, BIGINT,                 Integer
SMALLINT, TINYINT
DOUBLE PRECISION, FLOAT, FLOAT8, REAL       Float
DECIMAL, NUMERIC, NUMBER, MONEY             Numeric
If a client application retrieves the values into smaller data types, the HP Vertica JDBC driver
does not check for overflow. The following example demonstrates the results of this overflow.
import java.sql.*;
import java.util.Properties;

public class JDBCDataTypes {
    public static void main(String[] args) {
        // If running under a Java 5 JVM, you need to load the JDBC driver
        // using Class.forName() here
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");
        Connection conn;
        try {
            conn = DriverManager.getConnection(
                    "jdbc:vertica://VerticaHost:5433/VMart",
                    myProp);
            Statement statement = conn.createStatement();
            // Create a table that will hold a row of different types of
            // numeric data.
            statement.executeUpdate(
                    "DROP TABLE IF EXISTS test_all_types cascade");
            statement.executeUpdate("CREATE TABLE test_all_types ("
                    + "c0 INTEGER, c1 TINYINT, c2 DECIMAL, "
                    + "c3 MONEY, c4 DOUBLE PRECISION, c5 REAL)");
            // Add a row of values to it.
            statement.executeUpdate("INSERT INTO test_all_types VALUES("
                    + "111111111111, 444, 55555555555.5555, "
                    + "77777777.77, 88888888888888888.88, "
                    + "10101010.10101010101010)");
            // Query the new table to get the row back as a result set.
            ResultSet rs = statement
                    .executeQuery("SELECT * FROM test_all_types");
            // Get the metadata about the row, including its data type.
            ResultSetMetaData md = rs.getMetaData();
            // Loop should only run once...
            while (rs.next()) {
                // Print out the data type used to define the column,
                // followed by the values retrieved using several
                // different retrieval methods.
                String[] vertTypes = new String[] {"INTEGER", "TINYINT",
                        "DECIMAL", "MONEY", "DOUBLE PRECISION", "REAL"};
                for (int x = 1; x < 7; x++) {
                    System.out.println("\n\nColumn " + x + " ("
                            + vertTypes[x-1] + ")");
                    System.out.println("\tgetColumnType()\t\t"
                            + md.getColumnType(x));
                    System.out.println("\tgetColumnTypeName()\t"
                            + md.getColumnTypeName(x));
                    System.out.println("\tgetShort()\t\t"
                            + rs.getShort(x));
                    System.out.println("\tgetLong()\t\t" + rs.getLong(x));
                    System.out.println("\tgetInt()\t\t" + rs.getInt(x));
                    System.out.println("\tgetByte()\t\t" + rs.getByte(x));
                }
            }
            rs.close();
            statement.executeUpdate("drop table test_all_types cascade");
            statement.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
The above example prints the following on the console when run:
Column 1 (INTEGER)
        getColumnType()         -5
        getColumnTypeName()     Integer
        getShort()              455
        getLong()               111111111111
        getInt()                -558038585
        getByte()               -57

Column 2 (TINYINT)
        getColumnType()         -5
        getColumnTypeName()     Integer
        getShort()              444
        getLong()               444
        getInt()                444
        getByte()               -68

Column 3 (DECIMAL)
        getColumnType()         2
        getColumnTypeName()     Numeric
        getShort()              -1
        getLong()               55555555555
        getInt()                2147483647
        getByte()               -1

Column 4 (MONEY)
        getColumnType()         2
        getColumnTypeName()     Numeric
        getShort()              -13455
        getLong()               77777777
        getInt()                77777777
        getByte()               113

Column 5 (DOUBLE PRECISION)
        getColumnType()         8
        getColumnTypeName()     Float
        getShort()              -1
        getLong()               88888888888888900
        getInt()                2147483647
        getByte()               -1

Column 6 (REAL)
        getColumnType()         8
        getColumnTypeName()     Float
        getShort()              8466
        getLong()               10101010
        getInt()                10101010
        getByte()               18
Using Intervals with JDBC
The JDBC standard does not contain a data type for intervals (the duration between two points in
time). To handle HP Vertica's INTERVAL data type, you must use JDBC's database-specific
object type.
When reading an interval value from a result set, use the ResultSet.getObject() method to
retrieve the value, and then cast it to one of the HP Vertica interval classes:
VerticaDayTimeInterval (which represents all ten types of day/time intervals) or
VerticaYearMonthInterval (which represents all three types of year/month intervals).
Note: The units interval style is not supported. Do not use the SET INTERVALSTYLE
statement to change the interval style in your client applications.
Using Intervals in Batch Inserts
When inserting batches into tables that contain interval data, you must create instances of the
VerticaDayTimeInterval or VerticaYearMonthInterval classes to hold the data you
want to insert. You set values either when calling the class's constructor, or afterwards using
setters. You then insert your interval values using the PreparedStatement.setObject()
method. You can also use the .setString() method, passing it a string in "DD HH:MM:SS" or
"YY-MM" format.
The following example demonstrates inserting data into a table containing a day/time interval and
a year/month interval:
import java.sql.*;
import java.util.Properties;
// Need to import the Vertica JDBC classes to be able to instantiate
// the interval classes.
import com.vertica.jdbc.*;

public class IntervalDemo {
    public static void main(String[] args) {
        // If running under a Java 5 JVM, you need to load the JDBC driver
        // using Class.forName() here
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");
        Connection conn;
        try {
            conn = DriverManager.getConnection(
                    "jdbc:vertica://VerticaHost:5433/VMart", myProp);
            // Create table for interval values
            Statement stmt = conn.createStatement();
            stmt.execute("DROP TABLE IF EXISTS interval_demo");
            stmt.executeUpdate("CREATE TABLE interval_demo("
                    + "DayInt INTERVAL DAY TO SECOND, "
                    + "MonthInt INTERVAL YEAR TO MONTH)");
            // Insert data into interval columns using
            // VerticaDayTimeInterval and VerticaYearMonthInterval
            // classes.
            PreparedStatement pstmt = conn.prepareStatement(
                    "INSERT INTO interval_demo VALUES(?,?)");
            // Create instances of the Vertica classes that represent
            // intervals.
            VerticaDayTimeInterval dayInt = new VerticaDayTimeInterval(
                    10, 0, 5, 40, 0, 0, false);
            VerticaYearMonthInterval monthInt =
                    new VerticaYearMonthInterval(10, 6, false);
            // These objects can also be manipulated using setters.
            dayInt.setHour(7);
            // Add the interval values to the batch
            ((VerticaPreparedStatement) pstmt).setObject(1, dayInt);
            ((VerticaPreparedStatement) pstmt).setObject(2, monthInt);
            pstmt.addBatch();
            // Set another row from strings.
            // Set the day interval in "days HH:MM:SS" format
            pstmt.setString(1, "10 10:10:10");
            // Set the year to month value in "YY-MM" format
            pstmt.setString(2, "12-09");
            pstmt.addBatch();
            // Execute the batch to insert the values.
            try {
                pstmt.executeBatch();
            } catch (SQLException e) {
                System.out.println("Error message: " + e.getMessage());
            }
Reading Interval Values
You read an interval value from a result set using the ResultSet.getObject() method, and
cast the object to the appropriate HP Vertica object class: VerticaDayTimeInterval for
day/time intervals or VerticaYearMonthInterval for year/month intervals. This is easy to do
if you know that the column contains an interval, and you know what type of interval it is. If your
application cannot assume the structure of the data in the result set it reads in, you can test
whether a column contains a database-specific object type, and if so, determine whether the
object belongs to either the VerticaDayTimeInterval or VerticaYearMonthInterval
classes.
            // Retrieve the interval values inserted by the previous demo.
            // Query the table to get the row back as a result set.
            ResultSet rs = stmt.executeQuery("SELECT * FROM interval_demo");
            // If you do not know the types of data contained in the result
            // set, you can read its metadata to determine the type, and
            // use additional information to determine the interval type.
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                for (int x = 1; x <= md.getColumnCount(); x++) {
                    // Get data type from metadata
                    int colDataType = md.getColumnType(x);
                    // You can get the type in a string:
                    System.out.println("Column " + x + " is a "
                            + md.getColumnTypeName(x));
                    // Normally, you'd have a switch statement here to
                    // handle all sorts of column types, but this example
                    // is simplified to just handle database-specific types
                    if (colDataType == Types.OTHER) {
                        // Column contains a database-specific type.
                        // Determine what type of interval it is. Assuming
                        // it is an interval...
                        Object columnVal = rs.getObject(x);
                        if (columnVal instanceof VerticaDayTimeInterval) {
                            // We know it is a day/time interval
                            VerticaDayTimeInterval interval =
                                    (VerticaDayTimeInterval) columnVal;
                            // You can use the getters to access the
                            // interval's data
                            System.out.print("Column " + x
                                    + "'s value is ");
                            System.out.print(interval.getDay() + " Days ");
                            System.out.print(interval.getHour()
                                    + " Hours ");
                            System.out.println(interval.getMinute()
                                    + " Minutes");
                        } else if (columnVal instanceof
                                VerticaYearMonthInterval) {
                            VerticaYearMonthInterval interval =
                                    (VerticaYearMonthInterval) columnVal;
                            System.out.print("Column " + x
                                    + "'s value is ");
                            System.out.print(interval.getYear()
                                    + " Years ");
                            System.out.println(interval.getMonth()
                                    + " Months");
                        } else {
                            System.out.println("Not an interval.");
                        }
                    }
                }
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
The example prints the following to the console:
Column 1 is a INTERVAL DAY TO SECOND
Column 1's value is 10 Days 7 Hours 5 Minutes
Column 2 is a INTERVAL YEAR TO MONTH
Column 2's value is 10 Years 6 Months
Column 1 is a INTERVAL DAY TO SECOND
Column 1's value is 10 Days 10 Hours 10 Minutes
Column 2 is a INTERVAL YEAR TO MONTH
Column 2's value is 12 Years 9 Months
Another option is to use database metadata to find columns that contain intervals.
// Determine the interval data types by examining the database
// metadata.
DatabaseMetaData dbmd = conn.getMetaData();
ResultSet dbMeta = dbmd.getColumns(null, null, "interval_demo", null);
int colcount = 0;
while (dbMeta.next()) {
    // Get the metadata type for a column.
    int javaType = dbMeta.getInt("DATA_TYPE");
    System.out.println("Column " + ++colcount + " Type name is "
            + dbMeta.getString("TYPE_NAME"));
    if (javaType == Types.OTHER) {
        // The SQL_DATETIME_SUB column in the metadata tells you
        // specifically which subtype of interval you have. The
        // VerticaDayTimeInterval.isDayTimeInterval() method tells
        // you if that value is a day/time interval.
        int intervalType = dbMeta.getInt("SQL_DATETIME_SUB");
        if (VerticaDayTimeInterval.isDayTimeInterval(intervalType)) {
            // Now you know it is one of the 10 day/time interval types.
            // When you select this column you can cast to
            // VerticaDayTimeInterval.
            // You can get more specific by checking intervalType
            // against each of the 10 constants directly, but
            // they all are represented by the same object.
            System.out.println("column " + colcount + " is a "
                    + "VerticaDayTimeInterval intervalType = "
                    + intervalType);
        } else if (VerticaYearMonthInterval.isYearMonthInterval(
                intervalType)) {
            // Now you know it is one of the 3 year/month intervals,
            // and you can select the column and cast to
            // VerticaYearMonthInterval.
            System.out.println("column " + colcount + " is a "
                    + "VerticaYearMonthInterval intervalType = "
                    + intervalType);
        } else {
            System.out.println("Not an interval type.");
        }
    }
}
Executing Queries Through JDBC
To run a query through JDBC:
1 Connect with the HP Vertica database. See Creating and Configuring a Connection (page
114).
2 Run the query.
The method you use to run the query depends on the type of query you want to run:
 a DDL query that does not return a result set
 a query that returns a result set
 a DML query
Executing DDL (Data Definition Language) Queries
To run DDL queries, such as CREATE TABLE and COPY, use the Statement.execute()
method. You get an instance of this class by calling the createStatement method of your
connection object.
The following example creates an instance of the Statement class and uses it to execute a
CREATE TABLE and a COPY query:
Statement stmt = conn.createStatement();
stmt.execute("CREATE TABLE address_book (Last_Name char(50) default ''," +
        "First_Name char(50),Email char(50),Phone_Number char(50))");
stmt.execute("COPY address_book FROM 'address.dat' DELIMITER ',' NULL 'null'");
Executing Queries that Return Result Sets
Use the Statement class's executeQuery method to execute queries that return a result set,
such as SELECT. To get the data from the result set, use methods such as getInt, getString,
and getDouble to access column values depending upon the data types of columns in the result
set. Use ResultSet.next to advance to the next row of the data set.
ResultSet rs = null;
rs = stmt.executeQuery("SELECT First_Name, Last_Name FROM address_book");
int x = 1;
while (rs.next()) {
    System.out.println(x + ". " + rs.getString(1).trim() + " "
            + rs.getString(2).trim());
    x++;
}
Note: The HP Vertica JDBC driver does not support scrollable cursors. You can only read
forwards through the result set.
Executing DML (Data Manipulation Language) Queries Using executeUpdate
Use the executeUpdate method for DML SQL queries that change data in the database, such
as INSERT, UPDATE, and DELETE, which do not return a result set.
stmt.executeUpdate("INSERT INTO address_book " +
"VALUES ('Ben-Shachar', 'Tamar', 'tamarrow@example.com'," +
"'555-380-6466')");
stmt.executeUpdate("INSERT INTO address_book (First_Name, Email) " +
"VALUES ('Pete','pete@example.com')");
Note: The HP Vertica JDBC driver's Statement class supports executing multiple statements
in the SQL string you pass to the execute method. The PreparedStatement class does not
support using multiple statements in a single execution.
Loading Data Through JDBC
You can use any of the following methods to load data via the JDBC interface:
 Executing a SQL INSERT statement to insert a single row directly.
 Batch loading data using a prepared statement.
 Bulk loading data from files or streams using COPY.
When loading data into HP Vertica, you need to decide whether to write data to the Write
Optimized Store (WOS) or the Read Optimized Store (ROS). By default, most data loading
methods insert data into the WOS until it fills up, then insert any additional data directly into ROS
containers (called AUTO mode). This is the best method to use when frequently loading small
amounts of data (often referred to as trickle-loading). When performing less frequent large data
loads (any loads over 100MB of data at once), you should change this behavior to insert data
directly into the ROS.
The following sections explain in detail how you load data using JDBC.
Using a Single Row Insert
The simplest way to insert data into a table is to use the SQL INSERT statement. You can use this
statement by instantiating a member of the Statement class, and using its executeUpdate()
method to run your SQL statement.
The following code fragment demonstrates how you can create a Statement object and use it to
insert data into a table named address_book:
Statement stmt = conn.createStatement();
stmt.executeUpdate("INSERT INTO address_book " +
"VALUES ('Smith', 'John', 'jsmith@example.com', " +
"'555-123-4567')");
This method has a few drawbacks: you need to convert your data to strings and escape any special
characters in your data. A better way to insert data is to use prepared statements. See Batch
Inserts Using JDBC Prepared Statements (page 139).
Batch Inserts Using JDBC Prepared Statements
You can load batches of data into HP Vertica using prepared INSERT statements—server-side
statements that you set up once, and then call repeatedly. You instantiate a member of the
PreparedStatement class with a SQL statement that contains question mark placeholders for
data. For example:
PreparedStatement pstmt = conn.prepareStatement(
"INSERT INTO customers(last, first, id) VALUES(?,?,?)");
You then set the parameters using data-type-specific methods on the PreparedStatement
object, such as setString() and setInt(). Once your parameters are set, call the
addBatch() method to add the row to the batch. When you have a complete batch of data ready,
call the executeBatch() method to execute the insert batch.
Behind the scenes, the batch insert is converted into a COPY statement. When the connection's
AutoCommit parameter is disabled, HP Vertica keeps the COPY statement open and uses it to
load sequential batches until the transaction is committed, the cursor is closed, or your application
executes a non-insert SQL statement (or executes any statement using another Statement or
PreparedStatement object). Using a single COPY statement for multiple batch inserts makes
loading data more efficient. If you are loading multiple batches, you should disable the
AutoCommit property of the database to take advantage of this increased efficiency.
The following example demonstrates using a prepared statement to batch insert data.
import java.sql.*;
import java.util.Properties;

public class BatchInsertExample {
    public static void main(String[] args) {
        try {
            Class.forName("com.vertica.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            System.err.println("Could not find the JDBC driver class.");
            e.printStackTrace();
            return;
        }
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");
        Connection conn;
        try {
            conn = DriverManager.getConnection(
                    "jdbc:vertica://VerticaHost:5433/ExampleDB",
                    myProp);
            // establish connection and make a table for the data.
            Statement stmt = conn.createStatement();
            // Set AutoCommit to false to allow Vertica to reuse the same
            // COPY statement
            conn.setAutoCommit(false);
            // Drop table and recreate.
            stmt.execute("DROP TABLE IF EXISTS customers CASCADE");
            stmt.execute("CREATE TABLE customers (CustID int, Last_Name"
                    + " char(50), First_Name char(50),Email char(50), "
                    + "Phone_Number char(12))");
            // Some dummy data to insert.
            String[] firstNames = new String[] { "Anna", "Bill", "Cindy",
                    "Don", "Eric" };
            String[] lastNames = new String[] { "Allen", "Brown", "Chu",
                    "Dodd", "Estavez" };
            String[] emails = new String[] { "aang@example.com",
                    "b.brown@example.com", "cindy@example.com",
                    "d.d@example.com", "e.estavez@example.com" };
            String[] phoneNumbers = new String[] { "123-456-7890",
                    "555-444-3333", "555-867-5309",
                    "555-555-1212", "781-555-0000" };
            // Create the prepared statement
            PreparedStatement pstmt = conn.prepareStatement(
                    "INSERT INTO customers (CustID, Last_Name, " +
                    "First_Name, Email, Phone_Number)" +
                    " VALUES(?,?,?,?,?)");
            // Add rows to a batch in a loop. Each iteration adds a
            // new row.
            for (int i = 0; i < firstNames.length; i++) {
                // Add each parameter to the row.
                pstmt.setInt(1, i + 1);
                pstmt.setString(2, lastNames[i]);
                pstmt.setString(3, firstNames[i]);
                pstmt.setString(4, emails[i]);
                pstmt.setString(5, phoneNumbers[i]);
                // Add row to the batch.
                pstmt.addBatch();
            }
            try {
                // Batch is ready, execute it to insert the data
                pstmt.executeBatch();
            } catch (SQLException e) {
                System.out.println("Error message: " + e.getMessage());
                return; // Exit if there was an error
            }
            // Commit the transaction to close the COPY command
            conn.commit();
            // Print the resulting table.
            ResultSet rs = null;
            rs = stmt.executeQuery("SELECT CustID, First_Name, "
                    + "Last_Name FROM customers ORDER BY CustID");
            while (rs.next()) {
                System.out.println(rs.getInt(1) + " - "
                        + rs.getString(2).trim() + " "
                        + rs.getString(3).trim());
            }
            // Cleanup
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
The result of running the example code is:
1 - Anna Allen
2 - Bill Brown
3 - Cindy Chu
4 - Don Dodd
5 - Eric Estavez
Notes
 Using the PreparedStatement.setFloat() method can cause rounding
   errors. If precision is important, use the .setDouble() method instead.
 The PreparedStatement object caches the connection's AutoCommit
   property when the statement is prepared. Later changes to the
   AutoCommit property have no effect on the prepared statement.
 In previous versions of the HP Vertica JDBC driver, the
   PreparedStatement.addBatch() method would execute the batch
   immediately. In version 5.1 and later, the batch is only executed when
   you call PreparedStatement.executeBatch().
 Always check batch insert return values for SQL warnings. Warnings can
   occur for issues such as the truncation of a fractional portion of a
   numeric value.
Loading Batches Directly into ROS
When loading large batches of data (more than 100MB or so), you should load the data directly
into ROS containers. Inserting directly into ROS is more efficient for large loads than AUTO mode,
since it avoids overflowing the WOS and spilling the remainder of the batch to ROS. Otherwise,
the Tuple Mover has to perform a moveout on the data in the WOS while subsequent data is
written directly into ROS containers, causing your data to be segmented across storage
containers.
To directly load batches into ROS, set the directBatchInsert connection property to true. See
Setting and Getting Connection Property Values (page 120) for an explanation of how to set
connection properties. When this property is set to true, all batch inserts bypass the WOS and
load directly into a ROS container.
If all of the batches being inserted using a connection should be inserted into the ROS, you should set
the DirectBatchInsert connection property to true in the Properties object you use to create the
connection:
Properties myProp = new Properties();
myProp.put("user", "ExampleUser");
myProp.put("password", "password123");
// Enable directBatchInsert for this connection
myProp.put("DirectBatchInsert", "true");
Connection conn;
try {
    conn = DriverManager.getConnection(
            "jdbc:vertica://VerticaHost:5433/ExampleDB", myProp);
    . . .
If you will be using the connection for inserting both large and small batches (or you do not know
the size of the batches you will be inserting when you create the Connection object), you can set the
DirectBatchInsert property after the connection has been established using the
VerticaConnection.setProperty method:
((VerticaConnection)conn).setProperty("DirectBatchInsert", true);
See Setting and Getting Connection Property Values (page 120) for a full example of setting
DirectBatchInsert.
Error Handling During Batch Loads
When loading individual batches, you can find how many rows were accepted and what rows were
rejected (see Identifying Accepted and Rejected Rows (page 143) for details). If you have
disabled the AutoCommit connection setting, other errors (such as disk space errors)
do not occur while inserting individual batches. This behavior is caused by having a single SQL
COPY statement perform the loading of multiple consecutive batches (which makes the load
process more efficient). It is only when the COPY statement closes that the batched data is
committed and HP Vertica reports other types of errors.
Therefore, your bulk loading application should be prepared to check for errors when the COPY
statement closes. You can trigger the COPY statement to close by:
 ending the batch load transaction by calling Connection.commit()
 closing the statement using Statement.close()
 setting the connection's AutoCommit property to true before inserting
   the last batch in the load
Note: The COPY statement also closes if you execute any non-insert statement or execute any
statement using a different Statement or PreparedStatement object. Ending the COPY
statement using either of these methods can lead to confusion and a harder-to-maintain
application, since you would need to handle batch load errors in a non-batch load statement.
You should explicitly end the COPY statement at the end of your batch load and handle any
errors at that time.
Identifying Accepted and Rejected Rows
The return value of PreparedStatement.executeBatch is an integer array containing the
success or failure status of inserting each row. A value of 1 means the row was accepted, and a
value of -3 means that the row was rejected. In the case where an exception occurred during the
batch execution, you can also get the array using BatchUpdateException.getUpdateCounts().
The following example extends the example shown in Batch Inserts Using JDBC Prepared
Statements (page 139) to retrieve this array and display the results of the batch load.
import java.sql.*;
import java.util.Arrays;
import java.util.Properties;

public class BatchInsertErrorHandlingExample {
    public static void main(String[] args) {
        try {
            Class.forName("com.vertica.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            System.err.println("Could not find the JDBC driver class.");
            e.printStackTrace();
            return;
        }
        Properties myProp = new Properties();
        myProp.put("user", "ExampleUser");
        myProp.put("password", "password123");
        Connection conn;
        // establish connection and make a table for the data.
        try {
            conn = DriverManager.getConnection(
                    "jdbc:vertica://VerticaHost:5433/ExampleDB",
                    myProp);
            // Disable auto commit
            conn.setAutoCommit(false);
            // Create a statement
            Statement stmt = conn.createStatement();
            // Drop table and recreate.
            stmt.execute("DROP TABLE IF EXISTS customers CASCADE");
            stmt.execute("CREATE TABLE customers (CustID int, Last_Name"
                    + " char(50), First_Name char(50),Email char(50), "
                    + "Phone_Number char(12))");
            // Some dummy data to insert. One row won't insert because
            // the phone number is too long for the phone column.
            String[] firstNames = new String[] { "Anna", "Bill", "Cindy",
                    "Don", "Eric" };
            String[] lastNames = new String[] { "Allen", "Brown", "Chu",
                    "Dodd", "Estavez" };
            String[] emails = new String[] { "aang@example.com",
                    "b.brown@example.com", "cindy@example.com",
                    "d.d@example.com", "e.estavez@example.com" };
            String[] phoneNumbers = new String[] { "123-456-789",
                    "555-444-3333", "555-867-53093453453",
                    "555-555-1212", "781-555-0000" };
            // Create the prepared statement
            PreparedStatement pstmt = conn.prepareStatement(
                    "INSERT INTO customers (CustID, Last_Name, " +
                    "First_Name, Email, Phone_Number)" +
                    " VALUES(?,?,?,?,?)");
            // Add rows to a batch in a loop. Each iteration adds a
            // new row.
            for (int i = 0; i < firstNames.length; i++) {
                // Add each parameter to the row.
                pstmt.setInt(1, i + 1);
                pstmt.setString(2, lastNames[i]);
                pstmt.setString(3, firstNames[i]);
                pstmt.setString(4, emails[i]);
                pstmt.setString(5, phoneNumbers[i]);
                // Add row to the batch.
                pstmt.addBatch();
            }
            // Integer array to hold the results of inserting
            // the batch. Will contain an entry for each row,
            // indicating success or failure.
            int[] batchResults = null;
            try {
                // Batch is ready, execute it to insert the data
                batchResults = pstmt.executeBatch();
            } catch (BatchUpdateException e) {
                // We expect an exception here, since one of the
                // inserted phone numbers is too wide for its column. All
                // of the rest of the rows will be inserted.
                System.out.println("Error message: " + e.getMessage());
                // Batch results isn't set due to exception, but you
                // can get it from the exception object.
                //
                // In your own code, you shouldn't assume that a batch
                // exception occurred, since exceptions can be thrown
                // by the server for a variety of reasons.
                batchResults = e.getUpdateCounts();
            }
            // You should also be prepared to catch SQLExceptions in your
            // own application code, to handle dropped connections and
            // other general problems.
            // Commit the transaction
            conn.commit();
            // Print the array holding the results of the batch insertions.
            System.out.println("Return value from inserting batch: "
                    + Arrays.toString(batchResults));
            // Print the resulting table.
            ResultSet rs = null;
            rs = stmt.executeQuery("SELECT CustID, First_Name, "
                    + "Last_Name FROM customers ORDER BY CustID");
            while (rs.next()) {
                System.out.println(rs.getInt(1) + " - "
                        + rs.getString(2).trim() + " "
                        + rs.getString(3).trim());
            }
            // Cleanup
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
Running the above example produces the following output on the console:
Error message: [Vertica][VJDBC](100172) One or more rows were rejected by the server.
Return value from inserting batch: [1, 1, -3, 1, 1]
1 - Anna Allen
2 - Bill Brown
4 - Don Dodd
5 - Eric Estavez
Notice that the third row failed to insert because its phone number is too long for the
Phone_Number column. All of the rest of the rows in the batch (including those after the error)
were correctly inserted.
Note: It is more efficient to ensure that the data you are inserting matches the data type and
width of its target column than to handle exceptions after the fact.
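For example, a minimal sketch of a client-side check, written as a replacement for the last setString()/addBatch() calls in the loop above, that skips rows whose phone number would not fit the char(12) Phone_Number column:
// Skip rows whose phone number is too wide for its char(12) column,
// rather than letting the server reject them.
if (phoneNumbers[i].length() <= 12) {
    pstmt.setString(5, phoneNumbers[i]);
    pstmt.addBatch();
} else {
    System.err.println("Skipping row " + (i + 1)
            + ": phone number is too wide for its column");
}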
Rolling Back Batch Loads on the Server
Batch loads always insert all of their data, even if one or more rows are rejected. Only the rows that
caused errors in a batch are not loaded. When the database connection's AutoCommit property is
true, batches automatically commit their transactions when they complete, so once the batch
finishes loading, the data is committed.
In some cases, you may want all of the data in a batch to be successfully inserted—none of the
data should be committed if an error occurs. The best way to accomplish this is to turn off the
database connection's AutoCommit property to prevent batches from automatically committing
themselves. Then, if a batch encounters an error, you can roll back the transaction after catching
the BatchUpdateException caused by the insertion error.
The following example demonstrates performing a rollback if any error occurs when loading a
batch.
import java.sql.*;
import java.util.Arrays;
import java.util.Properties;
public class RollbackBatchOnError {
public static void main(String[] args) {
try {
Class.forName("com.vertica.jdbc.Driver");
} catch (ClassNotFoundException e) {
System.err.println("Could not find the JDBC driver class.");
e.printStackTrace();
return;
}
Properties myProp = new Properties();
myProp.put("user", "ExampleUser");
myProp.put("password", "password123");
Connection conn;
try {
conn = DriverManager.getConnection(
"jdbc:vertica://VerticaHost:5433/ExampleDB",
myProp);
// Disable auto-commit. This will allow you to roll back
// a batch load if there is an error.
conn.setAutoCommit(false);
// establish connection and make a table for the data.
Statement stmt = conn.createStatement();
// Drop table and recreate.
stmt.execute("DROP TABLE IF EXISTS customers CASCADE");
stmt.execute("CREATE TABLE customers (CustID int, Last_Name"
+ " char(50), First_Name char(50),Email char(50), "
+ "Phone_Number char(12))");
// Some dummy data to insert. The one row won't insert because
// the phone number is too long for the phone column.
String[] firstNames = new String[] { "Anna", "Bill", "Cindy",
"Don", "Eric" };
String[] lastNames = new String[] { "Allen", "Brown", "Chu",
"Dodd", "Estavez" };
String[] emails = new String[] { "aang@example.com",
"b.brown@example.com", "cindy@example.com",
"d.d@example.com", "e.estavez@example.com" };
String[] phoneNumbers = new String[] { "123-456-789",
"555-444-3333", "555-867-53094535", "555-555-1212",
"781-555-0000" };
// Create the prepared statement
PreparedStatement pstmt = conn.prepareStatement(
"INSERT INTO customers (CustID, Last_Name, " +
"First_Name, Email, Phone_Number) "+
"VALUES(?,?,?,?,?)");
// Add rows to a batch in a loop. Each iteration adds a
// new row.
for (int i = 0; i < firstNames.length; i++) {
// Add each parameter to the row.
pstmt.setInt(1, i + 1);
pstmt.setString(2, lastNames[i]);
pstmt.setString(3, firstNames[i]);
pstmt.setString(4, emails[i]);
pstmt.setString(5, phoneNumbers[i]);
// Add row to the batch.
pstmt.addBatch();
}
// Integer array to hold the results of inserting
// the batch. Will contain an entry for each row,
// indicating success or failure.
int[] batchResults = null;
try {
// Batch is ready, execute it to insert the data
batchResults = pstmt.executeBatch();
// If we reach here, we inserted the batch without errors.
// Commit it.
System.out.println("Batch insert successful. Committing.");
conn.commit();
} catch (BatchUpdateException e) {
System.out.println("Error message: " + e.getMessage());
// Batch results isn't set due to exception, but you
// can get it from the exception object.
batchResults = e.getUpdateCounts();
// Roll back the batch transaction.
System.out.println("Rolling back batch insertion");
conn.rollback();
}
catch (SQLException e) {
// General SQL errors, such as connection issues, throw
// SQLExceptions. Your application should do something more
// than just print a stack trace.
e.printStackTrace();
}
System.out.println("Return value from inserting batch: "
+ Arrays.toString(batchResults));
System.out.println("Customers table contains:");
// Print the resulting table.
ResultSet rs = null;
rs = stmt.executeQuery("SELECT CustID, First_Name, "
+ "Last_Name FROM customers ORDER BY CustID");
while (rs.next()) {
System.out.println(rs.getInt(1) + " - "
+ rs.getString(2).trim() + " "
+ rs.getString(3).trim());
}
// Cleanup
conn.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
Running the above example prints the following on the system console:
Error message: [Vertica][VJDBC](100172) One or more rows were rejected by the
server.
Rolling back batch insertion
Return value from inserting batch: [1, 1, -3, 1, 1]
Customers table contains:
The return values indicate whether each row was successfully inserted. The value 1 means the
row was inserted without any issues, and -3 indicates that the row failed to insert.
The customers table is empty because the batch insert was rolled back due to the error caused by
the third row.
Bulk Loading Using the COPY Statement
One of the fastest ways to load large amounts of data into HP Vertica at once (bulk loading) is to
use the COPY statement. This statement loads data from a file stored on an HP Vertica host (or in
a data stream) into a table in the database. You can pass the COPY statement parameters that
define the format of the data in the file, how the data is to be transformed as it is loaded, how to
handle errors, and how the data should be loaded. See the COPY documentation in the SQL
Reference Manual for details.
One parameter that is particularly important is the DIRECT option, which tells COPY to load the
data directly into ROS rather than going through the WOS. You should use this option when you
are loading large files (over 100MB) into the database. Without this option, your load may fill the
WOS and overflow into ROS, requiring the Tuple Mover to perform a Moveout on the data in the
WOS. It is more efficient to load directly into the ROS and avoid forcing a moveout.
Only a superuser can use the COPY statement to copy a file stored on a host, so you must
connect to the database using a superuser account. If you want a non-superuser to
bulk-load data, you can use COPY to load from a stream on the host (such as STDIN) rather than
a file, or stream data from the client (see Streaming Data via JDBC (page 152)). You can also
perform a standard batch insert using a prepared statement (page 139), which uses the COPY
statement in the background to load the data.
The following example demonstrates using the COPY statement through JDBC to load a file
named customers.txt into a new database table. This file must be stored on the database host
to which your application connects (in this example, a host named VerticaHost). Since the
customers.txt file used in the example is very large, this example uses the DIRECT option to
bypass WOS and load directly into ROS.
import java.sql.*;
import java.util.Properties;
import com.vertica.jdbc.*;
public class COPYFromFile {
public static void main(String[] args) {
try {
Class.forName("com.vertica.jdbc.Driver");
} catch (ClassNotFoundException e) {
System.err.println("Could not find the JDBC driver class.");
e.printStackTrace();
return;
}
Properties myProp = new Properties();
myProp.put("user", "ExampleAdmin"); // Must be superuser
myProp.put("password", "password123");
Connection conn;
try {
conn = DriverManager.getConnection(
"jdbc:vertica://VerticaHost:5433/ExampleDB",myProp);
// Disable AutoCommit
conn.setAutoCommit(false);
Statement stmt = conn.createStatement();
// Create a table to hold data.
stmt.execute("DROP TABLE IF EXISTS customers;");
stmt.execute("CREATE TABLE IF NOT EXISTS customers (Last_Name
char(50) "
+ "NOT NULL, First_Name char(50),Email char(50), "
+ "Phone_Number char(15))");
// Use the COPY command to load data. Load directly into ROS, since
// this load could be over 100MB. Use ENFORCELENGTH to reject
// strings too wide for their columns.
boolean result = stmt.execute("COPY customers FROM "
+ " '/data/customers.txt' DIRECT ENFORCELENGTH");
// Determine if execution returned a count value, or a full result
// set.
if (result) {
System.out.println("Got result set");
} else {
// Count will usually return the count of rows inserted.
System.out.println("Got count");
int rowCount = stmt.getUpdateCount();
System.out.println("Number of accepted rows = " + rowCount);
}
// Commit the data load
conn.commit();
} catch (SQLException e) {
System.out.print("Error: ");
System.out.println(e.toString());
}
}
}
The example prints the following out to the system console when run (assuming that the
customers.txt file contained two million valid rows):
Number of accepted rows = 2000000
Streaming Data via JDBC
There are two options to stream data from a file on the client to your HP Vertica database:
 Use the VerticaCopyStream class to stream data in an object-oriented manner
 Execute a COPY LOCAL SQL statement to stream the data
The topics in this section explain how to use these options.
Using VerticaCopyStream
The VerticaCopyStream class lets you stream data from the client system to an HP Vertica
database. It lets you use the SQL COPY statement directly without having to copy the data to a
host in the database cluster first. Using the COPY command to load data from a file on the host
requires superuser privileges to access the host's file system. The COPY statement used to load
data from a stream does not require superuser privileges, so your client can connect using any
user account that has INSERT privileges on the table that will receive the data.
To copy streams into the database:
1 Disable the database connection's AutoCommit connection parameter.
2 Instantiate a VerticaCopyStream object, passing it at least the database connection
object and a string containing a COPY statement to load the data. This statement must copy
data from the STDIN into your table. You can use whatever parameters are appropriate for
your data load.
Note: The VerticaCopyStream constructor optionally takes a single InputStream
object, or a List of InputStream objects. This option lets you pre-populate the list of
streams to be copied into the database.
3 Call VerticaCopyStream.start() to start the COPY statement and begin
streaming the data in any streams you have already added to the
VerticaCopyStream object.
4 Call VerticaCopyStream.addStream() to add additional streams to the list of
streams to send to the database. You can then call
VerticaCopyStream.execute() to stream them to the server.
5 Optionally, call VerticaCopyStream.getRejects() to get a list of rejected rows
from the last .execute() call. The list of rejects is reset by each call to .execute() or
.finish().
Note: If you used either the REJECTED DATA or EXCEPTIONS options in the COPY
statement you passed to the VerticaCopyStream object in step 2,
.getRejects() returns an empty list. You can only use one method of tracking the rejected
rows at a time.
6 When you are finished adding streams, call VerticaCopyStream.finish() to
send any remaining streams to the database and close the COPY statement.
7 Call Connection.commit() to commit the loaded data.
Getting Rejected Rows
The VerticaCopyStream.getRejects() method returns a List containing the row
numbers of rows that were rejected after the previous .execute() method call. Each call to
.execute() clears the list of rejected rows, so you need to call .getRejects() after each call
to .execute(). Since .start() and .finish() also call .execute() to send any pending
streams to the server, call .getRejects() after these methods as well.
The following example demonstrates loading the content of five text files stored on the client
system into a table.
import java.io.File;
import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;
import com.vertica.jdbc.VerticaConnection;
import com.vertica.jdbc.VerticaCopyStream;
public class CopyMultipleStreamsExample {
public static void main(String[] args) {
try {
Class.forName("com.vertica.jdbc.Driver");
} catch (ClassNotFoundException e) {
System.err.println("Could not find the JDBC driver class.");
e.printStackTrace();
return;
}
// Set up the properties of the connection
Properties myProp = new Properties();
myProp.put("user", "ExampleUser"); // Must be superuser
myProp.put("password", "password123");
// When performing bulk loads, you should always disable the
// connection's AutoCommit property to ensure the loads happen as
// efficiently as possible by reusing the same COPY command and
// transaction.
myProp.put("AutoCommit", "false");
Connection conn;
try {
conn = DriverManager.getConnection(
"jdbc:vertica://VerticaHost:5433/ExampleDB",
myProp);
Statement stmt = conn.createStatement();
// Create a table to receive the data
stmt.execute("DROP TABLE IF EXISTS customers");
stmt.execute("CREATE TABLE customers (Last_Name char(50), "
+ "First_Name char(50),Email char(50), "
+ "Phone_Number char(15))");
// Prepare the query to insert from a stream. This query must use
// the COPY statement to load data from STDIN. Unlike copying from
// a file on the host, you do not need superuser privileges to
// copy a stream. All your user account needs is INSERT privileges
// on the target table.
String copyQuery = "COPY customers FROM STDIN "
+ "DELIMITER '|' DIRECT ENFORCELENGTH";
// Create an instance of the stream class. Pass in the
// connection and the query string.
VerticaCopyStream stream = new VerticaCopyStream(
(VerticaConnection) conn, copyQuery);
// Keep running count of the number of rejects
int totalRejects = 0;
// start() starts the stream process, and opens the COPY command.
stream.start();
// If you added streams to VerticaCopyStream before calling start(),
// you should check for rejects here (see below). The start() method
// calls execute() to send any pre-queued streams to the server
// once the COPY statement has been created.
// Simple for loop to load 5 text files named customers-1.txt to
// customers-5.txt
for (int loadNum = 1; loadNum <= 5; loadNum++) {
// Prepare the input file stream. Read from a local file.
String filename = "C:\\Data\\customers-" + loadNum + ".txt";
System.out.println("\n\nLoading file: " + filename);
File inputFile = new File(filename);
FileInputStream inputStream = new FileInputStream(inputFile);
// Add stream to the VerticaCopyStream
stream.addStream(inputStream);
// call execute() to load the newly added stream. You could
// add many streams and call execute once to load them all.
// Which method you choose depends mainly on whether you want
// the ability to check the number of rejections as the load
// progresses so you can stop if the number of rejects gets too
// high. Also, high numbers of InputStreams could create a
// resource issue on your client system.
stream.execute();
// Show any rejects from this execution of the stream load
// getRejects() returns a List containing the
// row numbers of rejected rows.
List<Long> rejects = stream.getRejects();
// The size of the list gives you the number of rejected rows.
int numRejects = rejects.size();
totalRejects += numRejects;
System.out.println("Number of rows rejected in load #"
+ loadNum + ": " + numRejects);
// List all of the rows that were rejected.
Iterator<Long> rejit = rejects.iterator();
long linecount = 0;
while (rejit.hasNext()) {
System.out.print("Rejected row #" + ++linecount);
System.out.println(" is row " + rejit.next());
}
}
// Finish closes the COPY command. It returns the number of
// rows inserted.
long results = stream.finish();
System.out.println("Finish returned " + results);
// If you added any streams that hadn't been executed(),
// you should also check for rejects here, since finish()
// calls execute() to send any pending streams to the server.
// You can also get the number of rows inserted using
// getRowCount().
System.out.println("Number of rows accepted: "
+ stream.getRowCount());
System.out.println("Total number of rows rejected: " + totalRejects);
// Commit the loaded data
conn.commit();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Running the above example on some sample data results in the following output:
Loading file: C:\Data\customers-1.txt
Number of rows rejected in load #1: 3
Rejected row #1 is row 3
Rejected row #2 is row 7
Rejected row #3 is row 51
Loading file: C:\Data\customers-2.txt
Number of rows rejected in load #2: 5
Rejected row #1 is row 4143
Rejected row #2 is row 6132
Rejected row #3 is row 9998
Rejected row #4 is row 10000
Rejected row #5 is row 10050
Loading file: C:\Data\customers-3.txt
Number of rows rejected in load #3: 9
Rejected row #1 is row 14142
Rejected row #2 is row 16131
Rejected row #3 is row 19999
Rejected row #4 is row 20001
Rejected row #5 is row 20005
Rejected row #6 is row 20049
Rejected row #7 is row 20056
Rejected row #8 is row 20144
Rejected row #9 is row 20236
Loading file: C:\Data\customers-4.txt
Number of rows rejected in load #4: 8
Rejected row #1 is row 23774
Rejected row #2 is row 24141
Rejected row #3 is row 25906
Rejected row #4 is row 26130
Rejected row #5 is row 27317
Rejected row #6 is row 28121
Rejected row #7 is row 29321
Rejected row #8 is row 29998
Loading file: C:\Data\customers-5.txt
Number of rows rejected in load #5: 1
Rejected row #1 is row 39997
Finish returned 39995
Number of rows accepted: 39995
Total number of rows rejected: 26
Note: The above example shows a simple load process that targets one node in the HP Vertica
cluster. It is more efficient to simultaneously load multiple streams to multiple database nodes.
Doing so greatly improves performance, because it spreads the processing for the load across
the cluster.
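One way to spread the work, sketched below, is to open one connection per node and drive a
separate VerticaCopyStream through each. The host names node01 and node02 are hypothetical,
and myProp and copyQuery are assumed to be set up as in the example above:
// Minimal sketch (hypothetical hosts node01 and node02): one connection
// and one VerticaCopyStream per node, each typically driven by its own
// thread so the loads run simultaneously.
Connection conn1 = DriverManager.getConnection(
    "jdbc:vertica://node01:5433/ExampleDB", myProp);
Connection conn2 = DriverManager.getConnection(
    "jdbc:vertica://node02:5433/ExampleDB", myProp);
VerticaCopyStream stream1 = new VerticaCopyStream(
    (VerticaConnection) conn1, copyQuery);
VerticaCopyStream stream2 = new VerticaCopyStream(
    (VerticaConnection) conn2, copyQuery);
// Each thread then calls start(), addStream(), execute(), and finish()
// on its own stream, exactly as in the single-node example above.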
Using COPY LOCAL with JDBC
To use COPY LOCAL with JDBC, just execute a COPY LOCAL statement with the path to the
source file on the client system. This method is simpler than using the VerticaCopyStream
class. However, you may prefer using VerticaCopyStream if you have many files to copy to the
database or if your data comes from a source other than a file (streamed over a network
connection, for example).
The following example code demonstrates using COPY LOCAL to copy a file from the client to the
database. It is the same as the code shown in Bulk Loading Using the COPY Statement (page
149), except that it uses the LOCAL option in the COPY statement and the path to the data file
is on the client system rather than on the server.
Note: The exceptions/rejections files are created on the client machine when the exceptions
and rejected data modifiers are specified on the copy local command. Specify a local path and
filename for these modifiers when executing a COPY LOCAL query from the driver.
import java.sql.*;
import java.util.Properties;
public class COPYLocal {
public static void main(String[] args) {
try {
Class.forName("com.vertica.jdbc.Driver");
} catch (ClassNotFoundException e) {
System.err.println("Could not find the JDBC driver class.");
e.printStackTrace();
return;
}
Properties myProp = new Properties();
myProp.put("user", "ExampleUser"); // Do not need to superuser
myProp.put("password", "password123");
Connection conn;
try {
conn = DriverManager.getConnection(
"jdbc:vertica://VerticaHost:5433/ExampleDB",myProp);
// Disable AutoCommit
conn.setAutoCommit(false);
Statement stmt = conn.createStatement();
// Create a table to hold data.
stmt.execute("DROP TABLE IF EXISTS customers;");
stmt.execute("CREATE TABLE IF NOT EXISTS customers (Last_Name
char(50) "
+ "NOT NULL, First_Name char(50),Email char(50), "
+ "Phone_Number char(15))");
// Use the COPY command to load data. Load directly into ROS, since
// this load could be over 100MB. Use ENFORCELENGTH to reject
// strings too wide for their columns.
boolean result = stmt.execute("COPY customers FROM LOCAL "
+ " 'C:\\Data\\customers.txt' DIRECT
ENFORCELENGTH");
// Determine if execution returned a count value, or a full result
// set.
if (result) {
System.out.println("Got result set");
} else {
// Count will usually return the count of rows inserted.
System.out.println("Got count");
int rowCount = stmt.getUpdateCount();
System.out.println("Number of accepted rows = " + rowCount);
}
conn.close();
} catch (SQLException e) {
System.out.print("Error: ");
System.out.println(e.toString());
}
}
}
The result of running this code appears below. In this case, the customers.txt file contains 10000
rows, seven of which get rejected because they contain data too wide to fit into their database
columns.
Got count
Number of accepted rows = 9993
Programming ADO.NET Applications
The HP Vertica driver for the Microsoft Connectivity Pack and ADO.NET allows applications
written in C# to read data from, update, and load data into HP Vertica databases. It provides a data
adapter (HP Vertica Data Adapter) that facilitates reading data from a database into a data set,
and then writing changed data from the data set back to the database. It also provides a data
reader (HP VerticaDataReader) for reading data. The driver requires the .NET framework version
3.5+.
For more information about ADO.NET, see:
 Overview of ADO.NET (http://msdn.microsoft.com/en-us/library/h43ks021(vs.85).aspx)
 ADO.NET .NET Framework Developer's Guide
Note: All of the examples provided in this section are in C#.
Updating ADO.NET Client Code from Previous Driver Versions
Starting in release 5.1.1, the HP Vertica client drivers have been updated to improve standards
compliance, performance, and reliability. As a result, some HP Vertica-specific features and past
incompatibilities have been eliminated. You must update any client code written for the prior
versions of the ADO.NET driver to work with the version 5.1.1 driver and beyond.
Auto Commit Change
 All queries are now auto-committed. The only exception is that queries run using a
transaction are not committed until the Commit() method is called.
Performance Improvements
 Prepared INSERT statements now run significantly faster than in previous driver versions. For
the best performance, prepared statements should be executed as part of a transaction.
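For example, a run of parameterized inserts wrapped in a single transaction might look like the
following minimal sketch; the two-column test table, the parameter names, and the open connection
_conn are illustrative assumptions, not part of the original example:
// Minimal sketch: execute a parameterized INSERT repeatedly inside an
// explicit transaction. HP Vertica prepares the parameterized statement
// automatically, so no explicit Prepare() call is required.
VerticaTransaction txn = _conn.BeginTransaction();
VerticaCommand command = _conn.CreateCommand();
command.Transaction = txn;
command.CommandText = "INSERT INTO test VALUES (@ID, @NAME)";
command.Parameters.Add(new VerticaParameter("ID", VerticaType.BigInt));
command.Parameters.Add(new VerticaParameter("NAME", VerticaType.VarChar));
for (int i = 0; i < 100; i++)
{
    command.Parameters["ID"].Value = i;
    command.Parameters["NAME"].Value = "user" + i;
    command.ExecuteNonQuery();
}
txn.Commit();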
Namespace Change
 The namespace has changed from vertica to Vertica.Data.VerticaClient.
Connection Properties
 DSN is no longer a valid connection string keyword. You cannot connect to HP Vertica using
ADO.NET with a DSN.
 The RowBufferSize connection property has been renamed to ResultBufferSize.
 Getters on the VerticaConnection to get various connection string options
(CacheDirectory, Pooling, MinPoolSize, MaxPoolSize, SyncNotification, Timeout, Enlist,
UseExtendedTypes, Password) have been removed.
 There is no longer a locale connection string keyword and you cannot set the locale through
the connection string. To change the locale, run the query "set locale to..."
 The connection property to enable or disable auto commit has been removed. All queries
outside of transactions are auto-committed.
Result Buffering
 The driver now buffers all results, and always uses streaming. Because of this, the following
functionality has changed:
 The VerticaCommandBehavior enum has been removed. This enum extended the ADO.NET
CommandBehavior enum to add support for buffering results. Results are now buffered in
HP Vertica 5.1.x.
 The VerticaCommand.ExecuteReader(CommandBehavior, bool) argument has been
removed.
 The CacheDirectory and PreloadReader connection string options have been removed.
Logging Changes
 Log properties are no longer configured on the connection string. Log properties are now
configured through the VerticaLogProperties class.
Data Type Changes
The following data types have changed:
Old Datatype Name       New Datatype Name
verticaType.Integer     VerticaType.BigInt
verticaType.Bigint      VerticaType.BigInt
verticaType.Timestamp   VerticaType.DateTime
verticaType.Interval    Changed to the specific type of interval, for example
                        VerticaType.IntervalDay, VerticaType.IntervalDayToHour,
                        VerticaType.IntervalDayToMinute, and so on.
verticaType.Real        VerticaType.Double
verticaType.Text        VerticaType.VarChar
verticaType.Smallint    VerticaType.BigInt
verticaType.Varbinary   VerticaType.VarBinary
Multiple Commands Now Supported
Multiple commands in a single statement are now supported, provided that parameters are not
used in any of the commands in the statement. The exception is COPY commands. You cannot
issue multiple COPY commands in the same statement.
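For example, the following minimal sketch (assuming an open connection _conn and the
four-column test table used elsewhere in this section) issues two INSERT commands in a single
statement; note that neither command uses parameters:
// Minimal sketch: two commands in one statement. Parameters are not
// allowed when multiple commands share a statement.
VerticaCommand command = _conn.CreateCommand();
command.CommandText =
    "INSERT into test values(1, 'username', 'email', 'password'); " +
    "INSERT into test values(2, 'username2', 'email2', 'password2');";
command.ExecuteNonQuery();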
Setting the Locale for ADO.NET Sessions
 ADO.NET applications use a UTF-16 character set encoding and are responsible for
converting any non-UTF-16 encoded data to UTF-16. The same cautions as for ODBC apply if
this encoding is violated.
 The ADO.NET driver converts UTF-16 data to UTF-8 when passing it to the HP Vertica server,
and converts data sent by the HP Vertica server from UTF-8 to UTF-16.
 ADO.NET applications should set the correct server session locale by executing the SET
LOCALE TO command in order to get expected collation and string functions behavior on the
server (a minimal sketch appears after this list).
 If there is no default session locale at the database level, ADO.NET applications need to set
the correct server session locale by executing the SET LOCALE TO command in order to get
expected collation and string functions behavior on the server. See the SET LOCALE
command in the SQL Reference Manual.
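For example, a session locale can be set immediately after the connection is opened. A minimal
sketch, assuming an open connection _conn and an illustrative locale value:
// Minimal sketch: set the server session locale right after connecting.
VerticaCommand command = _conn.CreateCommand();
command.CommandText = "SET LOCALE TO 'en_GB'";
command.ExecuteNonQuery();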
Connecting to the Database
This section describes:
 Using SSL: Installing SSL Certificates on Windows (page 163)
 Opening and Closing the Database Connection (ADO.NET) (page 164)
 ADO.NET Connection Properties (page 166)
 Configuring Log Properties
Using SSL: Installing Certificates on Windows
You can optionally secure communication between your ADO.NET application and HP Vertica
using SSL. The HP Vertica ADO.NET driver uses the default Windows key store when looking for
SSL certificates. This is the same key store that Internet Explorer uses.
Before you can use SSL on the client side, you must implement SSL on the server. See
Implementing SSL in the Administrator's Guide, perform those steps, then return to this topic to
install the SSL certificate on Windows.
To use SSL for ADO.NET connections to HP Vertica:
 Import the server and client certificates into the Windows Key Store.
 If required by your certificates, import the public certificate of your Certifying Authority.
Import the server and client certificates into the Windows key store:
1 Copy the server.crt file you generated when you enabled SSL on the server to your Windows
Machine.
2 Double-click the certificate.
3 Let Windows determine the key type, and click Install.
Import the public certificate of your CA:
You must establish a chain of trust for the certificates. You may need to import the public
certificate for your Certifying Authority (CA) (especially if it is a self-signed certificate).
1 Using the same certificate as above, double-click the certificate.
2 Select Place all certificates in the following store.
3 Click Browse, select Trusted Root Certification Authorities, and click Next.
4 Click Install.
Enable SSL in your ADO.NET Applications
In your connection string, be sure to enable SSL by setting the SSL property in
VerticaConnectionStringBuilder to true, for example:
//configure connection properties
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.17.10";
builder.Database = "VMart";
builder.User = "dbadmin";
builder.SSL = true;
//open the connection
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
Opening and Closing the Database Connection (ADO.NET)
Before you can access data in HP Vertica through ADO.NET, you must create a connection to the
database using the VerticaConnection class, which is an implementation of
System.Data.DbConnection. The VerticaConnection class takes a single argument that contains
the connection properties as a string. You can manually create a string of property keywords to
use as the argument, or you can use the VerticaConnectionStringBuilder class to build a
connection string for you.
This topic details the following:
 Manually building a connection string and connecting to HP Vertica
 Using VerticaConnectionStringBuilder to create the connection string and connecting to HP
Vertica
 Closing the connection
To manually create a connection string:
See ADO.NET Connection Properties (page 166) for a list of available properties to use in your
connection string. At a minimum, you need to specify the Host, Database, and User.
1 For each property, provide a value and append the properties and values one after the other,
separated by a semicolon. Assign this string to a variable. For example:
String connectString = "DATABASE=VMart;HOST=node01;USER=dbadmin";
2 Build an HP Vertica connection object that specifies your connection string.
VerticaConnection _conn = new VerticaConnection(connectString)
3 Open the connection.
_conn.Open();
4 Create a command object and associate it with a connection. All VerticaCommand objects
must be associated with a connection.
VerticaCommand command = _conn.CreateCommand();
To use the VerticaConnectionStringBuilder class to create a connection string and open a
connection:
1 Create a new object of the VerticaConnectionStringBuilder class.
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
2 Update your VerticaConnectionStringBuilder object with property values. See ADO.NET
Connection Properties (page 166) for a list of available properties to use in your connection
string. At a minimum, you need to specify the Host, Database, and User.
builder.Host = "node01";
builder.Database = "VMart";
builder.User = "dbadmin";
3 Build an HP Vertica connection object that takes your
VerticaConnectionStringBuilder object converted to a string.
VerticaConnection _conn = new VerticaConnection(builder.ToString());
4 Open the connection.
_conn.Open();
5 Create a command object and associate it with a connection. All VerticaCommand objects
must be associated with a connection.
VerticaCommand command = _conn.CreateCommand();
Note: If your database is not in compliance with your HP Vertica license, the call to
VerticaConnection.Open() returns a warning message to the console and the log. See
Managing Your License Key in the Administrator's Guide for more information.
To close the connection:
When you're finished with the database, close the connection. Failure to close the connection can
degrade the performance and scalability of your application. It can also prevent other clients
from obtaining locks.
_conn.Close();
Example Usage:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
//Perform some operations
_conn.Close();
}
}
}
ADO.NET Connection Properties
You use connection properties to configure the connection between your ADO.NET client
application and your HP Vertica database. The properties provide the basic information about the
connections, such as the server name and port number to use to connect to your database.
You can set a connection property in two ways:
 Including the property name and value as part of the connection string you pass to a
VerticaConnection.
 Setting the properties in a VerticaConnectionStringBuilder object, then passing the object as a
string to a VerticaConnection.
The following properties are available. Default values appear in parentheses after each property
name.

Database (default: string.Empty)
Name of the HP Vertica database to connect to. For example, if you installed the example
VMart database, the database is "VMart".

User (default: string.Empty)
Name of the user to log into HP Vertica.

Port (default: 5433)
Port on which HP Vertica is running.

Host (default: string.Empty)
The hostname or IP address of the server on which HP Vertica is running.

Password (default: string.Empty)
The password associated with the user connecting to the server.

ConnSettings (default: string.Empty)
SQL commands to run upon connection. Use %3B for semicolons.

IsolationLevel (default: System.Data.IsolationLevel.Unspecified)
Sets the transaction isolation level for HP Vertica. See Transactions for a description of the
different transaction levels. This value is either Serializable, ReadCommitted, or Unspecified.
See Setting the Transaction Isolation Level (page 179) for an example of setting the isolation
level using this keyword.
Note: By default, this value is set to IsolationLevel.Unspecified, which means the connection
uses the server's default transaction isolation level. HP Vertica's default isolation level is
IsolationLevel.ReadCommitted.

Label (default: string.Empty)
A string to identify the session on the server.

DirectBatchInsert (default: false)
A boolean value: whether to bulk insert into the ROS (true) or the WOS (false).

ResultBufferSize (default: 8192)
The size of the buffer to use when streaming results.

ConnectionTimeout (default: 0)
Number of seconds to wait for a connection. A value of 0 means no timeout.

ReadOnly (default: false)
A boolean value: if true, throw an exception on write attempts.

Pooling (default: false)
A boolean value: whether to enable connection pooling. Connection pooling is useful for server
applications because it allows the server to reuse connections. This saves resources and
enhances the performance of executing commands on the database. It also reduces the amount
of time a user must wait to establish a connection to the database.

MinPoolSize (default: 1)
An integer that defines the minimum number of connections to pool. Cannot be greater than the
number of connections that the server is configured to allow, or an exception is thrown when
that threshold is exceeded. The default number of connections allowed is 55.

MaxPoolSize (default: 20)
An integer that defines the maximum number of connections to pool. Cannot be greater than the
number of connections that the server is configured to allow, or an exception is thrown when
that threshold is exceeded.

LoadBalanceTimeout (default: 0, meaning no timeout)
The amount of time, in seconds, after which unused pooled connections are timed out and
removed from the pool. Set to 0 to disable this parameter and have no timeouts occur.
If you are using a cluster environment to load-balance the work, then the pool is restricted to
the servers in the cluster when the pool was created. If additional servers are added to the
cluster, and the pool is not removed, then the new servers will never be added to the connection
pool unless LoadBalanceTimeout is set and exceeded or VerticaConnection.ClearAllPools() is
manually called from an application. If you are using load balancing, set this to a value that
takes into account when new servers are added to the cluster. However, do not set it so low
that pools are frequently removed and rebuilt, as this defeats the purpose of using pooling in
the first place.

SSL (default: false)
A boolean value: whether to use SSL for the connection.
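For example, several of these properties set through the connection string builder might look
like the following minimal sketch; it assumes the builder exposes properties under the same names
as the table above:
// Minimal sketch: set several connection properties via the builder.
// Builder property names are assumed to mirror the table above.
VerticaConnectionStringBuilder builder = new VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
builder.Port = 5433;
builder.ConnectionTimeout = 30;
builder.SSL = false;
VerticaConnection _conn = new VerticaConnection(builder.ToString());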
Configuring Log Properties (ADO.Net)
Log properties for ADO.NET are configured differently than they are for the other client drivers.
For the other client drivers, log properties can be configured as connection properties. The
ADO.NET driver uses the VerticaLogProperties class to configure the properties.
VerticaLogProperties
VerticaLogProperties is a static class that allows you to set and get the log settings for the
ADO.NET driver. You can control the log level, log path, and log namespace using this class.
The log is created when the first connection is opened. Once the connection is opened, you
cannot change the log path. It must be set prior to opening the connection. You can change the log
level and log namespace at any time.
Setting Log Properties
You set log properties using three methods in the VerticaLogProperties class:
 SetLogPath(String path, bool persist)
 SetLogNamespace(String lognamespace, bool persist)
 SetLogLevel(VerticaLogLevel loglevel, bool persist)
Each of the methods requires a boolean persist argument. When set to true, the persist argument
causes the setting to be written to the client's Windows Registry, where it is used for all
subsequent connections. If set to false, then the log property only applies to the current session.
SetLogPath
The SetLogPath method takes as its arguments a string containing the path to the log file and the
persist argument. If the path string contains only a directory path, then the log file is created with
the name vdp-driver-MM-dd_HH.mm.ss.log (where MM-dd_HH.mm.ss is the date and time the
log was created). If the path ends in a filename, such as log.txt or log.log, then the log is created
with that filename.
If SetLogPath is called with an empty string for the path argument, then the client executable's
current directory is used as the log path.
If SetLogPath is not called and no registry entry exists for the log path, and you have called any of
the other VerticaLogProperties methods, then the client executable's current directory is used as
the log path.
When the persist argument is set to true, the path specified is copied to the registry verbatim. If no
filename was specified, then the filename is not saved to the registry.
Note: The path must exist on the client system prior to calling this method. The method does
not create directories.
Example Usage:
//set the log path
string path = "C:\\log";
VerticaLogProperties.SetLogPath(path, false);
SetLogNamespace
The SetLogNamespace method takes as its arguments a string containing the namespace to log
and the persist argument. The namespace string to log can be one of the following:
 Vertica
 Vertica.Data.VerticaClient
 Vertica.Data.Internal.IO
 Vertica.Data.Internal.DataEngine
 Vertica.Data.Internal.Core
Namespaces can be truncated to include multiple child namespaces. For example, you can
specify "Vertica.Data.Internal" to log for all of the Vertica.Data.Internal namespaces.
If a log namespace is not set, and no value is stored in the registry, then the "Vertica" namespace
is used for logging.
Example Usage:
//set namespace to log
string lognamespace = "Vertica.Data.VerticaClient";
VerticaLogProperties.SetLogNamespace(lognamespace, false);
SetLogLevel
The SetLogLevel method takes as its arguments a VerticaLogLevel type and the persist
argument. The VerticaLogLevel argument can be one of:
 VerticaLogLevel.None
 VerticaLogLevel.Fatal
 VerticaLogLevel.Error
 VerticaLogLevel.Warning
 VerticaLogLevel.Info
 VerticaLogLevel.Debug
 VerticaLogLevel.Trace
If a log level is not set, and no value is stored in the registry, then VerticaLogLevel.None is used.
Example Usage:
//set log level
VerticaLogLevel level = VerticaLogLevel.Debug;
VerticaLogProperties.SetLogLevel(level, false);
Getting Log Properties
You can get the log property values using the getters included in the VerticaLogProperties class.
The properties are:
 LogPath
 LogNamespace
 LogLevel
Example Usage:
//get current log settings
string logpath = VerticaLogProperties.LogPath;
VerticaLogLevel loglevel = VerticaLogProperties.LogLevel;
string logns = VerticaLogProperties.LogNamespace;
Console.WriteLine("Current Log Settings:");
Console.WriteLine("Log Path: " + logpath);
Console.WriteLine("Log Level: " + loglevel);
Console.WriteLine("Log Namespace: " + logns);
Setting and Getting Log Properties Example
This complete example shows how to set and get log properties:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
//configure connection properties
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
//get current log settings
string logpath = VerticaLogProperties.LogPath;
VerticaLogLevel loglevel = VerticaLogProperties.LogLevel;
string logns = VerticaLogProperties.LogNamespace;
Console.WriteLine("\nOld Log Settings:");
Console.WriteLine("Log Path: " + logpath);
Console.WriteLine("Log Level: " + loglevel);
Console.WriteLine("Log Namespace: " + logns);
//set the log path
string path = "C:\\log";
VerticaLogProperties.SetLogPath(path, false);
//set log level
VerticaLogLevel level = VerticaLogLevel.Debug;
VerticaLogProperties.SetLogLevel(level, false);
//set namespace to log
string lognamespace = "Vertica";
VerticaLogProperties.SetLogNamespace(lognamespace, false);
//open the connection
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
//get new log settings
logpath = VerticaLogProperties.LogPath;
loglevel = VerticaLogProperties.LogLevel;
logns = VerticaLogProperties.LogNamespace;
Console.WriteLine("\nNew Log Settings:");
Console.WriteLine("Log Path: " + logpath);
Console.WriteLine("Log Level: " + loglevel);
Console.WriteLine("Log Namespace: " + logns);
//close the connection
_conn.Close();
}
}
}
The example produces the following output:
Old Log Settings:
Log Path:
Log Level: None
Log Namespace:

New Log Settings:
Log Path: C:\log
Log Level: Debug
Log Namespace: Vertica
Querying the Database using ADO.NET
This section describes how to create queries to do the following:
 Insert data into the database (page 173)
 Read data from the database (page 182)
 Load data into the database (page 194)
Note: The ExecuteNonQuery() method used to query the database returns an int32 with the
number of rows affected by the query. The maximum size of an int32 type is a constant and is
defined to be 2,147,483,647. If your query returns more results than the int32 max, then
ADO.NET throws an exception because of the overflow of the int32 type. However the query is
still processed by HP Vertica even when the reporting of the return value fails. This is a
limitation in .NET, as ExecuteNonQuery() is part of the standard ADO.NET interface.
Inserting Data (ADO.NET)
Inserting data can be done using the VerticaCommand class. VerticaCommand is an implementation
of DbCommand. It allows you to create and send an SQL statement to the database. Use the
CommandText property to assign an SQL statement to the command and then execute the SQL by
calling the ExecuteNonQuery method. The ExecuteNonQuery method is used for executing
statements that do not return result sets.
To insert a single row of data:
1 Create a connection to the database (page 164).
2 Create a command object using the connection.
VerticaCommand command = _conn.CreateCommand();
3 Insert data using an INSERT statement. The following is an example of a simple insert. Note
that it does not contain a COMMIT statement because the HP Vertica ADO.NET driver
operates in autocommit mode.
command.CommandText =
"INSERT into test values(2, 'username', 'email', 'password')";
4 Execute the query. The rowsAdded variable contains the number of rows added by the insert
statement.
Int32 rowsAdded = command.ExecuteNonQuery();
The ExecuteNonQuery() method returns the number of rows affected by the command for
UPDATE, INSERT, and DELETE statements. For all other types of statements it returns -1. If a
rollback occurs then it is also set to -1.
Example Usage:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
VerticaCommand command = _conn.CreateCommand();
command.CommandText =
"INSERT into test values(2, 'username', 'email', 'password')";
Int32 rowsAdded = command.ExecuteNonQuery();
Console.WriteLine( rowsAdded + " rows added!");
_conn.Close();
}
}
}
Using Parameters
You can use parameters to execute similar SQL statements repeatedly and efficiently.
VerticaParameters are an extension of the System.Data.DbParameter base class in ADO.NET
and are used to set parameters in commands sent to the server. Use Parameters in all queries
(SELECT/INSERT/UPDATE/DELETE) for which the values in the WHERE clause are not static;
that is, for all queries that have a known set of columns but whose filter criteria are set dynamically
by an application or end user. Using parameters in this way greatly decreases the chances of an
SQL injection issue that can occur when simply creating a SQL query from a number of variables.
Parameters require that a valid DbType, VerticaDbType, or System type be assigned to the
parameter. See SQL Data Types and ADO.NET Data Types for a mapping of System, Vertica,
and DbTypes.
To create a parameter placeholder, place either the at sign (@) or a colon (:) character in front of
the parameter name in the actual query string. Do not insert any spaces between the placeholder
indicator (@ or :) and the placeholder.
Note: The @ character is the preferred way to identify parameters. The colon (:) character is
supported for backward compatibility.
For example, the following typical query uses the string 'MA' as a filter.
SELECT customer_name, customer_address, customer_city, customer_state
FROM customer_dimension WHERE customer_state = 'MA';
Instead, the query can be written to use a parameter. In the following example, the string MA is
replaced by the parameter placeholder @STATE.
SELECT customer_name, customer_address, customer_city, customer_state
FROM customer_dimension WHERE customer_state = @STATE;
For example, the ADO.NET code for the prior example would be written as:
VerticaCommand command = new VerticaCommand(
    "SELECT customer_name, customer_address, customer_city, customer_state " +
    "FROM customer_dimension WHERE customer_state = @STATE", _conn);
command.Parameters.Add(new VerticaParameter("STATE", VerticaType.VarChar));
command.Parameters["STATE"].Value = "MA";
Note: Although the VerticaCommand class supports a Prepare() method, you do not need to
call the Prepare() method for parameterized statements because HP Vertica automatically
prepares the statement for you.
Creating and Rolling Back Transactions
Creating Transactions
Transactions in HP Vertica are atomic, consistent, isolated, and durable. When you connect to a
database using the Vertica ADO.NET Driver, the connection is in autocommit mode and each
individual query is committed upon execution. You can collect multiple statements into a single
transaction and commit them all at the same time. You can also choose to
roll back a transaction before it is committed if your code determines that the transaction should
not commit.
Transactions use the VerticaTransaction object, which is an implementation of DbTransaction.
You must associate the transaction with the VerticaCommand object.
The following code uses an explicit transaction to insert one row each into two tables of the VMart
schema.
To create a transaction in HP Vertica using the ADO.NET driver:
1 Create a connection to the database (page 164).
2 Create a command object using the connection.
VerticaCommand command = _conn.CreateCommand();
3 Start an explicit transaction, and associate the command with it.
VerticaTransaction txn = _conn.BeginTransaction();
command.Connection = _conn;
command.Transaction = txn;
4 Execute the individual SQL statements to add rows.
command.CommandText =
"insert into product_dimension values( ... )";
command.ExecuteNonQuery();
command.CommandText =
"insert into store_orders_fact values( ... )";
command.ExecuteNonQuery();
5 Commit the transaction.
txn.Commit();
Rolling Back Transactions
If your code checks for errors, then you can catch the error and rollback the entire transaction.
VerticaTransaction txn = _conn.BeginTransaction();
VerticaCommand command = new VerticaCommand("insert into product_dimension
values( 838929, 5, 'New item 5' )", _conn);
// execute the insert
command.ExecuteNonQuery();
command.CommandText = "insert into product_dimension values( 838929, 6, 'New item
6' )";
// try insert and catch any errors
bool error = false;
try
{
command.ExecuteNonQuery();
}
catch (Exception e)
{
Console.WriteLine(e.Message);
error = true;
}
if (error)
{
txn.Rollback();
Console.WriteLine("Errors. Rolling Back.");
}
else
{
txn.Commit();
Console.WriteLine("Queries Successful. Committing.");
}
Commit and Rollback Example
This example details how you can commit or rollback queries during a transaction.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
bool error = false;
VerticaCommand command = _conn.CreateCommand();
VerticaCommand command2 = _conn.CreateCommand();
VerticaTransaction txn = _conn.BeginTransaction();
command.Connection = _conn;
command.Transaction = txn;
command.CommandText =
"insert into test values(1, 'test', 'test', 'test' )";
Console.WriteLine(command.CommandText);
try
{
command.ExecuteNonQuery();
}
catch (Exception e)
{
Console.WriteLine(e.Message);
error = true;
}
command.CommandText =
"insert into test values(2, 'ear', 'eye', 'nose', 'extra' )";
Console.WriteLine(command.CommandText);
try
{
command.ExecuteNonQuery();
}
catch (Exception e)
{
Console.WriteLine(e.Message);
error = true;
}
if (error)
{
txn.Rollback();
Console.WriteLine("Errors. Rolling Back.");
}
else
{
txn.Commit();
Console.WriteLine("Queries Successful. Committing.");
}
_conn.Close();
}
}
}
The example displays the following output on the console:
insert into test values(1, 'test', 'test', 'test' )
insert into test values(2, 'ear', 'eye', 'nose', 'extra' )
[42601]ERROR: INSERT has more expressions than target columns
Errors. Rolling Back.
See Also
Setting the Transaction Isolation Level (page 179)
Setting the Transaction Isolation Level
You can set the transaction isolation level on a per-connection and per-transaction basis. See
Transactions for an overview of the transaction isolation levels supported in HP Vertica. To set the
default transaction isolation level for a connection, use the IsolationLevel keyword in the
VerticaConnectionStringBuilder string (see Connection String Keywords (page 166) for details).
To set the isolation level for an individual transaction, pass the isolation level to the
VerticaConnection.BeginTransaction() method call to start the transaction.
To set the Isolation Level on a connection-basis:
1 Use the VerticaConnectionStringBuilder to build the connection string.
2 Provide a value for the IsolationLevel builder string. It can take one of two values:
IsolationLevel.ReadCommitted (default) or IsolationLevel.Serializable. For example:
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.100";
builder.Database = "VMart";
builder.User = "dbadmin";
builder.IsolationLevel = System.Data.IsolationLevel.Serializable;
VerticaConnection _conn1 = new
VerticaConnection(builder.ToString());
_conn1.Open();
To set the Isolation Level on a transaction basis:
1 Set the IsolationLevel on the BeginTransaction method, for example
VerticaTransaction txn =
_conn.BeginTransaction(IsolationLevel.Serializable);
Example usage:
The following example demonstrates:
 getting the connection's transaction isolation level.
 setting the connection's isolation level using a connection property.
 setting the transaction isolation level for a new transaction.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
VerticaConnection _conn1 = new
VerticaConnection(builder.ToString());
_conn1.Open();
VerticaTransaction txn1 = _conn1.BeginTransaction();
Console.WriteLine("\n Transaction 1 Transaction Isolation Level: " +
txn1.IsolationLevel.ToString());
txn1.Rollback();
VerticaTransaction txn2 =
_conn1.BeginTransaction(IsolationLevel.Serializable);
Console.WriteLine("\n Transaction 2 Transaction Isolation Level: " +
txn2.IsolationLevel.ToString());
txn2.Rollback();
VerticaTransaction txn3 =
_conn1.BeginTransaction(IsolationLevel.ReadCommitted);
Console.WriteLine("\n Transaction 3 Transaction Isolation Level: " +
txn3.IsolationLevel.ToString());
_conn1.Close();
}
}
}
When run, the example code prints the following to the system console:
Transaction 1 Transaction Isolation Level: ReadCommitted
Transaction 2 Transaction Isolation Level: Serializable
Transaction 3 Transaction Isolation Level: ReadCommitted
Reading Data (ADO.Net)
To read data from the database, use VerticaDataReader, an implementation of DbDataReader.
This implementation is useful for moving large volumes of data quickly off the server where it can
be run through analytic applications.
Note: The VerticaDataReader.HasRows property returns true if the result generated any
rows. In versions of HP Vertica prior to 5.1, HasRows returned true only if there were more rows
to be read.
Note: A VerticaCommand cannot execute anything else while it has an open
VerticaDataReader associated with it. To execute something else, close the data reader or use
a different VerticaCommand object.
To read data from the database using VerticaDataReader:
1 Create a connection to the database (page 164).
2 Create a command object using the connection.
VerticaCommand command = _conn.CreateCommand();
3 Create a query. This query works with the example VMart database.
command.CommandText =
"SELECT fat_content, product_description " +
"FROM (SELECT DISTINCT fat_content, product_description " +
"  FROM product_dimension " +
"  WHERE department_description IN ('Dairy') " +
"  ORDER BY fat_content) AS food " +
"LIMIT 10;";
4 Execute the reader to return the results from the query. The following command calls the
ExecuteReader method of the VerticaCommand object to obtain the VerticaDataReader
object.
VerticaDataReader dr = command.ExecuteReader();
5 Read the data. The data reader returns results in a sequential stream. Therefore, you must
read data from tables row-by-row. The following example uses a while loop to accomplish this:
Console.WriteLine("\n\n Fat Content\t Product Description");
Console.WriteLine("------------\t -------------------");
int rows = 0;
while (dr.Read())
{
Console.WriteLine("
" + dr[0] + "
\t " + dr[1]);
++rows;
-182-
HP Vertica C lient Libraries
}
Console.WriteLine("------------\n (" + rows + " rows)\n");
6 When you're finished, close the data reader to free up resources.
dr.Close();
Example Usage:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
VerticaCommand command = _conn.CreateCommand();
command.CommandText =
"SELECT fat_content, product_description " +
"FROM (SELECT DISTINCT fat_content, product_description " +
"  FROM product_dimension " +
"  WHERE department_description IN ('Dairy') " +
"  ORDER BY fat_content) AS food " +
"LIMIT 10;";
VerticaDataReader dr = command.ExecuteReader();
Console.WriteLine("\n\n Fat Content\t Product Description");
Console.WriteLine("------------\t -------------------");
int rows = 0;
while (dr.Read())
{
Console.WriteLine("
" + dr[0] + "
\t " + dr[1]);
++rows;
}
Console.WriteLine("------------\n (" + rows + " rows)\n");
dr.Close();
_conn.Close();
}
}
}
Loading Data Through ADO.Net
This section details the different ways that you can load data in HP Vertica using the ADO.NET
client driver:
 Using the HP Vertica Data Adapter
 Using Batch Inserts and Prepared Statements
 Streaming Data via ADO.NET
Using the HP Vertica Data Adapter
The HP Vertica data adapter (VerticaDataAdapter) enables a client to exchange data between a
data set and an HP Vertica database. It is an implementation of DbDataAdapter. You can use
VerticaDataAdapter to simply read data, or, for example, read data from a database into a data
set, and then write changed data from the data set back to the database.
Batching Updates
When using the Update() method to update a data set, you can optionally set the UpdateBatchSize property before calling Update() to reduce the number of times the client communicates with the server to perform the update. The default value of UpdateBatchSize is 1. If you make multiple rows.Add() calls on a data set, you can change the batch size to an optimal value to speed up the operations the client must perform to complete the update.
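For example, here is a minimal sketch (assuming a VerticaDataAdapter named adapter whose InsertCommand is already configured, and a DataTable named table that contains newly added rows):
// Send the new rows to the server in batches of 100 instead of
// one round trip per row
adapter.UpdateBatchSize = 100;
int rowsUpdated = adapter.Update(table);
Console.WriteLine(rowsUpdated + " rows sent to the server.");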
Reading data from HP Vertica using the data adapter:
The following example details how to perform a select query on the VMart schema and load the
result into a DataTable, then output the contents of the DataTable to the console.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
// Try/Catch any exceptions
try
{
using (_conn)
{
// Create the command
VerticaCommand command = _conn.CreateCommand();
command.CommandText = "select product_key,
product_description " +
"from product_dimension where product_key < 10";
// Associate the command with the connection
command.Connection = _conn;
// Create the DataAdapter
VerticaDataAdapter adapter = new VerticaDataAdapter();
adapter.SelectCommand = command;
// Fill the DataTable
DataTable table = new DataTable();
adapter.Fill(table);
// Display each row and column value.
int i = 0;
foreach (DataRow row in table.Rows)
{
foreach (DataColumn column in table.Columns)
{
Console.Write(row[column] + "\t");
}
Console.WriteLine();
i++;
}
Console.WriteLine(i + " rows returned.");
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
_conn.Close();
}
}
}
Reading Data from HP Vertica into a data set and changing data:
The following example shows how to use a data adapter to read from and insert into a dimension
table of the VMart schema.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
// Try/Catch any exceptions
try
{
using (_conn)
{
//Create a data adapter object using the connection
VerticaDataAdapter da = new VerticaDataAdapter();
//Create a select statement that retrieves data from the table
da.SelectCommand = new VerticaCommand(
    "select * from product_dimension where product_key < 10", _conn);
//Set up the insert command for the data adapter, and bind
//variables for some of the columns
da.InsertCommand = new VerticaCommand(
    "insert into product_dimension values( :key, :version, :desc )", _conn);
da.InsertCommand.Parameters.Add(new
VerticaParameter("key", VerticaType.BigInt));
da.InsertCommand.Parameters.Add(new
VerticaParameter("version", VerticaType.BigInt));
da.InsertCommand.Parameters.Add(new
VerticaParameter("desc", VerticaType.VarChar));
da.InsertCommand.Parameters[0].SourceColumn =
"product_key";
da.InsertCommand.Parameters[1].SourceColumn =
"product_version";
da.InsertCommand.Parameters[2].SourceColumn =
"product_description";
da.TableMappings.Add("product_key", "product_key");
da.TableMappings.Add("product_version",
"product_version");
da.TableMappings.Add("product_description",
"product_description");
//Create and fill a DataSet for this dimension table, and get
//the resulting DataTable.
DataSet ds = new DataSet();
da.Fill(ds, 0, 0, "product_dimension");
DataTable dt = ds.Tables[0];
//Bind parameters and add two rows to the table.
DataRow dr = dt.NewRow();
dr["product_key"] = 838929;
dr["product_version"] = 5;
dr["product_description"] = "New item 5";
dt.Rows.Add(dr);
dr = dt.NewRow();
dr["product_key"] = 838929;
dr["product_version"] = 6;
dr["product_description"] = "New item 6";
dt.Rows.Add(dr);
//Extract the changes for the added rows.
DataSet ds2 = ds.GetChanges();
//Send the modifications to the server.
int updateCount = da.Update(ds2, "product_dimension");
//Merge the changes into the original Data set, and mark it up to date.
ds.Merge(ds2);
ds.AcceptChanges();
Console.WriteLine(updateCount + " updates made!");
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
_conn.Close();
}
}
}
Using Batch Inserts and Prepared Statements
You can load data in batches using a prepared statement with parameters. You can also use transactions to roll back the batch load if any errors are encountered.
If you are loading large batches of data (more than 100MB), then consider using a direct batch
insert.
The following example details using data contained in arrays, parameters, and a transaction to
batch load data.
The test table used in the example is created with the command: create table test (id
INT, username VARCHAR(24), email VARCHAR(64), password VARCHAR(8));
Example Batch Insert Using Parameters and Transactions
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
// create arrays for column data
int[] ids = {1, 2, 3, 4};
string[] usernames = {"user1", "user2", "user3", "user4"};
string[] emails = { "user1@example.com",
"user2@example.com","user3@example.com","user4@example.com" };
string[] passwords = { "pass1", "pass2", "pass3", "pass4" };
// create counters for accepted and rejected rows
int rows = 0;
int rejRows = 0;
bool error = false;
// create the transaction
VerticaTransaction txn = _conn.BeginTransaction();
// create the parametrized query and assign parameter types
VerticaCommand command = _conn.CreateCommand();
command.CommandText = "insert into TEST values (@id, @username,
@email, @password)";
command.Parameters.Add(new VerticaParameter("id",
VerticaType.BigInt));
command.Parameters.Add(new VerticaParameter("username",
VerticaType.VarChar));
command.Parameters.Add(new VerticaParameter("email",
VerticaType.VarChar));
command.Parameters.Add(new VerticaParameter("password",
VerticaType.VarChar));
// prepare the statement
command.Prepare();
// loop through the column arrays and insert the data
for (int i = 0; i < ids.Length; i++)
{
command.Parameters["id"].Value = ids[i];
command.Parameters["username"].Value = usernames[i];
command.Parameters["email"].Value = emails[i];
command.Parameters["password"].Value = passwords[i];
try
{
rows += command.ExecuteNonQuery();
}
catch (Exception e)
{
Console.WriteLine("\nInsert failed - \n " + e.Message +
"\n");
++rejRows;
error = true;
}
}
if (error)
{
//roll back if errors
Console.WriteLine("Errors. Rolling Back Transaction.");
Console.WriteLine(rejRows + " rows rejected.");
txn.Rollback();
}
else
{
//commit if no errors
Console.WriteLine("No Errors. Committing Transaction.");
txn.Commit();
Console.WriteLine("Inserted " + rows + " rows. ");
}
_conn.Close();
}
}
}
Loading Batches Directly into ROS
When loading large batches of data (more than 100MB or so), you should load the data directly
into ROS containers. Inserting directly into ROS is more efficient for large loads than AUTO mode,
since it avoids overflowing the WOS and spilling the remainder of the batch to ROS. Otherwise,
the Tuple Mover has to perform a moveout on the data in the WOS, while subsequent data is
directly written into ROS containers. This results in the data from your batch being segmented
across containers.
To directly load batches into ROS, set the DirectBatchInsert connection property to true. See
Opening and Closing the Database Connection (page 164) for details on all of the connection
properties. When the DirectBatchInsert property is set to true, all batch inserts bypass the WOS
and load directly into a ROS container.
Example usage:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
builder.DirectBatchInsert = true;
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
//Perform some operations
_conn.Close();
}
}
}
Streaming Data via ADO.NET
There are two options to stream data from a file on the client to your HP Vertica database through
ADO.NET:
 Use the VerticaCopyStream ADO.NET class to stream data in an object-oriented manner
 Execute a COPY SQL statement to stream the data
The topics in this section explain how to use these options.
Streaming from the Client via VerticaCopyStream
The VerticaCopyStream class lets you stream data from the client system to an HP Vertica database. It lets you use the SQL COPY statement directly, substituting one or more data streams for STDIN, so that you do not have to copy the data to a host in the database cluster first.
Notes:
 Use transactions and disable AutoCommit on the copy command for better performance.
 Disable AutoCommit by using the copy command with the 'no commit' modifier. You must explicitly disable commits. Enabling transactions does not disable AutoCommit when using VerticaCopyStream.
 The copy command used with VerticaCopyStream uses COPY syntax.
 VerticaCopyStream.Rejects is zeroed every time Execute is called. If you want to capture the number of rejects, assign the value of VerticaCopyStream.Rejects to another variable before calling Execute again (see the sketch after this list).
 You can add multiple streams using multiple AddStream() calls.
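For example, here is a minimal sketch of preserving the reject count across calls (assuming an open VerticaCopyStream named vcs and a second input stream named nextFile):
vcs.Execute();
// save the reject count before the next Execute() zeroes it
long rejectsSoFar = vcs.Rejects.Count;
vcs.AddStream(nextFile, false);
vcs.Execute();
long totalRejects = rejectsSoFar + vcs.Rejects.Count;
Console.WriteLine("Total rejected rows: " + totalRejects);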
Example usage:
The following example demonstrates using VerticaCopyStream to copy a file stream into HP
Vertica.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.IO;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
//configure connection properties
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
//open the connection
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
try
{
using (_conn)
{
//start a transaction
VerticaTransaction txn = _conn.BeginTransaction();
//create a table for this example
VerticaCommand command = new VerticaCommand(
    "DROP TABLE IF EXISTS copy_table", _conn);
command.ExecuteNonQuery();
command.CommandText = "CREATE TABLE copy_table (Last_Name
char(50), "
+ "First_Name char(50),Email char(50), "
+ "Phone_Number char(15))";
command.ExecuteNonQuery();
//create a new filestream from the data file
string filename = "C:/customers.txt";
Console.WriteLine("\n\nLoading File: " + filename);
FileStream inputfile = File.OpenRead(filename);
//define the copy command
string copy = "copy copy_table from stdin record terminator
E'\n' delimiter '|'"
+ " enforcelength "
+ " no commit";
//create a new copy stream instance with the connection and copy statement
VerticaCopyStream vcs = new VerticaCopyStream(_conn, copy);
//start the VerticaCopyStream process
vcs.Start();
//add the file stream
vcs.AddStream(inputfile, false);
//execute the copy
vcs.Execute();
//finish stream and write out the list of inserted and rejected rows
long rowsInserted = vcs.Finish();
IList<long> rowsRejected = vcs.Rejects; // does not work when rejected or exceptions defined
Console.WriteLine("Number of Rows inserted: " +
rowsInserted);
Console.WriteLine("Number of Rows rejected: " +
rowsRejected.Count);
if (rowsRejected.Count > 0)
{
for (int i = 0; i < rowsRejected.Count; i++)
{
Console.WriteLine("Rejected row #{0} is row {1}", i,
rowsRejected[i]);
}
}
//commit the changes
txn.Commit();
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
//close the connection
_conn.Close();
}
}
}
Using Copy with ADO.NET
To use COPY with ADO.NET, simply execute a COPY statement that specifies the path to the source file on the client system. This method is simpler than using the VerticaCopyStream class. However, you may prefer VerticaCopyStream if you have many files to copy to the database or if your data comes from a source other than a local file (for example, streamed over a network connection).
The following example code demonstrates using COPY to copy a file from the client to the database. It is the same as the code shown in Bulk Loading Using the COPY Statement, except that the path to the data file is on the client system rather than on the server.
To load data that is stored on a database node, use a VerticaCommand object to create a COPY
command:
1 Create a connection to the database (page 164) through the node on which the data file is
stored.
2 Create a command object using the connection.
VerticaCommand command = _conn.CreateCommand();
3 Copy data. The following is an example of using the COPY command to load data. It uses the
LOCAL modifier to copy a file local to the client issuing the command.
command.CommandText = "copy lcopy_table from LOCAL 'c://customers.txt'"
+ " record terminator E'\n' delimiter '|'"
+ " enforcelength ";
Int32 insertedRows = command.ExecuteNonQuery();
Console.WriteLine(insertedRows + " inserted.");
Example Usage:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.IO;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
//configure connection properties
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
//open the connection
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
try
{
using (_conn)
{
//start a transaction
VerticaTransaction txn = _conn.BeginTransaction();
//create a table for this example
VerticaCommand command = new VerticaCommand(
    "DROP TABLE IF EXISTS lcopy_table", _conn);
command.ExecuteNonQuery();
command.CommandText = "CREATE TABLE IF NOT EXISTS lcopy_table
(Last_Name char(50), "
+ "First_Name char(50),Email char(50), "
+ "Phone_Number char(15))";
command.ExecuteNonQuery();
//define the copy command
command.CommandText = "copy lcopy_table from LOCAL
'c://customers.txt'"
+ " record terminator E'\n' delimiter '|'"
+ " enforcelength "
+ " no commit";
//execute the copy
Int32 insertedRows = command.ExecuteNonQuery();
Console.WriteLine(insertedRows + " inserted.");
//commit the changes
txn.Commit();
}
}
catch (Exception e)
{
Console.WriteLine("Exception: " + e.Message);
}
//close the connection
_conn.Close();
}
}
}
Handling Messages (ADO.NET)
You can capture info and warning messages that HP Vertica provides to the ADO.NET driver by using the InfoMessage event on the VerticaConnection object. This event captures messages that are not severe enough to force an exception to be triggered, but might still provide information that can benefit your application.
To use the VerticaInfoMessageEventHandler class:
1 Create a method to handle the message sent from the event handler:
static void conn_InfoMessage(object sender,
VerticaInfoMessageEventArgs e)
{
Console.WriteLine(e.SqlState + ": " + e.Message);
}
2 Create a connection (page 164) and register a new VerticaInfoMessageHandler delegate for
the InfoMessage event:
_conn.InfoMessage += new
VerticaInfoMessageEventHandler(conn_InfoMessage);
3 Execute your queries. If a message is generated, then the event handler function is run.
4 You can unsubscribe from the event with the following command:
_conn.InfoMessage -= new
VerticaInfoMessageEventHandler(conn_InfoMessage);
Example usage:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
// define message handler to deal with messages
static void conn_InfoMessage(object sender, VerticaInfoMessageEventArgs
e)
{
Console.WriteLine(e.SqlState + ": " + e.Message);
}
static void Main(string[] args)
{
//configure connection properties
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
//open the connection
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
//create a message handler instance by subscribing it to the
//InfoMessage event of the connection
_conn.InfoMessage += new
VerticaInfoMessageEventHandler(conn_InfoMessage);
//create and execute the command
VerticaCommand cmd = _conn.CreateCommand();
cmd.CommandText = "drop table if exists fakeTable";
cmd.ExecuteNonQuery();
//close the connection
_conn.Close();
}
}
}
This example displays the following when run:
00000: Nothing was dropped
Getting Table Metadata (ADO.Net)
You can get the table metadata by using the GetSchema() method on a connection and loading
the metadata into a DataTable:
DataTable table = _conn.GetSchema("Tables", new string[] { database_name,
schema_name, table_name, table_type });
For example:
DataTable table = _conn.GetSchema("Tables", new string[] { null, null, null, "SYSTEM TABLE" });
database_name, schema_name, table_name can be set to null, be a specific name, or use a LIKE
pattern.
table_type can be one of:
 "SYSTEM TABLE"
 "TABLE"
 "GLOBAL TEMPORARY"
 "LOCAL TEMPORARY"
 "VIEW"
 null
If table_type is set to null, then the metadata for all table types is returned.
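For example, here is a sketch that uses a LIKE pattern to restrict the results (the schema and table names are hypothetical):
// Get metadata for tables in the public schema whose names start
// with "customer"
DataTable t = _conn.GetSchema("Tables",
    new string[] { null, "public", "customer%", "TABLE" });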
Example Usage:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Vertica.Data.VerticaClient;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
//configure connection properties
VerticaConnectionStringBuilder builder = new
VerticaConnectionStringBuilder();
builder.Host = "192.168.1.10";
builder.Database = "VMart";
builder.User = "dbadmin";
//open the connection
VerticaConnection _conn = new VerticaConnection(builder.ToString());
_conn.Open();
//create a new data table containing the schema
//the last argument can be "SYSTEM TABLE", "TABLE", "GLOBAL TEMPORARY",
//"LOCAL TEMPORARY", "VIEW", or null for all types
DataTable table = _conn.GetSchema("Tables", new string[] { null, null,
null, "SYSTEM TABLE" });
//print out the schema
foreach (DataRow row in table.Rows)
{
foreach (DataColumn col in table.Columns)
{
Console.WriteLine("{0} = {1}", col.ColumnName, row[col]);
}
Console.WriteLine("============================");
}
//close the connection
_conn.Close();
}
}
}
ADO.NET Data Types
This table details the mapping between HP Vertica data types and .NET and ADO.NET data types.
.NET Framework Type | ADO.NET DbType    | VerticaType           | HP Vertica Data Type  | VerticaDataReader getter
--------------------|-------------------|-----------------------|-----------------------|---------------------------------
Boolean             | Boolean           | Bit                   | Boolean               | GetBoolean()
byte[]              | Binary            | Binary, VarBinary     | Binary, VarBinary     | GetBytes()
DateTime            | DateTime          | Date, Time, TimeStamp | Date, Time, TimeStamp | GetDateTime() (see note 1)
DateTimeOffset      | DateTimeOffset    | TimestampTZ, TimeTZ   | TimestampTZ, TimeTZ   | GetDateTimeOffset() (see note 2)
Decimal             | Decimal           | Numeric               | Numeric               | GetDecimal()
Double              | Double            | Double                | Double Precision      | GetDouble()
Int64               | Int64             | BigInt                | Integer               | GetInt64()
TimeSpan            | Object            | 13 Interval Types     | 13 Interval Types     | GetInterval() (see note 3)
String              | String            | Varchar               | Varchar               | GetString()
String              | StringFixedLength | Char                  | Char                  | GetString()
Object              | Object            | N/A                   | N/A                   | GetValue()

Note 1: The Time portion of the DateTime object for HP Vertica dates is set to DateTime.MinValue. Previously, VerticaType.DateTime was used for all date/time types. VerticaType.DateTime still exists for backwards compatibility, but there are now more specific VerticaTypes for each type.
Note 2: For TimeTZ values, the Date portion of the DateTime is set to DateTime.MinValue.
Note 3: There are 13 VerticaType values for the 13 types of intervals. The specific VerticaType used determines the conversion rules that the driver applies. Year/month intervals are represented as 365/30 days.
Microsoft Connectivity Pack
The 6.0.0.1 and later versions of the Microsoft Connectivity Pack/ADO.NET client driver provide
additional components for integration with Visual Studio and Microsoft SQL Server. This
enhanced HP Vertica .NET data provider package provides integration with the following:
 Microsoft Business Intelligence Development Studio (BIDS) plugin and Visual Studio Shell Integration
 SQL Server Analysis Services (SSAS)
 SQL Server Integration Services (SSIS), which provides SQL Server Type Mappings to map data types between HP Vertica and SQL Server
 SQL Server Reporting Services (SSRS)
See Microsoft Connectivity Pack/ADO.NET Prerequisites (page 16) for details on which
Microsoft components must be installed to use this integration.
Microsoft Business Intelligence Development Studio (BIDS) plugin and Visual
Studio Shell Integration
Business Intelligence Development Studio (BIDS) is Microsoft Visual Studio 2008 with additional
project types that are specific to SQL Server business intelligence. BIDS is the primary
environment that you use to develop business solutions that include Analysis Services, Integration
Services, and Reporting Services projects.
The HP Vertica Microsoft Business Intelligence Development Studio (BIDS) plugin lets you view a list of tables and data in your HP Vertica database from within Visual Studio's Server Explorer, and from within a C# application when Visual Studio is used in BIDS mode.
Visual Studio Shell Integration allows you to browse an HP Vertica database from within Visual
Studio's server explorer outside of BIDS development.
Notes:
 Business Intelligence Development Studio and the 32-bit development environment for Analysis Services, Integration Services, and Reporting Services projects are not designed to run on the Itanium 64-bit architecture and are not installed on Itanium servers.
 Microsoft Visual Studio 2010 does not support Business Intelligence Development Studio
Integration Services, Report Services and Analysis Services projects for SQL Server 2008 and
SQL Server 2008 R2. To work around this issue, install Visual Studio 2008 alongside Visual
Studio 2010 on the same workstation and open your Business Intelligence Development
Studio projects in Visual Studio 2008.
Microsoft SSAS, SSIS, and SSRS Components
Both client and server side components are necessary in order to develop and implement the
Microsoft Business Intelligence capabilities. This section describes the offerings provided by HP
Vertica.
SQL Server Analysis Services (SSAS) Support
Business Intelligence Development Studio includes the Analysis Services project for developing
online analytical processing (OLAP) for business intelligence applications. This project type
includes the templates for cubes, dimensions, data sources, and data source views, and provides
the tools for working with these objects.
Support for using HP Vertica as a data source and target is now available from both SSIS and the
import/export wizard. HP Vertica specific mapping files must be installed on the Integration
Server and BIDS workstation in order to enable this capability. The ADO.NET client driver installs
these mapping files as the "SQL Server Type Mappings" component(s) (both 32-bit and 64-bit
versions).
SQL Server Integration Services (SSIS) Support
Business Intelligence Development Studio includes the Integration Services project for developing
ETL solutions. This project type includes the templates for packages, data sources, and data
source views, and provides the tools for working with these objects.
Integration Services Limitations
1 DDL SQL generation for a target data source through the .NET data provider for ODBC generates a syntax error:
When using HP Vertica as a target data source with the .NET data provider for ODBC, the following error occurs when you expect SSIS to generate valid SQL for HP Vertica:
ERROR [42601] ERROR 4856: Syntax error at or near "(" at character 60
This error occurs because, when using the ODBC bridge, SSIS does not know the flavor of
SQL required to create a new table, so it defaults to SQL Server syntax. To use the ODBC
bridge and create a new table with SSIS, you must manually edit the SQL in the CREATE
statement to conform to valid HP Vertica syntax.
2 SSIS Package Upgrades:
When HP Vertica provides a new version of the ADO.NET data provider and the version
number changes, any SSIS packages created must have the version number modified.
Currently this is a manual process. Listed below is an example of a SSIS package XML tag
with the version number.
<DTS:Property DTS:Name="CreationName">ADO.NET:Vertica.Data.VerticaClient.VerticaConnection,
Vertica.Data, Version=6.0.1.0, Culture=neutral,
PublicKeyToken=725728a8837b3bd5</DTS:Property>
<DTS:PropertyExpression DTS:Name="ConnectionString">@[User::SrcConnectionString]</DTS:PropertyExpression>
<DTS:ObjectData><DTS:ConnectionManager>
3 Schema Name Not provided when Mapping HP Vertica Destination:
Currently when mapping an HP Vertica destination, the schema name is not automatically
provided. You must enter it manually or pick it from the drop-down menu.
4 Null values when using the ODBC bridge data provider:
The existence of null values in the source data can cause a batched insert to fail or partially fail, because a null value stops the insertion mid-update. All of the rows without null values that came before the null row are inserted, but none after. For example, suppose you transfer 5 rows with SSIS using the bridge for a destination, and the provider returns the data in this order:
A
B
C (contains null)
D
E
After the package executes, the destination has only A, B, and C in it.
SQL Server Reporting Services (SSRS) Support
Business Intelligence Development Studio includes the Report projects for developing reporting
solutions.
Support for using HP Vertica as a data source is now available for Reporting Service. The installer
implements various configuration file modifications to enable this capability on both the BIDS
workstation and the Reporting Services server.
Reporting Services Limitations:
1 Data Connection Wizard
The Report Wizard provides a data connection wizard. When you select the wizard and enter all of the connection information, it is not possible to click the OK button, so you cannot save your work and continue. The workaround is to skip the wizard and enter the connection information directly in the data source properties panel instead.
2 Report Wizard - Query Designer
The HP Vertica Data Provider uses the Generic Query Designer. Other data sources use a
Graphical Query Designer that supports visually building queries. The Graphical Query Designer
is a part of a package called Visual Data Tools (VDT). The Graphical Query Designer only works with
Generic OLEDB providers and the built-in providers; it cannot be used with the HP Vertica Data
Provider.
3 Report Builder
Report Builder is a web-based report design tool. It does not support creating reports using
custom data extensions, so it cannot be used with the HP Vertica Data Provider. When
creating a report using Report Builder, existing HP Vertica data sources appear in the list of
available data sources, but choosing an HP Vertica data source causes an error.
Data Type Limitations in SSIS
Consider the following data type limitations when using SQL Server Integration Services (SSIS):
DATE and DATETIME
To function without errors, DATE and DATETIME values must fall between 0001-01-01 00:00:00.0000000 and 9999-12-31 23:59:59.999999.
In SSIS, the DATETIME type (DT_TIMESTAMP) only supports a scale of up to three decimal places for the seconds. Any decimal places after that are automatically discarded. Derived column transformations can only be performed on DATETIME values between January 1, 1753 and December 31, 9999.
Numeric
The maximum and minimum decimal values allowed are:
 Max: +79,228,162,514,264,337,593,543,950,335
 Min: -79,228,162,514,264,337,593,543,950,335
For example, if the scale is 16, the range of values is:
+/- 7,922,816,251,426.4337593543950335
The valid scale range is any number smaller than 29 or greater than 38. Using a scale between 29 and 38 generates an error.
See http://msdn.microsoft.com/en-us/library/system.decimal.maxvalue.aspx
Character
The SSIS data type used for CHAR and VARCHAR is DT_WSTR, which has a maximum length of 4000 characters.
HP Vertica strings are converted to Unicode strings in SSIS to handle multilingual data. The strings can be converted to ASCII using a Data Conversion Task.
Floating point
SQL Server does not support the values NaN, Infinity, or –Infinity. These values work when you
use SSIS to transfer between HP Vertica instances, but they do not work with a SQL Server
Destination.
Scale
Whenever a scale greater than 38 is used, SSIS replaces it with a value of 4.
Interval
SSIS does not support interval types. It converts them to TIME and strips off the day component.
Any package that has interval types greater than a day fails with truncation.
Programming Python Client Applications
HP Vertica provides an ODBC driver so applications can connect to the HP Vertica database.
In order to use Python with HP Vertica, you must install the pyodbc module and an HP Vertica
ODBC driver on the machine where Python is installed. See Python Prerequisites (page 17).
Python on Linux
Most Linux distributions come with Python preinstalled. If you want a more recent version, you can download and build it from the source code, though sometimes RPMs are also available. See the Python Web site http://www.python.org/download/ and click an individual release for details. See also the Python documentation http://www.python.org/doc/.
To determine the Python version on your Linux operating systems, type the following at a
command prompt:
# python -V
The system returns the version; for example:
Python 2.5.2
Python on Windows
Python does not come preinstalled on Windows operating systems. The
ActiveState Web site distributes a free Windows installer for Python called ActivePython
http://www.activestate.com/activepython/.
If you need installation instructions for Windows, see Using Python on Windows
http://docs.python.org/using/windows.html at python.org. Python on Windows
http://diveintopython.org/installing_python/windows.html at diveintopython.org provides
installation instructions for both the ActivePython and python.org packages.
The Python Driver Module (pyodbc)
The native Python driver is not supported.
Before you can connect to HP Vertica using Python, you need the pyodbc module, which communicates with the iODBC/unixODBC driver manager on UNIX operating systems and the ODBC Driver Manager on Windows operating systems.
The pyodbc module is an open-source, MIT-licensed Python module that implements the Python
Database API Specification v2.0, letting you use ODBC to connect to almost any database from
Windows, Linux, Mac OS/X, and other operating systems.
HP Vertica supports pyodbc version 2.1.6, which requires Python 2.4 or greater, up to 2.6. HP
Vertica does not support Python version 3.x. See Python Prerequisites (page 17) for additional
details.
Download the source distribution from the pyodbc Web site
http://code.google.com/p/pyodbc/, unpack it and build it. See the pyodbc wiki
http://code.google.com/p/pyodbc/w/list for instructions.
Note: Links to external Web sites could change between HP Vertica releases.
External Resources
Python Database API Specification v2.0 http://www.python.org/dev/peps/pep-0249/
Python documentation http://www.python.org/doc/
Configuring the ODBC Run-time Environment on Linux
To configure the ODBC run-time environment on Linux:
1 Create the odbc.ini file if it does not already exist.
2 Add the ODBC driver directory to the LD_LIBRARY_PATH system environment variable:
export LD_LIBRARY_PATH=/path-to-vertica-odbc-driver:$LD_LIBRARY_PATH
IMPORTANT! If you skip Step 2, the ODBC manager cannot find the driver in order to load it.
These steps are relevant only for unixODBC and iODBC. See their respective documentation for
details on odbc.ini.
See Also
unixODBC Web site http://www.unixodbc.org/
iODBC Web site http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/
Querying the Database Using Python
The example session below uses pyodbc with the HP Vertica ODBC driver to connect Python to
the HP Vertica database.
iODBC Users: SQLFetchScroll and SQLFetch functions cannot be mixed together in iODBC code. When using pyodbc with the iODBC driver manager, skip cannot be used with the fetchall, fetchone, and fetchmany functions.
1 Open a database connection, create a table called TEST:
cnxn = pyodbc.connect(connection_string, ansi=True)
cursor = cnxn.cursor()
# create table
cursor.execute("CREATE TABLE TEST("
"C_ID INT,"
"C_FP FLOAT,"
"C_VARCHAR VARCHAR(100),"
"C_DATE DATE, C_TIME TIME,"
"C_TS TIMESTAMP,"
"C_BOOL BOOL)")
2 Insert records into table TEST:
cursor.execute("INSERT into test
values(1,1.1,'abcdefg1234567890','1901-01-01','23:12:34','1901-01-01
09:00:09','t')")
3 Insert records using bind values:
values = (2,2.28,'abcdefg1234567890','1901-01-01','23:12:34',
          '1901-01-01 09:00:09','t')
cursor.execute("INSERT into test values(?,?,?,?,?,?,?)",
               values[0], values[1], values[2], values[3], values[4],
               values[5], values[6])
4 Select data from the TEST table:
cursor.execute("SELECT * FROM TEST")
rows = cursor.fetchall()
for row in rows:
print row
The following is the example output:
(1L, 1.1000000000000001, 'abcdefg1234567890', datetime.date(1901, 1,
1), datetime.time(23, 12, 34), datetime.datetime(1901, 1, 1, 9, 0, 9),
'1') (2L, 2.2799999999999998, 'abcdefg1234567890', datetime.date(1901,
1, 1), datetime.time(23, 12, 34), datetime.datetime(1901, 1, 1, 9, 0,
9), '1')
5 Drop the TEST table and its associated projections and close the database connection:
cursor.execute("DROP TABLE TEST CASCADE")
cursor.close()
cnxn.close()
Notes
SQLPrimaryKeys returns the table name in the primary key name (pk_name) column for unnamed primary constraints. For example:
 Unnamed primary key:
CREATE TABLE schema.test(c INT PRIMARY KEY);
SQLPrimaryKeys
"TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME", "COLUMN_NAME", "KEY_SEQ", "PK_NAME"
<Null>, "SCHEMA", "TEST", "C", 1, "TEST"
 Named primary key:
CREATE TABLE schema.test(c INT CONSTRAINT pk_1 PRIMARY KEY);
SQLPrimaryKeys
"TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME", "COLUMN_NAME", "KEY_SEQ", "PK_NAME"
<Null>, "SCHEMA", "TEST", "C", 1, "PK_1"
HP recommends that you name your constraints.
See Also
Loading Data Through ODBC (page 81)
Programming Perl Client Applications
The Perl programming language has a Database Interface module (DBI) that creates a standard
interface for Perl scripts to interact with databases. The interface module relies on Database
Driver modules (DBDs) to handle all of the database-specific communication tasks. The result is
an interface that provides a consistent way for Perl scripts to interact with many different types of
databases.
Your Perl script can interact with HP Vertica using the Perl DBI module along with the DBD::ODBC
database driver to interface to HP Vertica's ODBC driver. See the CPAN pages for Perl's DBI
http://search.cpan.org/dist/DBI/DBI.pm and DBD::ODBC
http://search.cpan.org/~mjevans/DBD-ODBC-1.43/ODBC.pm modules for detailed
documentation.
The topics in this chapter explain how to:
 Configure Perl to access HP Vertica
 Connect to HP Vertica
 Query data stored in HP Vertica
 Insert data into HP Vertica
Perl Client Prerequisites
In order to run a Perl client script that connects to HP Vertica, your client system must have:
 The HP Vertica ODBC drivers installed and configured. See Installing the HP Vertica Client Drivers (page 15) for details.
 A Data Source Name (DSN) containing the connection parameters for your HP Vertica database. See Creating an ODBC Data Source Name (page 29). (Optionally, your Perl script can connect to HP Vertica without using a DSN, as described in Connecting from Perl Without a DSN (page 221).)
 A supported version of Perl installed
 The DBI and DBD::ODBC Perl modules (see below)
Supported Perl Versions
HP Vertica supports Perl versions 5.8 and 5.10. Versions later than 5.10 may also work.
Perl on Linux
Most Linux distributions come with Perl preinstalled. See your Linux distribution's documentation for details of installing and configuring its Perl package if it is not already installed.
To determine the Perl version on your Linux operating systems, type the following at a command
prompt:
# perl -v
The system returns the version; for example:
This is perl, v5.10.0 built for x86_64-linux-thread-multi
Perl on Windows
Perl is not installed by default on Windows platforms. There are several different Perl packages
you can download and install on your Windows system:
 ActivePerl http://www.activestate.com/activeperl/ by ActiveState is a commercially-supported version of Perl for Windows platforms.
 Strawberry Perl http://strawberryperl.com/ is an open-source port of Perl for Windows.
The Perl Driver Modules (DBI and DBD::ODBC)
Before you can connect to HP Vertica using Perl, your Perl installation needs to have the Perl
Database Interface module (DBI) and the Database Driver for ODBC (DBD::ODBC). These
modules communicate with iODBC/unixODBC driver on UNIX operating systems or the ODBC
Driver Manager for Windows operating systems.
HP Vertica supports the following Perl modules:
 DBI version 1.609 (DBI-1.609.tar.gz)
 DBD::ODBC version 1.22 (DBD-ODBC-1.22.tar.gz)
Later versions of DBI and DBD::ODBC may also work.
DBI is installed by default with many Perl installations. You can test whether it is installed by
executing the following command on the Linux or Windows command line:
# perl -e "use DBI;"
If the command exits without printing anything, then DBI is installed. If it prints an error, such as:
Can't locate DBI.pm in @INC (@INC contains: /usr/local/lib64/perl5
/usr/local/share/perl5 /usr/lib64/perl5/vendor_perl
/usr/share/perl5/vendor_perl
/usr/lib64/perl5 /usr/share/perl5 .) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
then DBI is not installed.
Similarly, you can see if DBD::ODBC is installed by executing the command:
# perl -e "use DBD::ODBC;"
You can also run the following Perl script to determine if DBI and DBD::ODBC are installed. If they
are, the script lists any available DSNs.
#!/usr/bin/perl
use strict;
# Attempt to load the DBI module in an eval using require. Prevents
# script from erroring out if DBI is not installed.
eval
{
require DBI;
DBI->import();
};
if ($@) {
# The eval failed, so DBI must not be installed
print "DBI module is not installed\n";
} else {
# Eval was successful, so DBI is installed
print "DBI Module is installed\n";
# List the drivers that DBI knows about.
my @drivers = DBI->available_drivers;
print "Available Drivers: \n";
foreach my $driver (@drivers) {
print "\t$driver\n";
}
# See if DBD::ODBC is installed by searching driver array.
if (grep {/ODBC/i} @drivers) {
print "\nDBD::ODBC is installed.\n";
# List the ODBC data sources (DSNs) defined on the system
print "Defined ODBC Data Sources:\n";
my @dsns = DBI->data_sources('ODBC');
foreach my $dsn (@dsns) {
print "\t$dsn\n";
}
} else {
print "DBD::ODBC is not installed\n";
}
}
The exact output of the above code will depend on the configuration of your system. The following
is an example of running the code on a Windows computer:
DBI Module is installed
Available Drivers:
ADO
DBM
ExampleP
File
Gofer
ODBC
Pg
Proxy
SQLite
Sponge
mysql
DBD::ODBC is installed.
Defined ODBC Data Sources:
dbi:ODBC:dBASE Files
dbi:ODBC:Excel Files
dbi:ODBC:MS Access Database
dbi:ODBC:VerticaDSN
Installing Missing Perl Modules
If Perl's DBI or DBD::ODBC modules are not installed on your client system, you must install them
before your Perl scripts can connect to HP Vertica. How you install modules depends on your Perl
configuration:
 For most Perl installations, you use the cpan command to install modules. If the cpan command alias isn't installed on your system, you can try to start CPAN by using the command:
perl -MCPAN -e shell
 Some Linux distributions provide Perl modules as packages that can be installed with the
system package manager (such as yum or apt). See your Linux distribution's documentation
for details.
 On ActiveState Perl for Windows, you use the Perl Package Manager (PPM) program to install
Perl modules. See the Activestate's PPM documentation
http://docs.activestate.com/activeperl/5.16/faq/ActivePerl-faq2.html for details.
Note: Installing Perl modules usually requires administrator or root privileges. If you do not
have these permissions on your client system, you need to ask your system administrator to
install these modules for you.
Connecting to HP Vertica Using Perl
You use the Perl DBI module's connect function to connect to HP Vertica. This function takes a
required data source string argument and optional arguments for the username, password, and
connection attributes.
The data source string must start with "dbi:ODBC:", which tells the DBI module to use the
DBD::ODBC driver to connect to HP Vertica. The remainder of the string is interpreted by the
DBD::ODBC driver. It usually contains the name of a DSN that contains the connection
information needed to connect to your HP Vertica database. For example, to tell the DBD::ODBC
driver to use the DSN named VerticaDSN, you use the data source string:
"dbi:ODBC:VerticaDSN"
The username and password parameters are optional. However, if you do not supply them (or just
the username for a passwordless account) and they are not set in the DSN, attempting to connect
always fails.
The connect function returns a database handle if it connects to HP Vertica. If it does not, it
returns undef. In that case, you can access the DBI module's error string property
($DBI::errstr) to get the error message.
Note: By default, the DBI module prints an error message to STDERR whenever it encounters
an error. If you prefer to display your own error messages or handle errors in some other
manner, you may want to disable these automatic messages by setting DBI's PrintError
connection attribute to false. See Setting Perl DBI Connection Attributes (page 220) for
details. Otherwise, users may see two error messages: the one that DBI prints automatically,
and the one that your script prints on its own.
The following example demonstrates connecting to HP Vertica using a DSN named VerticaDSN.
The call to connect supplies a username and password. After connecting, it calls the database
handle's disconnect function, which closes the connection.
#!/usr/bin/perl -w
use strict;
use DBI;
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123");
unless (defined $dbh) {
# Connection failed.
die "Failed to connect: $DBI::errstr";
}
print "Connected!\n";
$dbh->disconnect();
Setting ODBC Connection Parameters in Perl
To set ODBC connection parameters, replace the DSN name with a semicolon-delimited list of parameter name and value pairs in the data source string. Use the DSN parameter to tell DBD::ODBC which DSN to use, then add the other ODBC parameters you want to set. For example, the following code connects using a DSN named VerticaDSN and sets the connection's locale to en_GB.
#!/usr/bin/perl -w
use strict;
use DBI;
# Instead of just using the DSN name, use name and value pairs.
my $dbh = DBI->connect("dbi:ODBC:DSN=VerticaDSN;Locale=en_GB",
                       "ExampleUser", "password123");
unless (defined $dbh) {
# Connection failed.
die "Failed to connect: $DBI::errstr";
}
print "Connected!\n";
$dbh->disconnect();
See DSN Parameters (page 43) for a list of the connection parameters you can set in the source
data string.
Setting Perl DBI Connection Attributes
The Perl DBI module has attributes that you can use to control the behavior of its database
connection. These attributes are similar to the ODBC connection parameters (in several cases,
they duplicate each other's functionality). The DBI connection attributes are a cross-platform way
of controlling the behavior of the database connection.
You can set the DBI connection attributes when establishing a connection by passing the DBI
connect function a hash containing attribute and value pairs. For example, to set the DBI
connection attribute AutoCommit to false, you would use:
# Create a hash that holds attributes for the connection
my $attr = {AutoCommit => 0};
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
$attr);
See the DBI documentation's Database Handle Attributes
http://search.cpan.org/~timb/DBI/DBI.pm#ATTRIBUTES_COMMON_TO_ALL_HANDLES
section for a full description of the attributes you can set on the database connection.
After your script has connected, it can access and modify the connection attributes through the
database handle by using it as a hash reference. For example:
print "The AutoCommit attribute is: " . $dbh->{AutoCommit} . "\n";
The following example demonstrates setting two connection attributes:
 RaiseError controls whether the DBI driver generates a Perl error if it encounters a database error. Usually, you set this to true (1) if you want your Perl script to exit if there is a database error.
 AutoCommit controls whether statements automatically commit their transactions when they complete. DBI defaults to HP Vertica's default AutoCommit value of true. Always set AutoCommit to false (0) when bulk loading data to increase database efficiency.
#!/usr/bin/perl
use strict;
use DBI;
# Create a hash that holds attributes for the connection
my $attr = {
RaiseError => 1, # Make database errors fatal to script
AutoCommit => 0, # Prevent statements from committing
# their transactions.
};
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
$attr);
unless (defined $dbh) {
# Connection failed.
die "Failed to connect: $DBI::errstr";
}
print "Connected!\n";
# The database handle lets you access the connection attributes directly:
print "The AutoCommit attribute is: " . $dbh->{AutoCommit} . "\n";
print "The RaiseError attribute is: " . $dbh->{RaiseError} . "\n";
# And you can change values, too...
$dbh->{AutoCommit} = 1;
print "The AutoCommit attribute is now: " . $dbh->{AutoCommit} . "\n";
$dbh->disconnect();
The example outputs the following when run:
Connected!
The AutoCommit attribute is: 0
The RaiseError attribute is: 1
The AutoCommit attribute is now: 1
Connecting from Perl Without a DSN
If you do not want to set up a Data Source Name (DSN) for your database, you can supply all of
the information Perl's DBD::ODBC driver requires to connect to your HP Vertica database in the
data source string. This source string must contain the DRIVER= parameter, which tells DBD::ODBC which
driver library to use in order to connect. The value for this parameter is the name assigned to the
driver by the client system's driver manager:
 On Windows, the name assigned to the HP Vertica ODBC driver by the driver manager is Vertica.
 On Linux and other UNIX-like operating systems, the HP Vertica ODBC driver's name is assigned in the system's odbcinst.ini file. For example, if your /etc/odbcinst.ini contains the following:
[HPVertica]
Description = HP Vertica ODBC Driver
Driver = /opt/vertica/lib64/libverticaodbc.so
you would use the name HPVertica. See Creating an ODBC DSN for Linux, Solaris, AIX, and HP-UX (page 29) for more information about the odbcinst.ini file.
You can take advantage of Perl's variable expansion within strings to use variables for most of the
connection properties, as the following example demonstrates.
#!/usr/bin/perl
use strict;
use DBI;
my $server   = 'VerticaHost';
my $port     = '5433';
my $database = 'VMart';
my $user     = 'ExampleUser';
my $password = 'password123';
# Connect without a DSN by supplying all of the information for the connection.
# The DRIVER value on UNIX platforms depends on the entry in the odbcinst.ini
# file.
my $dbh = DBI->connect("dbi:ODBC:DRIVER={Vertica};Server=$server;" .
"Port=$port;Database=$database;UID=$user;PWD=$password")
or die "Could not connect to database: " . DBI::errstr;
print "Connected!\n";
$dbh->disconnect();
Note: Surrounding the driver name with braces ({ and }) in the source string is optional.
Executing Statements Using Perl
Once your Perl script has connected to HP Vertica (see Connecting to HP Vertica Using Perl
(page 218)), it can execute simple statements that return a value rather than a result set by using
the Perl DBI module's do function. You usually use this function to execute DDL statements or
data loading statements such as COPY (see Using COPY LOCAL to Load Data in Perl (page
226)).
#!/usr/bin/perl
use strict;
use DBI;
# Disable autocommit
my $attr = {AutoCommit => 0};
# Open a connection using a DSN.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
$attr);
unless (defined $dbh) {
# Connection failed.
die "Failed to connect: $DBI::errstr";
}
# You can use the do function to perform DDL commands.
# Drop any existing table.
$dbh->do("DROP TABLE IF EXISTS TEST CASCADE;");
# Create a table to hold data.
$dbh->do("CREATE TABLE TEST( \
C_ID INT, \
C_FP FLOAT,\
C_VARCHAR VARCHAR(100),\
C_DATE DATE, C_TIME TIME,\
C_TS TIMESTAMP,\
C_BOOL BOOL)");
# Commit changes and exit.
$dbh->commit();
$dbh->disconnect();
Note: The do function returns the number of rows that were affected by the statement (or -1 if the count of rows doesn't apply or is unavailable). Usually, the only time you need to consult this value is after deleting a number of rows or after using a bulk load command such as COPY. You use other DBI functions instead of do to perform batch inserts and selects (see Batch Loading Data Using Perl (page 223) and Querying HP Vertica Using Perl (page 229) for details).
Batch Loading Data Using Perl
To load large batches of data into HP Vertica using Perl:
1 Set DBI's AutoCommit connection attribute to false to improve the batch load speed. See
Setting Perl DBI Connection Attributes (page 220) for an example of disabling AutoCommit.
2 Call the database handle's prepare function to prepare a SQL INSERT statement that
contains placeholders for the data values you want to insert. For example:
# Prepare an INSERT statement for the test table
$sth = $dbh->prepare("INSERT into test values(?,?,?,?,?,?,?)");
The prepare function returns a statement handle that you will use to insert the data.
3 Assign data to the placeholders. There are several ways to do this. The easiest is to populate
an array with a value for each placeholder in your INSERT statement.
4 Call the statement handle's execute function to insert a row of data into HP Vertica. The
return value of this function call lets you know whether HP Vertica accepted or rejected the
row.
5 Repeat steps 3 and 4 until you have loaded all of the data you need to load.
6 Call the database handle's commit function to commit the data you inserted.
The following example demonstrates inserting a small batch of data by populating an array of
arrays with data, then looping through it and inserting each row.
#!/usr/bin/perl
use strict;
use DBI;
# Create a hash reference that holds a hash of parameters for the
# connection.
my $attr = {AutoCommit => 0,  # Turn off autocommit
            PrintError => 0   # Turn off automatic error printing.
                              # This is handled manually.
           };
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
$attr);
if (defined $DBI::err) {
   # Connection failed.
die "Failed to connect: $DBI::errstr";
}
print "Connection AutoCommit state is: " . $dbh->{AutoCommit} . "\n";
# Create table to hold inserted data
$dbh->do("DROP TABLE IF EXISTS TEST CASCADE;") or die "Could not drop table";
$dbh->do("CREATE TABLE TEST( \
C_ID INT, \
C_FP FLOAT,\
C_VARCHAR VARCHAR(100),\
C_DATE DATE, C_TIME TIME,\
C_TS TIMESTAMP,\
C_BOOL BOOL)") or die "Could not create table";
# Populate an array of arrays with values. One of these rows contains
# data that will not be successfully inserted. Another contains an
# undef value, which gets inserted into the database as a NULL.
my @data = (
[1,1.111,'Hello World!','2001-01-01','01:01:01'
,'2001-01-01 01:01:01','t'],
[2,2.22222,'How are you?','2002-02-02','02:02:02'
,'2002-02-02 02:02:02','f'],
['bad value',2.22222,'How are you?','2002-02-02','02:02:02'
,'2002-02-02 02:02:02','f'],
[4,4.22222,undef,'2002-02-02','02:02:02'
,'2002-02-02 02:02:02','f'],
);
# Create a prepared statement to use parameters for inserting values.
my $sth = $dbh->prepare_cached("INSERT into test values(?,?,?,?,?,?,?)");
my $rowcount = 0; # Count # of rows
# Loop through the arrays to insert values
foreach my $tuple (@data) {
$rowcount++;
# Insert the row
my $retval = $sth->execute(@$tuple);
# See if the row was successfully inserted.
if ($retval == 1) {
# Value of 1 means the row was inserted (1 row was affected by insert)
print "Row $rowcount successfully inserted\n";
} else {
print "Inserting row $rowcount failed";
# Error message is not set on some platforms/versions of DBI. Check to
# ensure a message exists to avoid getting an uninitialized var warning.
if ($sth->err()) {
print ": " . $sth->errstr();
}
print "\n";
}
}
# Commit changes. With AutoCommit off, you need to use commit for batched
# data to actually be committed into the database. If your Perl script exits
# without committing its data, HP Vertica rolls back the transaction and the
# data is not committed.
$dbh->commit();
$dbh->disconnect();
The previous example displays the following when successfully run:
Connection AutoCommit state is: 0
Row 1 successfully inserted
Row 2 successfully inserted
Inserting row 3 failed with error 01000 [Vertica][VerticaDSII] (20) An
error occurred during query execution: Row rejected by server; see
server log for details (SQL-01000)
Row 4 successfully inserted
Note that one of the rows was not inserted because it contained a string value that could not be
stored in an integer column. See Conversions between Perl and HP Vertica Data Types (page
232) for details of data type handling in Perl scripts that communicate with HP Vertica.
Using COPY LOCAL to Load Data in Perl
If you have delimited files (for example, a file with comma-separated values) on your client system
that you want to load into HP Vertica, you can use the COPY LOCAL statement to directly load the
file's contents into HP Vertica instead of using Perl to read, parse, and then batch insert the data.
You execute a COPY LOCAL statement to load the file from the local filesystem. The result of
executing the statement is the number of rows that were successfully inserted.
The following example code demonstrates loading a file named data.txt and located in the same
directory as the Perl file into HP Vertica using a COPY LOCAL statement.
#!/usr/bin/perl
use strict;
use DBI;
# Filesystem path handling module
use File::Spec;
# Create a hash reference that holds a hash of parameters for the
# connection.
my $attr = {AutoCommit => 0}; # Turn off AutoCommit
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
                       $attr) or die "Failed to connect: $DBI::errstr";
print "Connected!\n";
# Drop any existing table.
$dbh->do("DROP TABLE IF EXISTS Customers CASCADE;");
# Create a table to hold data.
$dbh->do("CREATE TABLE Customers( \
            ID INT, \
            FirstName VARCHAR(100),\
            LastName VARCHAR(100),\
            Email VARCHAR(100),\
            Birthday DATE)");
# Find the absolute path to the data file located in the current working
# directory and named data.txt
my $currDir = File::Spec->rel2abs(File::Spec->curdir());
my $dataFile = File::Spec->catfile($currDir, 'data.txt');
print "Loading file $dataFile\n";
# Load local file using COPY LOCAL. Return value is the # of rows affected,
# which equates to the number of rows inserted.
my $rows = $dbh->do("COPY Customers FROM LOCAL '$dataFile' DIRECT")
    or die $dbh->errstr;
print "Copied $rows rows into database.\n";
$dbh->commit();
# Prepare a query to get the first 15 rows of the results
my $sth = $dbh->prepare("SELECT * FROM Customers WHERE ID < 15 \
                             ORDER BY ID");
$sth->execute() or die "Error querying table: " . $dbh->errstr;
my @row; # Pre-declare variable to hold result row used in format statement.
# Use Perl formats to pretty print the output. Declare the heading for the
# form.
format STDOUT_TOP =
ID First          Last         EMail                        Birthday
== =====          ====         =====                        ========
.
# The Perl write statement will output a formatted line with values from the
# @row array. See http://perldoc.perl.org/perlform.html for details.
format STDOUT =
@> @<<<<<<<<<<<<< @<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<
@row
.
# Loop through result rows while we have them
while (@row = $sth->fetchrow_array()) {
    write; # Format command does the work of extracting the columns from
           # the @row array and writing them out to STDOUT.
}
# Call commit to prevent Perl from complaining about uncommitted transactions
# when disconnecting
$dbh->commit();
$dbh->disconnect();
The data.txt file is a text file containing a row of data on each line. The columns are delimited by
pipe (|) characters. This is the default format that the COPY command accepts, which makes the
COPY LOCAL statement in the example code simple. See the COPY statement entry in the SQL
Reference Manual for handling data files that are in different formats. Here is an example of the
content in this file:
1|Georgia|Gomez|Rhiannon@magna.us|1937-10-03
2|Abdul|Alexander|Kathleen@ipsum.gov|1941-03-10
3|Nigel|Contreras|Tanner@et.com|1955-06-01
4|Gray|Holt|Thomas@Integer.us|1945-12-06
5|Candace|Bullock|Scott@vitae.gov|1932-05-27
6|Matthew|Dotson|Keith@Cras.com|1956-09-30
7|Haviva|Hopper|Morgan@porttitor.edu|1975-05-10
8|Stewart|Sweeney|Rhonda@lectus.us|2003-06-20
9|Allen|Rogers|Alexander@enim.gov|2006-06-17
10|Trevor|Dillon|Eagan@id.org|1988-11-27
11|Leroy|Ashley|Carter@turpis.edu|1958-07-25
12|Elmo|Malone|Carla@enim.edu|1978-08-29
13|Laurel|Ball|Zelenia@Integer.us|1989-09-20
14|Zeus|Phillips|Branden@blandit.gov|1996-08-08
15|Alexis|Mclean|Flavia@Suspendisse.org|2008-01-07
The example code produces the following output when run on a large sample file:
Connected!
Loading file /home/dbadmin/Perl/data.txt
Copied 1000000 rows into database.
ID First          Last         EMail                        Birthday
== =====          ====         =====                        ========
 1 Georgia        Gomez        Rhiannon@magna.us            1937-10-03
 2 Abdul          Alexander    Kathleen@ipsum.gov           1941-03-10
 3 Nigel          Contreras    Tanner@et.com                1955-06-01
 4 Gray           Holt         Thomas@Integer.us            1945-12-06
 5 Candace        Bullock      Scott@vitae.gov              1932-05-27
 6 Matthew        Dotson       Keith@Cras.com               1956-09-30
 7 Haviva         Hopper       Morgan@porttitor.edu         1975-05-10
 8 Stewart        Sweeney      Rhonda@lectus.us             2003-06-20
 9 Allen          Rogers       Alexander@enim.gov           2006-06-17
10 Trevor         Dillon       Eagan@id.org                 1988-11-27
11 Leroy          Ashley       Carter@turpis.edu            1958-07-25
12 Elmo           Malone       Carla@enim.edu               1978-08-29
13 Laurel         Ball         Zelenia@Integer.us           1989-09-20
14 Zeus           Phillips     Branden@blandit.gov          1996-08-08
Note: Loading a single, large data file into HP Vertica through a single data connection is less
efficient than loading a number of smaller files onto multiple nodes in parallel. Loading onto
multiple nodes prevents any one node from becoming a bottleneck.
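If your data is already split into multiple files, one way to parallelize a load from Perl is to run one COPY LOCAL per file on its own connection in a separate process. The following is a minimal sketch only, not part of the original example: the file names (part1.txt through part4.txt) are hypothetical, the Customers table and DSN are reused from the earlier COPY LOCAL example, and each child process opens its own database handle because DBI connections cannot be shared across fork.
#!/usr/bin/perl
use strict;
use DBI;
# Hypothetical set of pre-split data files to load in parallel.
my @files = ('part1.txt', 'part2.txt', 'part3.txt', 'part4.txt');
my @pids;
foreach my $file (@files) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child process: open a private connection. DBI handles must
        # not be shared across fork.
        my $dbh = DBI->connect("dbi:ODBC:VerticaDSN", "ExampleUser",
                               "password123", {AutoCommit => 0})
            or die "Failed to connect: $DBI::errstr";
        my $rows = $dbh->do("COPY Customers FROM LOCAL '$file' DIRECT")
            or die $dbh->errstr;
        $dbh->commit();
        $dbh->disconnect();
        print "Loaded $rows rows from $file\n";
        exit 0;
    }
    push @pids, $pid; # Parent process: remember the child's PID.
}
# Wait for every load to finish before exiting.
waitpid($_, 0) foreach @pids;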
Querying HP Vertica Using Perl
To query HP Vertica using Perl:
1 Prepare a query statement using the Perl DBI module's prepare function. This function
returns a statement handle that you use to execute the query and get the result set.
2 Execute the prepared statement by calling the execute function on the statement handle.
3 Retrieve the results of the query from the statement handle using one of several methods,
such as calling the statement handle's fetchrow_array function to retrieve a row of data, or
fetchall_arrayref to get a reference to an array of arrays containing the entire result set (not
a good idea if your result set may be very large!). A sketch of the fetchall_arrayref variant
appears after this list.
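For small result sets, the fetchall_arrayref approach looks like the following minimal sketch. It reuses the TEST table and DSN from the earlier examples; fetchall_arrayref returns a reference to an array of array references, one per row.
#!/usr/bin/perl
use strict;
use DBI;
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
                       {RaiseError => 1});
my $sth = $dbh->prepare("SELECT * FROM TEST ORDER BY C_ID ASC");
$sth->execute();
# Fetch the entire result set at once. Only do this when the result
# set is known to be small enough to fit in memory.
my $all = $sth->fetchall_arrayref();
foreach my $row (@$all) {
    # Each element of @$all is one row; map undef (NULL) to a label.
    print join("\t", map { defined $_ ? $_ : 'NULL' } @$row), "\n";
}
$dbh->disconnect();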
The following example demonstrates querying the table created by the example shown in Batch
Loading Data Using Perl (page 223). It executes a query to retrieve all of the content of the table,
then repeatedly calls the statement handle's fetchrow_array function to get rows of data in an
array. It repeats this process until fetchrow_array returns undef, which means that there are
no more rows to be read.
#!/usr/bin/perl
use strict;
use DBI;
my $attr = {RaiseError => 1 }; # Make errors fatal to the Perl script.
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
                       $attr);
# Prepare a query to get the content of the table
my $sth = $dbh->prepare("SELECT * FROM TEST ORDER BY C_ID ASC");
# Execute the query by calling execute on the statement handle
$sth->execute();
# Loop through result rows while we have them, getting each row as an array
while (my @row = $sth->fetchrow_array()) {
    # The @row array contains the column values for this row of data
    # Loop through the column values
    foreach my $column (@row) {
        if (!defined $column) {
            # NULLs are signaled by undefs. Set to NULL for clarity
            $column = "NULL";
        }
        print "$column\t"; # Output the column separated by a tab
    }
    print "\n";
}
$dbh->disconnect();
The example prints the following when run:
1	1.111	Hello World!	2001-01-01	01:01:01	2001-01-01 01:01:01	1
2	2.22222	How are you?	2002-02-02	02:02:02	2002-02-02 02:02:02	0
4	4.22222	NULL	2002-02-02	02:02:02	2002-02-02 02:02:02	0
Binding Variables to Column Values
Another method of retrieving the query results is to bind variables to columns in the result set using
the statement handle's bind_columns function. You may find this method convenient if you need
to perform extensive processing on the returned data, since your code can use variables rather
than array references to access the data. The following example demonstrates binding variables
to the result set, rather than looping through the row and column values.
#!/usr/bin/perl
use strict;
use DBI;
my $attr = {RaiseError => 1 }; # Make SQL errors fatal to the Perl script.
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN32","ExampleUser","password123",
                       $attr);
# Prepare a query to get the content of the table
my $sth = $dbh->prepare("SELECT * FROM TEST ORDER BY C_ID ASC");
$sth->execute();
# Create a set of variables to bind to the column values.
my ($C_ID, $C_FP, $C_VARCHAR, $C_DATE, $C_TIME, $C_TS, $C_BOOL);
# Bind the variable references to the columns in the result set.
$sth->bind_columns(\$C_ID, \$C_FP, \$C_VARCHAR, \$C_DATE, \$C_TIME,
                   \$C_TS, \$C_BOOL);
# Now, calling fetch() to get a row of data updates the values of the bound
# variables. Continue calling fetch until it returns undefined.
while ($sth->fetch()) {
    # Note, you should always check that values are defined before using them,
    # since NULL values are translated into Perl as undefined. For this
    # example, just check the VARCHAR column for undefined values.
    if (!defined $C_VARCHAR) {
        $C_VARCHAR = "NULL";
    }
    # Just print values separated by tabs.
    print "$C_ID\t$C_FP\t$C_VARCHAR\t$C_DATE\t$C_TIME\t$C_TS\t$C_BOOL\n";
}
$dbh->disconnect();
The output of this example is identical to the output of the previous example.
Preparing, Querying, and Returning a Single Row
If you expect a single row as the result of a query (for example, when you execute a COUNT(*)
query), you can use the DBI module's selectrow_array function to combine executing a
statement and retrieving an array as a result.
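For instance, a COUNT(*) result can be executed and fetched in a single call. The following is a minimal sketch, reusing the TEST table and DSN from the earlier examples:
#!/usr/bin/perl
use strict;
use DBI;
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
                       {RaiseError => 1});
# selectrow_array prepares the statement, executes it, and fetches the
# single result row in one call.
my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM TEST");
print "TEST contains $count rows\n";
$dbh->disconnect();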
The following example shows using selectrow_array to execute and get the results of the
SHOW LOCALE statement. It also demonstrates changing the locale using the do function.
#!/usr/bin/perl
use strict;
use DBI;
my $attr = {RaiseError => 1 }; # Make SQL errors fatal to the Perl script.
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
$attr);
# Demonstrate setting/getting locale.
# Use selectrow_array to combine preparing a statement, executing it, and
# getting an array as a result.
my @localerv = $dbh->selectrow_array("SHOW LOCALE;");
# The locale name is the 2nd column (array index 1) in the result set.
print "Locale: $localerv[1]\n";
# Use do() to execute a SQL statement to set the locale.
$dbh->do("SET LOCALE TO en_GB");
# Get the locale again.
@localerv = $dbh->selectrow_array("SHOW LOCALE;");
print "Locale is now: $localerv[1]\n";
$dbh->disconnect();
The result of running the example is:
Locale: en_US@collation=binary (LEN_KBINARY)
Locale is now: en_GB (LEN)
Conversions between Perl and HP Vertica Data Types
Perl is a loosely-typed programming language that does not assign specific data types to values. It
converts between string and numeric values based on the operations being performed on the
values. For this reason, Perl has little problem extracting most string and numeric data types from
HP Vertica. All date/time and interval data types (DATE, TIMESTAMP, INTERVAL, and so on) are
converted to strings. You can use several different date- and time-handling Perl modules to
manipulate these values in your scripts.
HP Vertica NULL values translate to Perl's undefined (undef) value. When reading data from
columns that can contain NULL values, you should always test whether a value is defined before
using it.
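For example, the defensive pattern used throughout the examples in this section boils down to the following sketch (the $value variable is illustrative, standing in for any fetched column value):
#!/usr/bin/perl
use strict;
# A column value fetched from HP Vertica; undef represents SQL NULL.
my $value = undef;
# Test definedness before use to avoid "uninitialized value" warnings.
my $display = defined $value ? $value : 'NULL';
print "$display\n"; # Prints: NULL
# On Perl 5.10 and later, the defined-or operator is equivalent:
# my $display = $value // 'NULL';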
When inserting data into HP Vertica, Perl's DBI module attempts to coerce the data into the correct
format. By default, it assumes column values are VARCHAR unless it can determine that they are
some other data type. If given a string value to insert into a column that has an integer or numeric
data type, DBI attempts to convert the string's contents to the correct data type. If the entire string
can be converted to a value of the appropriate data type, it inserts the value into the column. If not,
inserting the row of data fails.
DBI transparently converts integer values into numeric or float values when inserting into column
of FLOAT, NUMERIC, or similar data types. It converts numeric or floating values to integers only
when there would be no loss of precision (the value to the right of the decimal point is 0). For
example, it can insert the value 3.0 into an INTEGER column since there is no loss of precision
when converting the value to an integer. It cannot insert 3.1 into an INTEGER column, since that
would result in a loss of precision. It returns an error instead of truncating the value to 3.
The following example demonstrates some of the conversions that the DBI module performs when
inserting data into HP Vertica.
#!/usr/bin/perl
use strict;
use DBI;
# Create a hash reference that holds a hash of parameters for the
# connection.
my $attr = {AutoCommit => 0, # Turn off autocommit
            PrintError => 0  # Turn off print error. Manually handled
           };
# Open a connection using a DSN. Supply the username and password.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123",
                       $attr);
if (defined DBI::err) {
    # Connection failed.
    die "Failed to connect: $DBI::errstr";
}
print "Connection AutoCommit state is: " . $dbh->{AutoCommit} . "\n";
# Create table to hold inserted data
$dbh->do("DROP TABLE IF EXISTS TEST CASCADE;");
$dbh->do("CREATE TABLE TEST( \
            C_ID INT, \
            C_FP FLOAT,\
            C_VARCHAR VARCHAR(100),\
            C_DATE DATE, C_TIME TIME,\
            C_TS TIMESTAMP,\
            C_BOOL BOOL)");
# Populate an array of arrays with values.
my @data = (
    # Start with matching data types
    [1,1.111,'Matching datatypes','2001-01-01','01:01:01'
        ,'2001-01-01 01:01:01','t'],
    # Force floats -> int and int -> float.
    [2.0,2,"Ints <-> floats",'2002-02-02','02:02:02'
        ,'2002-02-02 02:02:02',1],
    # Float -> int *only* works when there is no loss of precision.
    # This row will fail to insert:
    [3.1,3,"float -> int with trunc?",'2003-03-03','03:03:03'
        ,'2003-03-03 03:03:03',1],
    # String values are converted into numbers
    ["4","4.4","Strings -> numbers",'2004-04-04','04:04:04'
        ,'2004-04-04 04:04:04',0],
    # String -> numbers only works if the entire string can be
    # converted into a number
    ["5 and a half","5.5","Strings -> numbers",'2005-05-05','05:05:05'
        ,'2005-05-05 05:05:05',0],
    # Numbers are converted into string values automatically,
    # assuming they fit into the column width.
    [6,6.6,3.14159,'2006-06-06','06:06:06'
        ,'2006-06-06 06:06:06',0],
    # There are some variations in the accepted date strings
    [7,7.7,'Date/time formats','07/07/2007','07:07:07'
        ,'07-07-2007 07:07:07',1],
);
# Create a prepared statement to use parameters for inserting values.
my $sth = $dbh->prepare_cached("INSERT into test values(?,?,?,?,?,?,?)");
my $rowcount = 0; # Count # of rows
# Loop through the arrays to insert values
foreach my $tuple (@data) {
    $rowcount++;
    # Insert the row
    my $retval = $sth->execute(@$tuple);
    # See if the row was successfully inserted.
    if ($retval == 1) {
        # Value of 1 means the row was inserted (1 row was affected by insert)
        print "Row $rowcount successfully inserted\n";
    } else {
        print "Inserting row $rowcount failed with error " .
            $sth->state . " " . $sth->errstr . "\n";
    }
}
# Commit the data
$dbh->commit();
# Prepare a query to get the content of the table
$sth = $dbh->prepare("SELECT * FROM TEST ORDER BY C_ID ASC");
$sth->execute() or die "Error: " . $dbh->errstr;
my @row; # Need to pre-declare to use in the format statement.
# Use Perl formats to pretty print the output.
format STDOUT_TOP =
Int Float VarChar            Date       Time     Timestamp        Bool
=== ===== ================== ========== ======== ================ ====
.
format STDOUT =
@>> @<<<< @<<<<<<<<<<<<<<<<< @<<<<<<<<< @<<<<<<< @<<<<<<<<<<<<<<< @<<<<
@row
.
# Loop through result rows while we have them
while (@row = $sth->fetchrow_array()) {
    write; # Format command does the work of extracting the columns from
           # the @row array.
}
# Commit to stop Perl complaining about in-progress transactions.
$dbh->commit();
$dbh->disconnect();
The example produces the following output when run:
Connection AutoCommit state is: 0
Row 1 successfully inserted
Row 2 successfully inserted
Inserting row 3 failed with error 01000 [Vertica][VerticaDSII] (20) An error
occurred during query execution: Row rejected by server; see server log for details
(SQL-01000)
Row 4 successfully inserted
Inserting row 5 failed with error 01000 [Vertica][VerticaDSII] (20) An error
occurred during query execution: Row rejected by server; see server log for details
(SQL-01000)
Row 6 successfully inserted
Row 7 successfully inserted
Int Float VarChar            Date       Time     Timestamp        Bool
=== ===== ================== ========== ======== ================ ====
  1 1.111 Matching datatypes 2001-01-01 01:01:01 2001-01-01 01:01 1
  2 2     Ints <-> floats    2002-02-02 02:02:02 2002-02-02 02:02 1
  4 4.4   Strings -> numbers 2004-04-04 04:04:04 2004-04-04 04:04 0
  6 6.6   3.14159            2006-06-06 06:06:06 2006-06-06 06:06 0
  7 7.7   Date/time formats  2007-07-07 07:07:07 2007-07-07 07:07 1
Perl Unicode Support
Perl supports Unicode data with some caveats. See the perlunicode
http://perldoc.perl.org/perlunicode.html and perlunitut
http://perldoc.perl.org/perlunitut.html (Perl Unicode tutorial) manual pages for details. (Be sure
to see the copies of these manual pages included with the version of Perl installed on your client
system, as Unicode support has changed in recent versions of Perl.) Perl DBI and DBD::ODBC
also support Unicode; however, DBD::ODBC must be compiled with Unicode support. See the
DBD::ODBC http://search.cpan.org/~mjevans/DBD-ODBC-1.43/ODBC.pm documentation for
details. You can check the DBD::ODBC-specific connection attribute named odbc_has_unicode
to see if Unicode support is enabled in the driver.
The following example Perl script demonstrates directly inserting UTF-8 strings into HP Vertica
and then reading them back. The example writes a text file with the output, since there are many
problems displaying Unicode characters in terminal windows or consoles.
#!/usr/bin/perl
use strict;
use DBI;
# Open a connection using a DSN.
my $dbh = DBI->connect("dbi:ODBC:VerticaDSN","ExampleUser","password123");
unless (defined $dbh) {
    # Connection failed.
    die "Failed to connect: $DBI::errstr";
}
# Output to a file. Displaying Unicode characters to a console or terminal
# window has many problems. This outputs a UTF-8 text file that can
# be handled by many Unicode-aware text editors:
open OUTFILE, '>:utf8', "unicodeout.txt";
# See if the DBD::ODBC driver was compiled with Unicode support. If this
# returns 1, your Perl script will get strings from the driver with the UTF-8
# flag set on them, ensuring that Perl handles them correctly.
print OUTFILE "Was DBD::ODBC compiled with Unicode support? " .
    $dbh->{odbc_has_unicode} . "\n";
# Create a table to hold VARCHARs
$dbh->do("DROP TABLE IF EXISTS TEST CASCADE;");
# Create a table to hold data. Remember that the width of the VARCHAR column
# is the number of bytes set aside to store strings, which often does not
# equal the number of characters it can hold when it comes to Unicode!
$dbh->do("CREATE TABLE test( C_VARCHAR VARCHAR(100) )");
print OUTFILE "Inserting data...\n";
# Use do to perform simple inserts
$dbh->do("INSERT INTO test VALUES('Hello')");
# This string contains several non-Latin accented characters and symbols,
# encoded with Unicode escape notation. They are converted by Perl into
# UTF-8 characters.
$dbh->do("INSERT INTO test VALUES('My favorite band is " .
    "\N{U+00DC}ml\N{U+00E4}\N{U+00FC}t \N{U+00D6}v\N{U+00EB}rk\N{U+00EF}ll" .
    " \N{U+263A}')");
# Some Chinese (Simplified) characters. This again uses escape sequences
# that Perl translates into UTF-8 characters.
$dbh->do("INSERT INTO test VALUES('\x{4F60}\x{597D}')");
print OUTFILE "Getting data...\n";
# Prepare a query to get the content of the table
my $sth = $dbh->prepare_cached("SELECT * FROM test");
# Execute the query by calling execute on the statement handle
$sth->execute();
# Loop through result rows while we have them
while (my @row = $sth->fetchrow_array()) {
    # Loop through the column values
    foreach my $column (@row) {
        print OUTFILE "$column\t";
    }
    print OUTFILE "\n";
}
close OUTFILE;
$dbh->disconnect();
Viewing the unicodeout.txt file in a UTF-8-capable text editor or viewer displays:
Was DBD::ODBC compiled with Unicode support? 1
Inserting data...
Getting data...
My favorite band is Ümläüt Övërkïll ☺
你好
Hello
Note: Terminal windows and consoles often have problems properly displaying Unicode
characters. That is why the example writes the output to a text file. With some text editors, you
may need to manually set the encoding of the text file to UTF-8 in order for the characters to
appear properly (and the font used to display text must have a full Unicode character set). If the
characters still do not show up, your version of DBD::ODBC may not have been compiled with
UTF-8 support.
See Also
 Unicode Character Encoding in the Concepts Guide.
 Additional ODBC Driver Configuration Settings (page 49)
Programming PHP Client Applications
You can connect to HP Vertica through PHP-ODBC using the Unix ODBC
http://www.unixodbc.org/ or iODBC http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/
library.
In order to use PHP with HP Vertica, you must install the following packages (and their
dependencies):
 php
 php-odbc
 php-pdo
 UnixODBC (if you are using the UnixODBC driver)
 libiodbc (if you are using the iODBC driver)
PHP on Linux
PHP is available with most Linux operating systems as a module for the Apache web server.
Check your particular Linux repository for PHP RPMs or Debian packages. You can also build
PHP from source. See the PHP web site for documentation and source downloads.
PHP on Windows
PHP is available for Windows for both the Apache and IIS web servers. You can download PHP for
Windows and view installation instructions at the PHP web site.
The PHP ODBC Drivers
PHP supports both the UnixODBC http://www.unixodbc.org/ drivers and iODBC
http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/ drivers. Both drivers use PHP's ODBC
database abstraction layer.
Setup
You must read Programming ODBC Client Applications (page 54) before connecting to HP
Vertica through PHP. The following example ODBC configuration entries detail the typical settings
required for PHP ODBC connections. The driver location assumes you have copied the HP
Vertica drivers to /usr/lib64.
Example odbc.ini
[ODBC Data Sources]
VerticaDSNunixodbc = exampledb
VerticaDSNiodbc = exampledb2
[VerticaDSNunixodbc]
Description = VerticaDSN Unix ODBC driver
Driver = /usr/lib64/libverticaodbc.so
Database = Telecom
Servername = localhost
UserName = dbadmin
Password =
Port = 5433
[VerticaDSNiodbc]
Description = VerticaDSN iODBC driver
Driver = /usr/lib64/libverticaodbc.so
Database = Telecom
Servername = localhost
UserName = dbadmin
Password =
Port = 5433
Example odbcinst.ini
# Vertica
[VerticaDSNunixodbc]
Description = VerticaDSN Unix ODBC driver
Driver = /usr/lib64/libverticaodbc.so
[VerticaDSNiodbc]
Description = VerticaDSN iODBC driver
Driver = /usr/lib64/libverticaodbc.so
[ODBC]
Threading = 1
Verify the HP Vertica UnixODBC or iODBC Library
Verify that the HP Vertica UnixODBC or iODBC library can load all dependent libraries with the
following command (assuming you have copied the libraries to /usr/lib64):
ldd /usr/lib64/libverticaodbc.so
You must resolve any "not found" libraries before continuing.
Test your ODBC connection
Test your ODBC connection with the following command:
isql -v VerticaDSN
PHP Unicode Support
PHP does not offer native Unicode support; it only supports a 256-character set. However, PHP
provides the functions utf8_encode() and utf8_decode()
http://www.php.net/manual/en/function.utf8-decode.php for some basic Unicode
functionality.
See the PHP manual for strings http://php.net/manual/en/language.types.string.php for more
details about PHP and Unicode.
Querying the Database Using PHP
The example script below details the use of PHP ODBC functions to connect to the HP Vertica
Analytics Platform.
<?php
# Turn on error reporting
error_reporting(E_ERROR | E_WARNING | E_PARSE | E_NOTICE);

# A simple function to trap errors from queries
function errortrap_odbc($conn, $sql) {
    if(!$rs = odbc_exec($conn,$sql)) {
        echo "<br/>Failed to execute SQL: $sql<br/>" . odbc_errormsg($conn);
    } else {
        echo "<br/>Success: " . $sql;
    }
    return $rs;
}

# Connect to the Database
$dsn = "VerticaDSNunixodbc";
$conn = odbc_connect($dsn,'','') or die ("<br/>CONNECTION ERROR");
echo "<p>Connected with DSN: $dsn</p>";

# Create a table
$sql = "CREATE TABLE TEST(
            C_ID INT,
            C_FP FLOAT,
            C_VARCHAR VARCHAR(100),
            C_DATE DATE, C_TIME TIME,
            C_TS TIMESTAMP,
            C_BOOL BOOL)";
$result = errortrap_odbc($conn, $sql);

# Insert data into the table with a standard SQL statement
$sql = "INSERT into test values(1,1.1,'abcdefg1234567890','1901-01-01','23:12:34 ','1901-01-01 09:00:09','t')";
$result = errortrap_odbc($conn, $sql);

# Insert data into the table with odbc_prepare and odbc_execute
$values = array(2,2.28,'abcdefg1234567890','1901-01-01','23:12:34','1901-01-01 09:00:09','t');
$statement = odbc_prepare($conn,"INSERT into test values(?,?,?,?,?,?,?)");
if(!$result = odbc_execute($statement, $values)) {
    echo "<br/>odbc_execute Failed!";
} else {
    echo "<br/>Success: odbc_execute.";
}

# Get the data from the table and display it
$sql = "SELECT * FROM TEST";
if($result = errortrap_odbc($conn, $sql)) {
    echo "<pre>";
    while($row = odbc_fetch_array($result) ) {
        print_r($row);
    }
    echo "</pre>";
}

# Drop the table and projection
$sql = "DROP TABLE TEST CASCADE";
$result = errortrap_odbc($conn, $sql);

# Close the ODBC connection
odbc_close($conn);
?>
Example Output
The following is the example output from the script.
Success: CREATE TABLE TEST( C_ID INT, C_FP FLOAT, C_VARCHAR VARCHAR(100), C_DATE
DATE, C_TIME TIME, C_TS TIMESTAMP, C_BOOL BOOL)
Success: INSERT into test
values(1,1.1,'abcdefg1234567890','1901-01-01','23:12:34 ','1901-01-01
09:00:09','t')
Success: odbc_execute.
Success: SELECT * FROM TEST
Array
(
[C_ID] => 1
[C_FP] => 1.1
[C_VARCHAR] => abcdefg1234567890
[C_DATE] => 1901-01-01
[C_TIME] => 23:12:34
[C_TS] => 1901-01-01 09:00:09
[C_BOOL] => 1
)
Array
(
[C_ID] => 2
[C_FP] => 2.28
[C_VARCHAR] => abcdefg1234567890
[C_DATE] => 1901-01-01
[C_TIME] => 23:12:34
[C_TS] => 1901-01-01 23:12:34
[C_BOOL] => 1
)
Success: DROP TABLE TEST CASCADE
Using vsql
vsql is a character-based, interactive, front-end utility that lets you type SQL statements and see
the results. It also provides a number of meta-commands and various shell-like features that
facilitate writing scripts and automating a variety of tasks.
If you are using the vsql client installed on the server, then you can connect from the:
 Administration Tools (page 246)
 Linux command line (page 247)
You can also install the vsql client for other supported platforms.
A man page is available for vsql. If you are running as the dbadmin user, simply type: man vsql.
If you are running as a different user, type: man -M /opt/vertica/man vsql.
General Notes
 SQL statements can be spread over several lines for clarity.
 vsql can handle input and output in UTF-8 encoding. Note that the terminal emulator running
vsql must be set up to display UTF-8 characters correctly; follow the documentation of your
terminal emulator. In PuTTY, for example, the character set translation is set from the Change
Settings > Window > Translation option. See also Best Practices for Working with Locales.
 Cancel SQL statements by typing Ctrl+C.
 Traverse command history by typing Ctrl+R.
 When you disconnect a user session, any transactions in progress are automatically rolled
back.
 To view wide result sets, use the Linux less utility to truncate long lines.
1. Before connecting to the database, specify that you want to use less for query output:
$ export PAGER=less
2. Connect to the database.
3. Query a wide table:
=> select * from wide_table;
4. At the less prompt, type:
-S
 If a shell running vsql fails (crashes or freezes), the vsql processes continue to run even if you
stop the database. In that case, log in as root on the machine on which the shell was running
and manually kill the vsql process. For example:
# ps -ef | grep vertica
fred  2401  1  0 06:02 pts/1  00:00:00 /opt/vertica/bin/vsql -p 5433 -h test01_site01 quick_start_single
# kill -9 2401
Installing the vsql Client
The vsql client is installed as part of the HP Vertica server rpm, but it is also available as a
download for other Unix-based systems such as HP-UX, AIX, and Mac OSX.
How to install vsql on Unix-based systems:
1 Use a web browser to log in to the myVertica portal http://my.vertica.com/.
2 Click the Download tab, scroll down to the CLIENT PACKAGES section and download the
appropriate vsql client from the HP Vertica downloads page. Note that there are both 32-bit
and 64-bit versions for most platforms.
3 Extract the tarball. The tarball is organized to extract into /opt/vertica if you extract it at the root
(/) of the drive.
4 Optionally add the directory where the vsql client resides to your path.
5 Verify that the mode on the vsql file is executable. For example: chmod ugo+x vsql_VERSION
6 Set your shell locale to a locale supported by vsql. For example, in your .profile, add:
export LANG=en_US.UTF-8
Installing vsql on Windows:
vsql on Windows is installed as part of the Windows Client Driver package. To install vsql on
Windows, see the instructions for installing the Windows Client Driver package.
See vsql Notes for Windows Users for details on using vsql in a Windows console.
vsql Notes for Windows Users
Font
Set the console font to "Lucida Console", because the raster font does not work well with the ANSI
code page.
Console Encoding
vsql is built as a "console application." The Windows console windows use a different encoding
than the rest of the system, so take care when you use 8-bit characters within vsql. If vsql detects
a problematic console code page, it warns you at startup. To change the console code page, two
things are necessary:
 Set the code page by entering cmd.exe /c chcp 1252. (1252 is a code page that is
appropriate for European languages; replace it with your preferred locale code page.)
 Set the console font to "Lucida Console", because the raster font does not work well with the
ANSI code page (see Font, above).
Running Under Cygwin
Verify that your cygwin.bat file does not include the "tty" flag. If the "tty" flag is included in your
cygwin.bat file, then banners and prompts are not displayed in vsql.
For example, change:
set CYGWIN=binmode tty ntsec
to:
set CYGWIN=binmode ntsec
Additionally, when running under Cygwin, vsql uses Cygwin shell conventions as opposed to
Windows console conventions.
Tab Completion
Tab completion is a function of the shell, not vsql. Because of this, tab completion does not work
the same way in Windows vsql as it does on Linux versions of vsql.
On Windows, instead of using tab completion, press F7 to pop up a history window of commands,
or press F8 after typing a few letters of a command to cycle through commands in the history
buffer starting with the same letters.
Connecting From the Administration Tools
You can use the Administration Tools to connect to a database using vsql on any node in the
cluster.
1 Log in as any user that does not have root privileges. (HP Vertica does not allow users with
root privileges to connect to a database for security reasons).
2 Run the Administration Tools.
/opt/vertica/bin/admintools
3 On the Main Menu, select Connect to Database.
4 Supply the database password if asked:
Password:
When you create a new user with the CREATE USER command, you can configure the
password or leave it empty. You cannot bypass the password if the user was created with a
password configured. You can change a user's password using the ALTER USER command.
5 The Administration Tools connect to the database and transfer control to vsql.
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type:
\h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
=>
Note: See Meta-Commands (page 252) for the various commands you can run while
connected to the database through the Administration Tools.
Connecting from the Command Line
You can use vsql from the command line to connect to a database from any Linux machine,
including those not part of the cluster. Copy /opt/vertica/bin/vsql to your machine.
Syntax
/opt/vertica/bin/vsql [ option...] [ dbname [ username ] ]
Parameters
option      One or more of the vsql command line options (page 248)
dbname      The name of the target database
username    The name of the user to connect as
Notes
 If the database is password protected, you must specify the -w or --password command line
option.
 The default dbname and username is your Linux user name.
 If the connection cannot be made for any reason (for example, insufficient privileges, server is
not running on the targeted host, etc.), vsql returns an error and terminates.
 vsql returns one of the following values to the shell when it exits:
 0 if it finished normally
 1 if a fatal error of its own (out of memory, file not found) occurs
 2 if the connection to the server went bad and the session was not interactive
 3 if an error occurred in a script and the variable ON_ERROR_STOP was set
 Unrecognized words in the command line might be interpreted as database or user names.
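As an illustration, a calling script can branch on these return values. The following is a hedged Perl sketch, not taken from the original documentation; the statement and the connection options are assumptions:
#!/usr/bin/perl
use strict;
# Run a statement through vsql and capture its exit status.
system('/opt/vertica/bin/vsql', '-U', 'dbadmin', '-w', 'password',
       '-c', 'SELECT 1;');
my $status = $? >> 8; # The high byte of $? holds the child's exit code.
if    ($status == 0) { print "vsql finished normally\n"; }
elsif ($status == 1) { print "vsql hit a fatal error of its own\n"; }
elsif ($status == 2) { print "the connection to the server went bad\n"; }
elsif ($status == 3) { print "script error with ON_ERROR_STOP set\n"; }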
Example
The following example redirects vsql output, including any error messages, into an output file
called retail_queries.out:
$ vsql --echo-all < retail_queries.sql > retail_queries.out 2>&1
Command Line Options
This section contains the command-line options for vsql:
-? --help displays help about vsql command line arguments and exits.
-a --echo-all prints all input lines to standard output as they are read. This is more useful for
script processing than interactive mode. It is equivalent to setting the variable ECHO (page 274) to
all.
-A --no-align switches to unaligned output mode. (The default output mode is aligned.)
-c command --command command runs one command and exits. This is useful in shell scripts.
The command must be either a command string that can be completely parsed by the server (it
contains no vsql specific features), or a single meta-command. In other words, you cannot mix
SQL and vsql meta-commands. To achieve that, you can pipe the string into vsql like this:
echo "\\timing\\\\select * from t" | ../Linux64/bin/vsql
Timing is on.
 i | c | v
---+---+---
(0 rows)
Note: If you use double quotes with echo, you must double the backslashes.
-d dbname --dbname dbname specifies the name of the database to connect to. This is
equivalent to specifying dbname as the first non-option argument on the command line.
-e --echo-queries copies all SQL commands sent to the server to standard output as well.
This is equivalent to setting the variable ECHO (page 274) to queries.
-E displays queries generated by internal commands.
-f filename --file filename uses the file filename as the source of commands instead of
reading commands interactively. After the file is processed, vsql terminates. This is in many ways
equivalent to the internal command \i (see "i FILE" on page 265).
If filename is - (hyphen), the standard input is read.
Using this option is subtly different from writing vsql < filename. In general, both do what you
expect, but using -f enables some nice features such as error messages with line numbers.
There is also a slight chance that using this option reduces the start-up overhead. On the other
hand, the variant using the shell's input redirection is (in theory) guaranteed to yield exactly the
same output that you would have gotten had you entered everything by hand.
Using -f filename to Read Data Piped into vsql
To read data piped into vsql from a data file:
1. Create the following:
 A named pipe. For example, to create a named pipe called pipe1: mkfifo pipe1
 A data file. The data file in this example is called data_file.
 The command file that selects the table into which you want to copy data, copies the data
from the pipe file (pipe1), and removes the pipe file. The command file in this example is
called command_line.
2. From the command line, run a command that pipes the data file (data_file) into the
appropriate table through vsql. The following example pipes the data file into
public.shipping_dimension in the VMart database.
cat data_file > pipe1 | vsql -f 'command_line'
Example data_file:
110|EXPRESS|SEA|FEDEX
111|EXPRESS|HAND CARRY|MSC
112|OVERNIGHT|COURIER|USPS
Example command_line file:
SELECT * FROM public.shipping_dimension;
\set dir `pwd`/
\set file '''':dir'pipe1'''
COPY public.shipping_dimension FROM :file delimiter '|';
SELECT * FROM public.shipping_dimension;
--Remove the pipe1
\! rm pipe1
-F separator --field-separator separator specifies the field separator for unaligned
output (default: "|") (-P fieldsep=). (See -A --no-align.) This is equivalent to \pset (page
267) fieldsep or \f (see "f [ string ]" on page 263).
-h hostname --host hostname specifies the host name of the machine on which the server
is running.
Notes:
 If you are using client authentication with a connection method of either "gss" or "krb5"
(Kerberos), you are required to specify -h hostname.
 If you are using client authentication with a "local" connection type specified, avoid using
-h hostname if you want to match the client authentication entry.
-H --html turns on HTML tabular output. This is equivalent to \pset (page 267) format
html or the \H (see "H" on page 264) command.
-l --list returns all available databases, then exits. Other non-connection options are ignored.
This command is similar to the internal command \list.
-n disables command line editing.
-o filename --output filename writes all query output into file filename. This is equivalent
to the command \o (page 266).
-p port --port port specifies the TCP port or the local socket file extension on which the
server is listening for connections. Defaults to port 5433.
-P assignment --pset assignment lets you specify printing options in the style of \pset
(page 267) on the command line. Note that you have to separate name and value with an equal
sign instead of a space. Thus to set the output format to LaTeX, you could write -P
format=latex.
-q --quiet specifies that vsql do its work quietly. By default, it prints welcome messages and
various informational output. If this option is used, none of this appears. This is useful with the -c
option. Within vsql you can also set the QUIET (page 276) variable to achieve the same effect.
-R separator --record-separator separator uses separator as the record separator.
This is equivalent to the \pset (page 267) recordsep command.
-s --single-step runs in single-step mode for debugging scripts. Forces vsql to prompt before
each statement is sent to the database and allows you to cancel execution.
-S --single-line runs in single-line mode where a newline terminates a SQL command, like
the semicolon does.
Note: This mode is provided for those who insist on it, but you are not necessarily encouraged
to use it, particularly if you mix SQL and meta-commands on a line. The order of execution
might not always be clear to the inexperienced user.
-t --tuples-only disables printing of column names, result row count footers, and so on. This
is equivalent to the \t (see "t" on page 271) command.
-T table_options --table-attr table_options allows you to specify options to be
placed within the HTML table tag. See \pset (page 267) for details.
-U username --username username connects to the database as the user username
instead of the default.
-v assignment --set assignment --variable assignment performs a variable
assignment, like the \set (see "set [ NAME [ VALUE [ ... ] ] ]" on page 270) internal
command.
Note: You must separate name and value, if any, by an equal sign on the command line.
To unset a variable, omit the equal sign. To set a variable without a value, use the equal sign
but omit the value. These assignments are done during a very early stage of start-up, so
variables reserved for internal purposes can get overwritten later.
-V --version prints the vsql version and exits.
-w password specifies the password for a database user.
Note: Using this command line option displays the database password in plain text on the
screen. Use it with care, particularly if you are connecting as the database administrator.
-W --password forces vsql to prompt for a password before connecting to a database.
The password is not displayed on the screen. This option remains set for the entire session,
even if you change the database connection with the meta-command \connect (see "c (or
\connect) [ dbname [ username ] ]" on page 254).
-x --expanded enables extended table formatting mode. This is equivalent to the command \x
(see "x" on page 272).
-X, --no-vsqlrc prevents the start-up file from being read (the system-wide vsqlrc file or the
user's ~/.vsqlrc file).
Connecting From a Non-Cluster Host
You can use the HP Vertica vsql executable image on a non-cluster Linux host to connect to an
HP Vertica database.
 On Red Hat 5.0 64-bit and SUSE 10/11 64-bit, you can install the client driver RPM, which
includes the vsql executable. See Installing the Client RPM on Red Hat and SUSE (page
21) for details.
 If the non-cluster host is running the same version of Linux as the cluster, copy the image file to
the remote system. For example:
$ scp host01:/opt/vertica/bin/vsql .
$ ./vsql
 If the non-cluster host is running a different version of Linux than your cluster hosts, and that
operating system is not Red Hat version 5 64-bit or SUSE 10/11 64-bit, you must install the HP
Vertica server RPM in order to get vsql. Download the appropriate rpm package from the
Download tab of the myVertica portal http://my.vertica.com/ then log into the non-cluster
host as root and install the rpm package using the command:
# rpm -Uvh filename
In the above command, filename is the package you downloaded. Note that you do not have to
run the install_vertica script on the non-cluster host in order to use vsql.
Notes
 Use the same command line options (page 248) that you would on a cluster host.
 You cannot run vsql on a Cygwin bash shell (Windows). Use ssh to connect to a cluster host,
then run vsql.
Meta-Commands
Anything you enter in vsql that begins with an unquoted backslash is a vsql meta-command that is
processed by vsql itself. These commands help make vsql more useful for administration or
scripting. Meta-commands are more commonly called slash or backslash commands.
The format of a vsql command is the backslash, followed immediately by a command verb, then
any arguments. The arguments are separated from the command verb and each other by any
number of whitespace characters.
To include whitespace into an argument you can quote it with a single quote. To include a single
quote into such an argument, precede it by a backslash. Anything contained in single quotes is
furthermore subject to C-like substitutions for \n (new line), \t (tab), \digits, \0digits, and
\0xdigits (the character with the given decimal, octal, or hexadecimal code).
If an unquoted argument begins with a colon (:), it is taken as a vsql variable and the value of the
variable is used as the argument instead.
Arguments that are enclosed in backquotes (`) are taken as a command line that is passed to the
shell. The output of the command (with any trailing newline removed) is taken as the argument
value. The above escape sequences also apply in backquotes.
Some commands take a SQL identifier (such as a table name) as argument. These arguments
follow the syntax rules of SQL: Unquoted letters are forced to lowercase, while double quotes (")
protect letters from case conversion and allow incorporation of whitespace into the identifier.
Within double quotes, paired double quotes reduce to a single double quote in the resulting name.
For example, FOO"BAR"BAZ is interpreted as fooBARbaz, and "A weird"" name" becomes A
weird" name.
Parsing for arguments stops when another unquoted backslash occurs. This is taken as the
beginning of a new meta-command. The special sequence \\ (two backslashes) marks the end of
arguments and continues parsing SQL commands, if any. That way SQL and vsql commands can
be freely mixed on a line. But in any case, the arguments of a meta-command cannot continue
beyond the end of the line.
! [ COMMAND ]
\! [ COMMAND ] executes a command in a Linux shell (passing arguments as entered) or starts
an interactive shell.
?
\? displays help information about the meta-commands. Works the same as \h (see "h \help" on
page 264).
=> \?
General
 \c[onnect] [DBNAME|- [USER]]
                        connect to new database (currently "VMart")
 \cd [DIR]              change the current working directory
 \q                     quit vsql
 \set [NAME [VALUE]]    set internal variable, or list all if no parameters
 \timing                toggle timing of commands (currently off)
 \unset NAME            unset (delete) internal variable
 \! [COMMAND]           execute command in shell or start interactive shell
 \password [USER]       change user's password

Query Buffer
 \e [FILE]              edit the query buffer (or file) with external editor
 \g                     send query buffer to server
 \g FILE                send query buffer to server and results to file
 \g | COMMAND           send query buffer to server and pipe results to command
 \p                     show the contents of the query buffer
 \r                     reset (clear) the query buffer
 \s [FILE]              display history or save it to file
 \w FILE                write query buffer to file

Input/Output
 \echo [STRING]         write string to standard output
 \i FILE                execute commands from file
 \o FILE                send all query results to file
 \o | COMMAND           pipe all query results to command
 \o                     close query-results file or pipe
 \qecho [STRING]        write string to query output stream (see \o)

Informational
 \d [PATTERN]           describe tables (list tables if no argument is supplied)
 \df [PATTERN]          list functions
 \dj [PATTERN]          list projections
 \dn [PATTERN]          list schemas
 \dp [PATTERN]          list table access privileges
 \ds [PATTERN]          list sequences
 \dS [PATTERN]          list system tables
 \dt [PATTERN]          list tables
 \dtv [PATTERN]         list tables and views
 \dT [PATTERN]          list data types
 \du [PATTERN]          list users
 \dv [PATTERN]          list views
 \l                     list all databases
 \z [PATTERN]           list table access privileges (same as \dp)

Formatting
 \a                     toggle between unaligned and aligned output mode
 \b                     toggle beep on command completion
 \C [STRING]            set table title, or unset if none
 \f [STRING]            show or set field separator for unaligned query output
 \H                     toggle HTML output mode (currently off)
 \pset NAME [VALUE]     set table output option
                        (NAME := {format|border|expanded|fieldsep|footer|null|
                        recordsep|tuples_only|title|tableattr|pager})
 \t                     show only rows (currently off)
 \T [STRING]            set HTML <table> tag attributes, or unset if none
 \x                     toggle expanded output (currently off)
a
\a toggles output format alignment. This command is kept for backwards compatibility. See \pset
(page 267) for a more general solution.
\a is similar to the command line option -A --no-align, which only disables alignment.
b
\b toggles beep on command completion.
c (or \connect) [ dbname [ username ] ]
\c (or \connect) [ dbname [ username ] ] establishes a connection to a new database
and/or under a user name. The previous connection is closed. If dbname is - the current database
name is assumed.
If username is omitted the current user name is assumed.
As a special rule, \connect without any arguments connects to the default database as the
default user (as you would have gotten by starting vsql without any arguments).
If the connection attempt fails (wrong user name, access denied, etc.), the previous connection is
kept if and only if vsql is in interactive mode. When executing a non-interactive script, processing
immediately stops with an error. This distinction avoids typos and prevents scripts from
accidentally acting on the wrong database.
C [ STRING ]
\C [ STRING ] sets the title of any tables being printed as the result of a query or unsets any such
title. This command is equivalent to \pset (page 267) title title. (The name of this command
derives from "caption", as it was previously only used to set the caption in an HTML table.)
cd [ DIR ]
\cd [ DIR ] changes the current working directory to directory. Without argument, changes to
the current user's home directory.
To print your current working directory, use \! (see "! [ COMMAND ]" on page 253) with pwd. For
example:
=> \!pwd
/home/dbadmin
The \d [ PATTERN ] meta-commands
This section describes the various \d meta-commands
All \d meta-commands take an optional pattern (asterisk [ * ] or question mark [ ? ]) and return
only the records that match that pattern.
The ? argument is useful if you can't remember if a table name uses an underscore or a dash:
=> \dn v?internal
 List of schemas
    Name    |  Owner
------------+---------
 v_internal | dbadmin
(1 row)
The output from the \d metacommands places double quotes around non-alphanumeric table
names and table names that are keywords, such as in the following example.
=> CREATE TABLE my_keywords.precision(x numeric(4,2));
CREATE TABLE
=> \d
              List of tables
   Schema    |    Name     | Kind  |  Owner
-------------+-------------+-------+---------
 my_keywords | "precision" | table | dbadmin
Double quotes are optional when you use a \d command with pattern matching.
d [ PATTERN ]
The \d [ PATTERN ] meta-command lists all tables in the database and returns their schema,
table name, kind (e.g., table), and owner. For example, the following is the result of \d in the vmart
schema.
vmartdb=> \d
                      List of tables
    Schema    |         Name          | Kind  |  Owner
--------------+-----------------------+-------+---------
 online_sales | call_center_dimension | table | dbadmin
 online_sales | online_page_dimension | table | dbadmin
 online_sales | online_sales_fact     | table | dbadmin
 public       | customer_dimension    | table | dbadmin
 public       | date_dimension        | table | dbadmin
 public       | employee_dimension    | table | dbadmin
 public       | inventory_fact        | table | dbadmin
 public       | product_dimension     | table | dbadmin
 public       | promotion_dimension   | table | dbadmin
 public       | shipping_dimension    | table | dbadmin
 public       | vendor_dimension      | table | dbadmin
 public       | warehouse_dimension   | table | dbadmin
 store        | store_dimension       | table | dbadmin
 store        | store_orders_fact     | table | dbadmin
 store        | store_sales_fact      | table | dbadmin
(15 rows)
If you provide the table name as an argument, the result shows the schema name, table name,
column name, column data type, data type size, default value, whether it is Nullable or has a NOT
NULL constraint, and whether there is a primary key or foreign key constraint.
vmartdb=> \d inventory_fact
                                              List of Fields by Tables
 Schema |     Table      |     Column      | Type | Size | Default | Not Null | Primary Key |               Foreign Key
--------+----------------+-----------------+------+------+---------+----------+-------------+--------------------------------------------
 public | inventory_fact | date_key        | int  |    8 |         | t        | f           | public.date_dimension(date_key)
 public | inventory_fact | product_key     | int  |    8 |         | t        | f           | public.product_dimension(product_key)
 public | inventory_fact | product_version | int  |    8 |         | t        | f           | public.product_dimension(product_version)
 public | inventory_fact | warehouse_key   | int  |    8 |         | t        | f           | public.warehouse_dimension(warehouse_key)
 public | inventory_fact | qty_in_stock    | int  |    8 |         | f        | f           |
(5 rows)
You can also use the question mark [ ? ] argument to replace a single character. For example, the
? argument replaces the last character in the SubQ1 and SubQ2 tables, so the command returns
information about both:
=> \d SubQ?
                               List of Fields by Tables
 Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+------+------+---------+----------+-------------+-------------
 public | SubQ1 | a      | int  |    8 |         | f        | f           |
 public | SubQ1 | b      | int  |    8 |         | f        | f           |
 public | SubQ1 | c      | int  |    8 |         | f        | f           |
 public | SubQ2 | x      | int  |    8 |         | f        | f           |
 public | SubQ2 | y      | int  |    8 |         | f        | f           |
 public | SubQ2 | z      | int  |    8 |         | f        | f           |
(6 rows)
df [ PATTERN ]
The \df [ PATTERN ] meta-command returns all function names, the function return data type,
and the function argument data type. Also returns the procedure names and arguments for all
procedures that are available to the user.
vmartdb=> \df
                         List of functions
 procedure_name | procedure_return_type |       procedure_argument_types
----------------+-----------------------+-------------------------------------------
 abs            | Float                 | Float
 abs            | Integer               | Integer
 abs            | Interval              | Interval
 abs            | Interval              | Interval
 abs            | Numeric               | Numeric
 acos           | Float                 | Float
 add_location   | Varchar               | Varchar
 add_location   | Varchar               | Varchar, Varchar, Varchar
 ...
 width_bucket   | Integer               | Float, Float, Float, Integer
 width_bucket   | Integer               | Interval, Interval, Interval, Integer
 width_bucket   | Integer               | Interval, Interval, Interval, Integer
 width_bucket   | Integer               | Timestamp, Timestamp, Timestamp, Integer
 ...
The following example uses the wildcard character to search for all functions that begin with as:
vmartdb=> \df as*
                   List of functions
 procedure_name | procedure_return_type | procedure_argument_types
----------------+-----------------------+--------------------------
 ascii          | Integer               | Varchar
 asin           | Float                 | Float
(2 rows)
dj [ PATTERN ]
The \dj [ PATTERN ] meta-command returns all projections showing the schema, projection
name, owner, and node:
vmartdb=> \dj
                            List of projections
    Schema    |              Name              |  Owner  |        Node
--------------+--------------------------------+---------+--------------------
 public       | product_dimension_node0001     | dbadmin | v_wmartdb_node0001
 public       | product_dimension_node0002     | dbadmin | v_wmartdb_node0002
 public       | product_dimension_node0003     | dbadmin | v_wmartdb_node0003
 online_sales | call_center_dimension_node0001 | dbadmin | v_wmartdb_node0001
 online_sales | call_center_dimension_node0002 | dbadmin | v_wmartdb_node0002
 online_sales | call_center_dimension_node0003 | dbadmin | v_wmartdb_node0003
 ...
If you supply a projection name as an argument, the system returns fewer records:
vmartdb=> \dj call_center_dimension_n*
                            List of projections
    Schema    |              Name              |  Owner  |        Node
--------------+--------------------------------+---------+--------------------
 online_sales | call_center_dimension_node0001 | dbadmin | v_wmartdb_node0001
 online_sales | call_center_dimension_node0002 | dbadmin | v_wmartdb_node0002
 online_sales | call_center_dimension_node0003 | dbadmin | v_wmartdb_node0003
(3 rows)
dn [ PATTERN ]
The \dn [ PATTERN ] meta-command returns the schema names and schema owner.
vmartdb=> \dn
   List of schemas
     Name     |  Owner
--------------+---------
 v_internal   | dbadmin
 v_catalog    | dbadmin
 v_monitor    | dbadmin
 public       | dbadmin
 store        | dbadmin
 online_sales | dbadmin
(6 rows)
The following command returns all schemas that begin with the letter v:
=> \dn v*
  List of schemas
    Name    |  Owner
------------+---------
 v_internal | dbadmin
 v_catalog  | dbadmin
 v_monitor  | dbadmin
(3 rows)
dp [ PATTERN ]
The \dp [ PATTERN ] meta-command returns the grantee, grantor, privileges, schema, and
name for all table access privileges in each schema:
vmartdb=> \dp
       Access privileges for database "vmartdb"
 Grantee | Grantor | Privileges | Schema |    Name
---------+---------+------------+--------+------------
         | dbadmin | USAGE      |        | public
         | dbadmin | USAGE      |        | v_internal
         | dbadmin | USAGE      |        | v_catalog
         | dbadmin | USAGE      |        | v_monitor
(4 rows)
Note: \dp is the same as \z (see "z" on page 272).
ds [ PATTERN ]
The \ds [ PATTERN ] meta-command (lowercase s) returns a list of sequences and their
parameters.
The following series of commands creates a sequence called my_seq and uses the vsql command
to display its parameters:
=> CREATE SEQUENCE my_seq MAXVALUE 5000 START 150;
CREATE SEQUENCE
=> \ds
                             List of Sequences
 Schema | Sequence | CurrentValue | IncrementBy | Minimum | Maximum | AllowCycle
--------+----------+--------------+-------------+---------+---------+------------
 public | my_seq   |          149 |           1 |       1 |    5000 | f
(1 row)
Note: You can return additional information about sequences by issuing SELECT * FROM
V_CATALOG.SEQUENCES, as described in the SQL Reference Manual.
dS [ PATTERN ]
The \dS [ PATTERN ] meta-command (uppercase S) returns all system table (monitoring API)
names. You can get identical results by issuing SELECT * FROM system_tables;
vmartdb=> \dS
                                     List of tables
  Schema   |        Name        |  Kind  |                  Description
-----------+--------------------+--------+-------------------------------------------------
 v_catalog | columns            | system | Table column information
 v_catalog | dual               | system | Oracle(TM) compatibility DUAL table
 v_catalog | foreign_keys       | system | Foreign key information
 v_catalog | grants             | system | Grant information
 v_catalog | passwords          | system | User password history and password reuse policy
 v_catalog | primary_keys       | system | Primary key information
 v_catalog | profile_parameters | system | Profile Parameters information
 v_catalog | profiles           | system | Profile information
 v_catalog | projection_columns | system | Projection columns information
 v_catalog | projections        | system | Projection information
 ...
 v_monitor | host_resources     | system | Per host profiling information
 v_monitor | load_streams       | system | Load metrics for each load stream on each node
 v_monitor | locks              | system | Lock grants and requests for all nodes
 v_monitor | node_resources     | system | Per node profiling information
 ...
dt [ PATTERN ]
The \dt [ PATTERN ] meta-command (lowercase t) is identical to \d. It returns all tables in the
database unless a table name is specified, in which case the command lists only the schema,
name, kind, and owner for the specified table (or tables, if wildcards are used).
vmartdb=> \dt inventory_fact
              List of tables
 Schema |      Name      | Kind  |  Owner
--------+----------------+-------+---------
 public | inventory_fact | table | dbadmin
(1 row)
The following command returns all table names that begin with "st":
vmartdb=> \dt st*
               List of tables
 Schema |       Name        | Kind  |  Owner
--------+-------------------+-------+---------
 store  | store_dimension   | table | dbadmin
 store  | store_orders_fact | table | dbadmin
 store  | store_sales_fact  | table | dbadmin
(3 rows)
dT [ PATTERN ]
The \dT [ PATTERN ] meta-command (uppercase T) lists all supported data types.
vmartdb=> \dT
    List of data types
        type_name
---------------------------
 Binary
 Boolean
 Char
 Date
 Float
 Integer
 Interval Day
 Interval Day to Hour
 Interval Day to Minute
 Interval Day to Second
 Interval Hour
 Interval Hour to Minute
 Interval Hour to Second
 Interval Minute
 Interval Minute to Second
 Interval Month
 Interval Second
 Interval Year
 Interval Year to Month
 Numeric
 Time
 TimeTz
 Timestamp
 TimestampTz
 Varbinary
 Varchar
(26 rows)
dtv [ PATTERN ]
The \dtv [ PATTERN ] meta-command lists all tables and views, returning the schema, table or
view name, kind (table or view), and owner.
vmartdb=> \dtv
                       List of tables
    Schema    |         Name          | Kind  |  Owner
--------------+-----------------------+-------+---------
 online_sales | call_center_dimension | table | release
 online_sales | online_page_dimension | table | release
 online_sales | online_sales_fact     | table | release
 public       | customer_dimension    | table | release
 public       | date_dimension        | table | release
 public       | employee_dimension    | table | release
 public       | inventory_fact        | table | release
 public       | product_dimension     | table | release
 public       | promotion_dimension   | table | release
 public       | shipping_dimension    | table | release
 public       | vendor_dimension      | table | release
 public       | warehouse_dimension   | table | release
 store        | store_dimension       | table | release
 store        | store_orders_fact     | table | release
 store        | store_sales_fact      | table | release
(15 rows)
du [ PATTERN ]
The \du [ PATTERN ] meta-command returns all database users and attributes, such as whether
the user is a superuser.
vmartdb=> \du
       List of users
 User name | Is Superuser
-----------+--------------
 dbadmin   | t
(1 row)
dv [ PATTERN ]
The \dv [ PATTERN ] meta-command returns the schema name, view name, and view owner.
The following example defines a view using the SEQUENCES system table:
vmartdb=> CREATE VIEW my_seqview AS (SELECT * FROM sequences);
CREATE VIEW
vmartdb=> \dv
          List of views
 Schema |    Name    |  Owner
--------+------------+---------
 public | my_seqview | dbadmin
(1 row)
If a view name is provided as an argument, the result shows the schema name, view name,
column name, column data type, and data type size for each column in the view's result set.
vmartdb=> \dv my_seqview
                          List of View Fields
 Schema |    View    |       Column        |     Type     | Size
--------+------------+---------------------+--------------+------
 public | my_seqview | sequence_schema     | varchar(128) |  128
 public | my_seqview | sequence_name       | varchar(128) |  128
 public | my_seqview | owner_name          | varchar(128) |  128
 public | my_seqview | identity_table_name | varchar(128) |  128
 public | my_seqview | session_cache_count | int          |    8
 public | my_seqview | allow_cycle         | boolean      |    1
 public | my_seqview | output_ordered      | boolean      |    1
 public | my_seqview | increment_by        | int          |    8
 public | my_seqview | minimum             | int          |    8
 public | my_seqview | maximum             | int          |    8
 public | my_seqview | current_value       | int          |    8
 public | my_seqview | sequence_schema_id  | int          |    8
 public | my_seqview | sequence_id         | int          |    8
 public | my_seqview | owner_id            | int          |    8
 public | my_seqview | identity_table_id   | int          |    8
(15 rows)
e \edit [ FILE ]
\e \edit [ FILE ] edits the query buffer (or specified file) with an external editor. When the
editor exits, its content is copied back to the query buffer. If no argument is given, the current query
buffer is copied to a temporary file which is then edited in the same fashion.
The new query buffer is then re-parsed according to the normal rules of vsql, where the whole
buffer up to the first semicolon is treated as a single line. (Thus you cannot make scripts this way.
Use \i (see "i FILE" on page 265) for that.) If there is no semicolon, vsql waits for one to be
entered (it does not execute the query buffer).
Tip: vsql searches the environment variables VSQL_EDITOR, EDITOR, and VISUAL (in that
order) for an editor to use. If all of them are unset, vi is used on Linux systems, notepad.exe on
Windows systems.
echo [ STRING ]
\echo [ STRING ] writes the string to the standard output.
Tip: If you use the \o (page 266) command to redirect your query output you might want to use
\qecho (page 269) instead of this command.
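For example, a trivial sketch (the message text is arbitrary):
=> \echo Starting nightly load
Starting nightly load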
f [ STRING ]
\f [ STRING ] sets the field separator for unaligned query output. The default is the vertical bar
(|). See also \pset (page 267) for a generic way of setting output options.
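For example, a minimal sketch that switches the separator to a comma (the confirmation
message follows the pattern shown in Exporting Data Using vsql on page 280):
=> \f ','
Field separator is ",".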
g
The \g meta-command sends the query in the input buffer (see \p (see "p" on page 267)) to the
server. With no arguments, it displays the results in the standard way.
\g FILE sends the query input buffer to the server, and writes the results to FILE.
\g | COMMAND sends the query buffer to the server, and pipes the results to a shell COMMAND.
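For example, a short sketch (output elided; the file path is hypothetical, and the query buffer is
assumed to still hold the previous statement):
=> SELECT COUNT(*) FROM public.product_dimension;
...
=> \g /tmp/product_count.txt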
See Also
\o meta-command (see "o" on page 266)
H
\H toggles HTML query output format. This command is for compatibility and convenience, but
see \pset (page 267) about setting other output options.
h \help
\h \help displays help information about the meta-commands. It works the same as \? (see "?" on
page 253).
=> \h
General
  \c[onnect] [DBNAME|- [USER]]
                 connect to new database (currently "VMart")
  \cd [DIR]      change the current working directory
  \q             quit vsql
  \set [NAME [VALUE]]
                 set internal variable, or list all if no parameters
  \timing        toggle timing of commands (currently off)
  \unset NAME    unset (delete) internal variable
  \! [COMMAND]   execute command in shell or start interactive shell
  \password [USER]
                 change user's password

Query Buffer
  \e [FILE]      edit the query buffer (or file) with external editor
  \g             send query buffer to server
  \g FILE        send query buffer to server and results to file
  \g | COMMAND   send query buffer to server and pipe results to command
  \p             show the contents of the query buffer
  \r             reset (clear) the query buffer
  \s [FILE]      display history or save it to file
  \w FILE        write query buffer to file

Input/Output
  \echo [STRING] write string to standard output
  \i FILE        execute commands from file
  \o FILE        send all query results to file
  \o | COMMAND   pipe all query results to command
  \o             close query-results file or pipe
  \qecho [STRING]
                 write string to query output stream (see \o)

Informational
  \d [PATTERN]   describe tables (list tables if no argument is supplied)
  \df [PATTERN]  list functions
  \dj [PATTERN]  list projections
  \dn [PATTERN]  list schemas
  \dp [PATTERN]  list table access privileges
  \ds [PATTERN]  list sequences
  \dS [PATTERN]  list system tables
  \dt [PATTERN]  list tables
  \dtv [PATTERN] list tables and views
  \dT [PATTERN]  list data types
  \du [PATTERN]  list users
  \dv [PATTERN]  list views
  \l             list all databases
  \z [PATTERN]   list table access privileges (same as \dp)

Formatting
  \a             toggle between unaligned and aligned output mode
  \b             toggle beep on command completion
  \C [STRING]    set table title, or unset if none
  \f [STRING]    show or set field separator for unaligned query output
  \H             toggle HTML output mode (currently off)
  \pset NAME [VALUE]
                 set table output option
                 (NAME := {format|border|expanded|fieldsep|footer|null|
                 recordsep|tuples_only|title|tableattr|pager})
  \t             show only rows (currently off)
  \T [STRING]    set HTML <table> tag attributes, or unset if none
  \x             toggle expanded output (currently off)
i FILE
The \i FILE meta-command reads input from the file FILE and executes it as though it had been
typed on the keyboard.
Note: To see the lines on the screen as they are read, set the variable ECHO (page 274) to all.
l
\l provides a list of databases and their owners.
vmartdb=> \l
  List of databases
  name   | user_name
---------+-----------
 vmartdb | dbadmin
(1 row)
locale
The vsql \locale command displays the current locale setting or lets you set a new locale for the
session.
This command does not alter the default locale for all database sessions. To change the default
for all sessions, set the DefaultSessionLocale configuration parameter.
Viewing the Current Locale Setting
To view the current locale setting, use the vsql command \locale, as follows:
=> \locale
en_US@collation=binary
Overriding the Default Locale for a Session
To override the default locale for a specific session, use the vsql command \locale
<ICU-locale-identifier>. The session locale setting applies to any subsequent commands issued in
the session.
For example:
\locale en_GB
INFO: Locale: 'en_GB'
INFO:   English (United Kingdom)
INFO: Short form: 'LEN'
You can also use the short form of an ICU locale identifier:
\locale LEN
INFO: Locale: 'en'
INFO:   English
INFO: Short form: 'LEN'
Notes
The server locale settings impact only the collation behavior for server-side query processing. The
client application is responsible for ensuring that the correct locale is set in order to display the
characters correctly. Below are the best practices recommended by HP to ensure predictable
results:
 The locale setting in the terminal emulator for vsql (POSIX) should be set to be equivalent to
  the session locale setting on the server side (ICU) so data is collated correctly on the server
  and displayed correctly on the client.
 The vsql locale should be set using the POSIX LANG environment variable in the terminal
  emulator. Refer to the documentation of your terminal emulator for how to set its locale.
 The server session locale should be set as described in Specify the Default Locale for the
  Database.
 All input data for vsql should be in UTF-8, and all output data is encoded in UTF-8.
 Non-UTF-8 encodings and associated locale values are not supported.
o
The \o meta-command is used to control where vsql directs its query output. The output can be
written to a file, piped to a shell command, or sent to the standard output.
\o FILE sends all subsequent query output to FILE.
\o | COMMAND pipes all subsequent query output to a shell COMMAND.
\o with no argument closes any open file or pipe, and switches back to normal query result output.
Notes
 Query results include all tables, command responses, and notices obtained from the
database server.
 To intersperse text output with query results, use \qecho (page 269).
See Also
\g meta-command (page 263)
p
\p prints the current query buffer to the standard output. For example:
=> \p
CREATE VIEW my_seqview AS (SELECT * FROM sequences);
password [ USER ]
\password starts the password change process. Users can only change their own passwords.
The command prompts the user for the old password, a new password, and then the new
password again to confirm.
A superuser can change the password of another user by supplying the username. A superuser is
not prompted for the old password, either when changing his or her own password, or when
changing another user's password.
Note: If you want to cancel the password change process, press ENTER until you return to the
vsql prompt.
pset NAME [ VALUE ]
\pset NAME [ VALUE ] sets options affecting the output of query result tables. NAME identifies
which option to set, as described in the following table; the meaning of VALUE depends on the
selected option.
It is an error to call \pset without arguments.
Adjustable printing options are:
format
    Sets the output format to one of unaligned, aligned, html, or latex.
    Unique abbreviations are allowed. (That means one letter is enough.)
    "Unaligned" writes all columns of a row on a line, separated by the
    currently active field separator. This is intended to create output that
    might be read in by other programs (tab-separated, comma-separated).
    "Aligned" mode is the standard, human-readable, nicely formatted text
    output that is the default. The "HTML" and "LaTeX" modes put out tables
    that are intended to be included in documents using the respective markup
    language. They are not complete documents! (This might not be so dramatic
    in HTML, but in LaTeX you must have a complete document wrapper.)
border
    The second argument must be a number. In general, the higher the number,
    the more borders and lines the tables have, but this depends on the
    particular format. In HTML mode, this translates directly into the
    border=... attribute; in the others only values 0 (no border), 1 (internal
    dividing lines), and 2 (table frame) make sense.
expanded
    Toggles between regular and expanded format. When expanded format is
    enabled, all output has two columns with the column name on the left and
    the data on the right. This mode is useful if the data wouldn't fit on the
    screen in the normal "horizontal" mode. Expanded mode is supported by all
    four output formats. \x is the same as \pset expanded.
fieldsep
    Specifies the field separator to be used in unaligned output mode. That
    way one can create, for example, tab- or comma-separated output, which
    other programs might prefer. To set a tab as field separator, type
    \pset fieldsep '\t'. The default field separator is '|' (a vertical bar).
footer
    Toggles the display of the default footer (x rows).
null
    The second argument is a string that is printed whenever a column is null.
    The default is not to print anything, which can easily be mistaken for,
    say, an empty string. Thus, one might choose to write \pset null '(null)'.
recordsep
    Specifies the record (line) separator to use in unaligned output mode. The
    default is a newline character.
tuples_only (or t)
    Toggles between tuples-only and full display. Full display might show
    extra information such as column headers, titles, and various footers. In
    tuples-only mode, only actual table data is shown.
title [ text ]
    Sets the table title for any subsequently printed tables. This can be used
    to give your output descriptive tags. If no argument is given, the title
    is unset.
tableattr (or T) [ text ]
    Allows you to specify any attributes to be placed inside the HTML table
    tag, for example cellpadding or bgcolor. Note that you probably don't want
    to specify border here, as that is already taken care of by \pset border.
pager
    Controls use of a pager for query and vsql help output. If the environment
    variable PAGER is set, the output is piped to the specified program.
    Otherwise a platform-dependent default (such as more) is used. When the
    pager is off, the pager is not used. When the pager is on, the pager is
    used only when appropriate; that is, the output is to a terminal and does
    not fit on the screen. (vsql does not do a perfect job of estimating when
    to use the pager.) \pset pager turns the pager on and off. Pager can also
    be set to always, which causes the pager to be used always.
See the Examples (page 283) section for illustrations of how these different formats look.
Tip: There are various shortcut commands for \pset. See \a (see "a" on page 254), \C (see "C [
STRING ]" on page 255), \H (see "H" on page 264), \t (see "t" on page 271), \T (see "T [
STRING ]" on page 271), and \x (see "x" on page 272).
q
\q quits the vsql program.
qecho [ STRING ]
\qecho [ STRING ] is identical to \echo (see "echo [ STRING ]" on page 263) except that
the output is written to the query output stream, as set by \o (see "o" on page 266).
r
\r resets (clears) the query buffer.
For example, run the \p (see "p" on page 267) meta-command to see what is in the query buffer:
=> \p
CREATE VIEW my_seqview AS (SELECT * FROM sequences);
Now reset the query buffer:
=> \r
Query buffer reset (cleared).
If you reissue the command to see what's in the query buffer, you can see it is now empty:
=> \p
Query buffer is empty.
s [ FILE ]
\s [ FILE ] prints or saves the command-line history to FILE. If a file is not specified,
\s writes the history to the standard output. This option is available only if vsql is configured to
use the GNU Readline library.
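For example, to save the history to a file (the path is hypothetical):
=> \s /tmp/vsql_history.txt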
set [ NAME [ VALUE [ ... ] ] ]
\set [ NAME [ VALUE [ ... ] ] ] sets the internal variable NAME to VALUE or, if more than
one value is given, to the concatenation of all the values. If no second argument is given, the
variable is set with no value.
If no argument is provided, \set lists all internal variables; for example:
=> \set
VERSION = 'Vertica Analytic Database v6.0.0-0'
AUTOCOMMIT = 'off'
VERBOSITY = 'default'
PROMPT1 = '%/%R%# '
PROMPT2 = '%/%R%# '
PROMPT3 = '>> '
ROWS_AT_A_TIME = '1000'
DBNAME = 'VMartDB'
USER = 'dbadmin'
HOST = '<host_ip_address>'
PORT = '5433'
LOCALE = 'en_US@collation=binary'
HISTSIZE = '500'
Notes
 Variable names are case sensitive and can contain letters, digits, and underscores.
vsql treats several variables as special, as described in Variables (page 272).
 The \set parameter ROWS_AT_A_TIME defaults to 1000. It retrieves results as blocks of rows
of that size. The column formatting for the first block is used for all blocks, so in later blocks
some entries could overflow. See \timing (page 271) for examples.
 When formatting results, HP Vertica buffers ROWS_AT_A_TIME rows in memory to calculate
the maximum column widths. It is possible that rows after this initial fetch are not properly
aligned if any of the field values are longer than those seen in the first ROWS_AT_A_TIME
rows. ROWS_AT_A_TIME can be \unset (see "unset [ NAME ]" on page 271) to guarantee
perfect alignment, but this requires re-buffering the entire result set in memory and may cause
vsql to fail if the result set is too big.
 To unset a variable, use the \unset (page 271) command.
Using Backquotes to Read System Variables
In vsql, the contents of backquotes are passed to the system shell to be interpreted (the same
behavior as many UNIX shells). This is particularly useful in setting internal vsql variables, since
you may want to access UNIX system variables (such as HOME or TMPDIR) rather than
hard-code values.
For example, if you want to set an internal variable to the full path for a file in your UNIX user
directory, you could use backquotes to get the content of the system HOME variable, which is the
full path to your user directory:
=> \set inputfile `echo $HOME`/myinput.txt
=> \echo :inputfile
-270-
Using vsql
/home/dbadmin/myinput.txt
The contents of the backquotes are replaced with the results of running the contents in a system
shell interpreter. In this case, the echo $HOME command returns the contents of the HOME
system variable.
t
\t toggles the display of output column name headings and row count footer. This command is
equivalent to \pset (page 267) tuples_only and is provided for convenience.
T [ STRING ]
\T [ STRING ] specifies attributes to be placed within the table tag in HTML tabular output
mode. This command is equivalent to \pset (page 267) tableattr table_options.
timing
\timing toggles the timing of commands (currently off). The meta-command displays how long
each SQL statement takes, in milliseconds, and reports both the time required to fetch the first
block of rows from the server and the total until the last block is formatted.
Example
=> \o /dev/null
=> SELECT * FROM fact LIMIT 100000;
Time: First fetch (1000 rows): 22.054 ms. All rows formatted: 235.056 ms
Note that the database retrieved the first 1000 rows in 22 ms and completed retrieving and
formatting all rows in 235 ms.
See Also
\set (page 270)
unset [ NAME ]
\unset [ NAME ] unsets (deletes) the internal variable name that was set using the \set (page
270) meta-command.
w [ FILE ]
\w [ FILE ] outputs the current query buffer to the file FILE.
x
\x toggles extended table formatting mode. It is equivalent to \pset (page 267) expanded.
Note: There is no space between the backslash and the x.
z
\z lists table access privileges (grantee, grantor, privilege, and name) for all table access
privileges in each schema. It is the same as \dp (see "dp [ PATTERN ]" on page 259).
Variables
vsql provides variable substitution features similar to common Linux command shells. Variables
are simply name/value pairs, where the value can be any string of any length. To set variables,
use the vsql meta-command \set (see "set [ NAME [ VALUE [ ... ] ] ]" on page 270):
=> \set fact dim
sets the variable fact to the value dim. To retrieve the content of the variable, precede the name
with a colon and use it as the argument of any slash command:
=> \echo :fact
dim
Note: The arguments of \set are subject to the same substitution rules as with other
commands. For example, \set dim :fact is a valid way to copy a variable.
If you call \set without a second argument, the variable is set, with an empty string as value. To
unset (or delete) a variable, use the command \unset (see "unset [ NAME ]" on page 271).
vsql's internal variable names can consist of letters, numbers, and underscores in any order and
any number. Some of these variables are treated specially by vsql. They indicate certain option
settings that can be changed at run time by altering the value of the variable or represent some
state of the application. Although you can use these variables for any other purpose, this is not
recommended. By convention, all specially treated variables consist of all upper-case letters (and
possibly numbers and underscores). To ensure maximum compatibility in the future, avoid using
such variable names for your own purposes.
SQL Interpolation
An additional useful feature of vsql variables is that you can substitute ("interpolate") them into
regular SQL statements. The syntax for this is again to prepend the variable name with a colon (:).
-272-
Using vsql
testdb=> \set fact 'my_table'
testdb=> SELECT * FROM :fact;
would then query the table my_table. The value of the variable is copied literally (except for
backquoted strings, see below), so it can even contain unbalanced quotes or backslash
commands. Make sure that it makes sense where you put it. Variable interpolation is not
performed into quoted SQL entities.
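To illustrate the last point, a small sketch: interpolation happens in the unquoted reference but
not inside the quoted literal.
=> \set fact 'my_table'
=> SELECT * FROM :fact;   -- queries my_table
=> SELECT ':fact';        -- returns the literal string ':fact'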
Note: The one exception to variable values being copied literally is strings in backquotes (``).
The contents of backquoted strings are passed to a system shell, and replaced with the shell's
output. See the set (page 270) metacommand topic for details.
AUTOCOMMIT
When AUTOCOMMIT is set to 'on', each SQL command is automatically committed upon
successful completion. Use the \set (see "set [ NAME [ VALUE [ ... ] ] ]" on page 270)
meta-command to enable it:
\set AUTOCOMMIT on
To postpone COMMIT in this mode, set the value to off:
\set AUTOCOMMIT off
If AUTOCOMMIT is empty or defined as off, SQL commands are not committed unless you
explicitly issue COMMIT.
Notes
 AUTOCOMMIT is off by default.
 AUTOCOMMIT must be in uppercase, but the values, on or off, are case insensitive.
 In autocommit-off mode, you must explicitly abandon any failed transaction by entering
ABORT or ROLLBACK.
 If you exit the session without committing, your work is rolled back.
 Validation on vsql variables is done when they are run, not when they are set.
 The COPY statement, by default, commits on completion, so it does not matter which
AUTOCOMMIT mode you use, unless you issue COPY NO COMMIT.
 To tell if AUTOCOMMIT is on or off, issue the set command:
$ \set
...
AUTOCOMMIT = 'off'
...
 AUTOCOMMIT is off if a SELECT * FROM LOCKS shows locks from the statement you just ran.
$ \set AUTOCOMMIT off
$ \set
...
AUTOCOMMIT = 'off'
...
SELECT COUNT(*) FROM customer_dimension;
 count
-------
 50000
(1 row)
SELECT node_names, object_name, lock_mode, lock_scope
FROM LOCKS;
 node_names |       object_name        | lock_mode | lock_scope
------------+--------------------------+-----------+-------------
 site01     | Table:customer_dimension | S         | TRANSACTION
(1 row)
DBNAME
The name of the database to which you are currently connected. DBNAME is set every time you
connect to a database (including program startup), but it can be unset.
ECHO
If set to all, all lines entered from the keyboard or from a script are written to the standard output
before they are parsed or run.
To select this behavior on program start-up, use the switch -a. If set to queries, vsql merely
prints all queries as they are sent to the server. The switch for this is -e.
ECHO_HIDDEN
When this variable is set and a backslash command queries the database, the query is first
shown. This way you can study the HP Vertica internals and provide similar functionality in your
own programs. (To select this behavior on program start-up, use the switch -E.)
If you set the variable to the value noexec, the queries are just shown but are not actually sent to
the server and run.
ENCODING
The current client character set encoding.
HISTCONTROL
If this variable is set to ignorespace, lines that begin with a space are not entered into the history
list. If set to a value of ignoredups, lines matching the previous history line are not entered. A
value of ignoreboth combines the two options. If unset, or if set to any other value than those
previously mentioned, all lines read in interactive mode are saved on the history list.
Source: Bash.
HISTSIZE
The number of commands to store in the command history. The default value is 500.
Source: Bash.
HOST
The database server host you are currently connected to. This is set every time you connect to a
database (including program startup), but can be unset.
IGNOREEOF
If unset, sending an EOF character (usually Control+D) to an interactive session of vsql terminates
the application. If set to a numeric value, that many EOF characters are ignored before the
application terminates. If the variable is set but has no numeric value, the default is 10.
Source: Bash.
ON_ERROR_STOP
By default, if a script command results in an error (for example, because of a malformed command
or invalid data format), processing continues. If you set ON_ERROR_STOP to 'on' in a script and an
error occurs during processing, the script terminates immediately.
If you set ON_ERROR_STOP to 'on' in a script, run the script from Linux using vsql -f
<filename>, and an error occurs, vsql returns error code 3 to Linux to indicate that the error
occurred in a script.
To enable ON_ERROR_STOP:
=> \set ON_ERROR_STOP on
To disable ON_ERROR_STOP:
=> \set ON_ERROR_STOP off
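For example, a minimal sketch of the script workflow (the script name and COPY source are
hypothetical):
$ cat load_sales.sql
\set ON_ERROR_STOP on
COPY store.store_sales_fact FROM '/data/sales.dat' DELIMITER '|';
$ vsql -f load_sales.sql
$ echo $?    # prints 3 if an error stopped the script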
PORT
The database server port to which you are currently connected. This is set every time you connect
to a database (including program start-up), but can be unset.
PROMPT1 PROMPT2 PROMPT3
These specify what the prompts vsql issues look like. See Prompting (page 277) for details.
QUIET
This variable is equivalent to the command line option -q. It is probably not too useful in interactive
mode.
SINGLELINE
This variable is equivalent to the command line option -S.
SINGLESTEP
This variable is equivalent to the command line option -s.
USER
The database user you are currently connected as. This is set every time you connect to a
database (including program startup), but can be unset.
VERBOSITY
This variable can be set to the values default, verbose, or terse to control the verbosity of
error reports.
VSQL_HOME
By default, the vsql program reads configuration files from the user's home directory. In cases
where this is not desirable, the configuration file location can be overridden by setting the
VSQL_HOME environment variable in a way that does not require modifying a shared resource.
In the following example, vsql reads configuration information out of /tmp/jsmith rather than out of
~.
# Make an alternate configuration file in /tmp/jsmith
mkdir -p /tmp/jsmith
echo "\\echo Using VSQLRC in tmp/jsmith" > /tmp/jsmith/.vsqlrc
# Note that nothing is echoed when invoked normally
vsql
# Note that the .vsqlrc is read and the following is
# displayed before the vsql prompt
#
# Using VSQLRC in tmp/jsmith
VSQL_HOME=/tmp/jsmith vsql
Prompting
The prompts vsql issues can be customized to your preference. The three variables PROMPT1,
PROMPT2, and PROMPT3 contain strings and special escape sequences that describe the
appearance of the prompt. Prompt 1 is the normal prompt that is issued when vsql requests a new
command. Prompt 2 is issued when more input is expected during command input because the
command was not terminated with a semicolon or a quote was not closed. Prompt 3 is issued
when you run a SQL COPY command and you are expected to type in the row values on the
terminal.
The value of the selected prompt variable is printed literally, except where a percent sign (%) is
encountered. Depending on the next character, certain other text is substituted instead. Defined
substitutions are:
%M        The full host name (with domain name) of the database server, or [local] if
          the connection is over a socket, or [local:/dir/name] if the socket is not
          at the compiled-in default location.
%m        The host name of the database server, truncated at the first dot, or
          [local].
%>        The port number at which the database server is listening.
%n        The database session user name.
%/        The name of the current database.
%~        Like %/, but the output is ~ (tilde) if the database is your default
          database.
%#        If the session user is a database superuser, then a #, otherwise a >. (The
          expansion of this value might change during a database session as the
          result of the command SET SESSION AUTHORIZATION.)
%R        In prompt 1 normally =, but ^ if in single-line mode, and ! if the session
          is disconnected from the database (which can happen if \connect fails). In
          prompt 2 the sequence is replaced by -, *, a single quote, a double quote,
          or a dollar sign, depending on whether vsql expects more input because the
          command wasn't terminated yet, because you are inside a /* ... */ comment,
          or because you are inside a quoted or dollar-escaped string. In prompt 3
          the sequence doesn't produce anything.
%x        Transaction status: an empty string when not in a transaction block, or *
          when in a transaction block, or ! when in a failed transaction block, or ?
          when the transaction state is indeterminate (for example, because there is
          no connection).
%digits   The character with the indicated numeric code is substituted. If digits
          starts with 0x, the rest of the characters are interpreted as hexadecimal;
          otherwise if the first digit is 0, the digits are interpreted as octal;
          otherwise the digits are read as a decimal number.
%:name:      The value of the vsql variable name. See the section Variables for
             details.
%`command`   The output of command, similar to ordinary "backtick" substitution.
%[ ... %]    Prompts may contain terminal control characters which, for example,
             change the color, background, or style of the prompt text, or change the
             title of the terminal window. In order for the line editing features of
             Readline to work properly, these non-printing control characters must be
             designated as invisible by surrounding them with %[ and %]. Multiple
             pairs of these may occur within the prompt. The following example
             results in a boldfaced (1;) yellow-on-black (33;40) prompt on
             VT100-compatible, color-capable terminals:
             testdb=> \set PROMPT1 '%[%033[1;33;40m%]%n@%/%R%#%[%033[0m%] '
To insert a percent sign into your prompt, write %%. The default prompts are
'%/%R%# ' for prompts 1 and 2, and '>> ' for prompt 3.
Note: See the specification for terminal control sequences
(http://www.xfree86.org/current/ctlseqs.html), applicable to gnome-terminal and
xterm.
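For example, a sketch combining several of the escapes above (the host name is hypothetical;
the second line shows the resulting prompt):
testdb=> \set PROMPT1 '%n@%m:%> %/%R%# '
dbadmin@vertica01:5433 testdb=>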
Command Line Editing
vsql supports the tecla library for convenient line editing and retrieval.
The command history is automatically saved when vsql exits and is reloaded when vsql starts up.
Tab-completion is also supported, although the completion logic makes no claim to be a SQL
parser. If for some reason you do not like the tab completion, you can turn it off by putting this in a
file named .teclarc in your home directory:
bind ^I
Read the tecla documentation for further details.
Notes
The vsql implementation of the tecla library deviates from the tecla documentation as follows:
 Recalling Previously Typed Lines
  Under pure tecla, all new lines are appended to a list of historical input lines maintained
  within the GetLine resource object. In vsql, only different, non-empty lines are appended to
  the list of historical input lines.
 History Files
  tecla has no standard name for the history file. In vsql, the history file is named
  ~/.vsql_hist.
 International Character Sets (Meta keys and locales)
  In vsql, 8-bit meta characters are no longer supported. Make sure that meta characters send
  an escape by setting their EightBitInput X resource to False. You can do this in one of the
  following ways:
   Edit the ~/.Xdefaults file by adding the following line:
    XTerm*EightBitInput: False
   Start an xterm with an -xrm '*EightBitInput: False' command-line argument.
 Key Bindings
  The following key bindings are specific to vsql:
   Insert switches between insert mode (the default) and overwrite mode.
   Delete deletes the character to the right of the cursor.
   Home moves the cursor to the front of the line.
   End moves the cursor to the end of the line.
   ^R performs a history backwards search.
vsql Environment Variables
The following environment variables can be set to automatically use the defined properties each
time you start vsql:
 PAGER - If the query results do not fit on the screen, they are piped through this command.
  Typical values are more or less. The default is platform-dependent. Use of the pager can
  be disabled with the \pset (see "pset NAME [ VALUE ]" on page 267) command.
 VSQL_DATABASE - Database to connect to. For example, VMart.
 VSQL_HOST - Host name or IP address of the HP Vertica node.
 VSQL_PORT - Port to use for the connection.
 VSQL_USER - Username to use for the connection.
 VSQL_PASSWORD - Password to use for the connection.
 VSQL_EDITOR, EDITOR, and VISUAL - Editor used by the \e command. The variables are
  examined in the order listed; the first that is set is used.
 SHELL - Command run by the \! (see "! [ COMMAND ]" on page 253) command.
 TMPDIR - Directory for storing temporary files. The default is platform-dependent. On
  Unix-like systems the default is /tmp.
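For example, a minimal Bash sketch that presets connection properties before starting vsql (the
host and database names are hypothetical):
export VSQL_HOST=vertica01.example.com
export VSQL_PORT=5433
export VSQL_DATABASE=VMart
export VSQL_USER=dbadmin
vsql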
Locales
The default terminal emulator under Linux is gnome-terminal, although xterm can also be used.
HP recommends that you use gnome-terminal with vsql in UTF-8 mode, which is its default.
To change settings on Linux
1 From the tabs at the top of the vsql screen, select Terminal.
2 Click Set Character Encoding.
3 Select Unicode (UTF-8).
Note: This works well for standard keyboards. xterm has a similar UTF-8 option.
To change settings on Windows using PuTTY
1 Right-click the vsql screen title bar and select Change Settings.
2 Click Window and click Translation.
3 Select UTF-8 in the drop-down menu on the right.
Notes
 vsql has no way of knowing how you have set your terminal emulator options.
 The tecla library is prepared to do POSIX-type translations from a local encoding to UTF-8 on
interactive input, using the POSIX LANG, etc., environment variables. This could be useful to
international users who have a non-UTF-8 keyboard. See the tecla documentation for
details.
HP recommends the following (or whatever other .UTF-8 locale setting you find appropriate):
export LANG=en_US.UTF-8
 The vsql \locale (see "locale" on page 265) command invokes and tracks the server SET
LOCALE TO command, described in the SQL Reference Manual. vsql itself currently does
nothing with this locale setting, but rather treats its input (from files or from tecla), all its output,
and all its interactions with the server as UTF-8. vsql ignores the POSIX locale variables,
except for any "automatic" uses in printf, and so on.
Files
Before starting up, vsql attempts to read and execute commands from the system-wide vsqlrc
file and the user's ~/.vsqlrc file. The command-line history is stored in the file
~/.vsql_history.
Tip: If you want to save your old history file, open another terminal window and save a copy to
a different file name.
Exporting Data Using vsql
You can use vsql for simple data export tasks by changing its output format options so the output
is suitable for importing into other systems (tab delimited or comma-separated files, for example).
These options can be set either from within an interactive vsql session, or through command-line
arguments to the vsql command (making the export process suitable for automation through
scripting). After you have set vsql's options so it outputs the data in a format your target system
can read, you run a query and capture the result in a text file.
The following table lists the meta-commands and command-line options that are useful for
changing the format of vsql's output.
Description                                   Meta-command               Command-line Option
Disable padding used to align output.         \a (page 254)              -A or --no-align
Show only tuples, disabling column headings   \t (page 271)              -t or --tuples-only
and row counts.
Set the field separator character.            \pset (page 267) fieldsep  -F or --field-separator
Send output to a file.                        \o (page 266)              -o or --output
Specify a SQL statement to execute.           N/A                        -c or --command
The following example demonstrates disabling padding and column headers in the output, and
setting a field separator to dump a table to a tab-separated text file within an interactive session.
=> SELECT * FROM my_table;
 a |   b   | c
---+-------+---
 a | one   | 1
 b | two   | 2
 c | three | 3
 d | four  | 4
 e | five  | 5
(5 rows)
=> \a
Output format is unaligned.
=> \t
Showing only tuples.
=> \pset fieldsep '\t'
Field separator is "	".
=> \o dumpfile.txt
=> select * from my_table;
=> \o
=> \! cat dumpfile.txt
a	one	1
b	two	2
c	three	3
d	four	4
e	five	5
Note: You could encounter issues with empty strings being converted to NULLs or the reverse
using this technique. You can prevent any confusion by explicitly setting null values to output a
unique string such as NULLNULLNULL (for example, \pset null 'NULLNULLNULL'). Then,
on the import end, convert the unique string back to a null value. For example, if you are
copying the file back into an HP Vertica database, you would give the argument NULL
'NULLNULLNULL' to the COPY statement.
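For example, a sketch of that round trip (file and table names are hypothetical, and the COPY
options assume the tab-separated dump created above):
=> \pset null 'NULLNULLNULL'
=> \o dumpfile.txt
=> SELECT * FROM my_table;
=> \o
-- on the importing database:
=> COPY my_table FROM '/tmp/dumpfile.txt' NULL 'NULLNULLNULL' DELIMITER E'\t';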
When logged into one of the database nodes, you can create the same output file directly from the
command line by passing the right parameters to vsql:
> vsql -U username -F $'\t' -At -o dumpfile.txt -c "SELECT * FROM my_table;"
Password:
> cat dumpfile.txt
a	one	1
b	two	2
c	three	3
d	four	4
e	five	5
Note: $'...' is a BASH-specific string format that interprets backslash escapes, so it will
pass a literal tab character to the vsql command as the argument for the -F parameter. Shells
other than BASH may have other string literal syntax.
If you want to convert null values to a unique string as mentioned earlier, you can add the
argument -P null='NULLNULLNULL' (or whatever unique string you choose).
By adding the -w vsql command-line option to the example command line, you could use the
command within a batch script to automate the data export. However, the script would contain the
database password as plain text. If you take this approach, you should prevent unauthorized
access to the batch script, and also have the script use a database user account that has limited
access.
Copying Data Using vsql
You can use vsql to copy data between two HP Vertica databases. This technique is similar to the
technique explained in Exporting Data Using vsql (page 280), except instead of having vsql
save data to a file for export, you pipe one vsql command's output to the input of another vsql command that
runs a COPY statement from STDIN. This technique can also work for other databases or
applications that accept data from an input stream.
Note: The following technique only works for individual tables. To copy an entire database to
another cluster, see Copying a Database to Another Cluster in the Administrator's Guide.
The easiest way to copy using vsql is to log in to a node of the target database, then issue a vsql
command that connects to the source HP Vertica database to dump the data you want. For
example, the following command copies the store.store_sales_fact table from the vmart database
on node testdb01 to the vmart database on the node you are logged into:
vsql -U username -w passwd -h testdb01 -d vmart -At -c "SELECT * from store.store_sales_fact" \
| vsql -U username -w passwd -d vmart -c "COPY store.store_sales_fact FROM STDIN DELIMITER '|';"
Note: The above example copies the data only, not the table design. The target table for the
data copy must already exist in the target database. You can export the design of the table
using EXPORT_OBJECTS or EXPORT_CATALOG.
Monitoring Progress (optional)
You may want some way of monitoring progress when copying large amounts of data between HP
Vertica databases. One way of monitoring the progress of the copy operation is to use a utility
such as Pipe Viewer (http://www.ivarch.com/programs/pv.shtml) that pipes its input directly to
its output while displaying the amount and speed of data it passes along. Pipe Viewer can even
display a progress bar if you give it the total number of bytes or lines you expect to be processed.
You can get the number of lines to be processed by running a separate vsql command that
executes a SELECT COUNT query.
Note: Pipe Viewer isn't a standard Linux or Solaris command, so you will need to download and
install it yourself. See the Pipe Viewer (http://www.ivarch.com/programs/pv.shtml) page for
download packages and instructions. HP does not support Pipe Viewer. Install and use it at
your own risk.
The following command demonstrates how you can use Pipe Viewer to monitor the progress of
the copy shown in the prior example. The command is complicated by the need to get the number
of rows that will be copied, which is done using a separate vsql command within a BASH
backquote string, which executes the string's contents and inserts the output of the command into
the command line. This vsql command just counts the number of rows in the
store.store_sales_fact table.
vsql -U username -w passwd -h testdb01 -d vmart -At -c "SELECT * from store.store_sales_fact" \
| pv -lpetr -s `vsql -U username -w passwd -h testdb01 -d vmart -At -c "SELECT COUNT (*) FROM
store.store_sales_fact;"` \
| vsql -U username -w passwd -d vmart -c "COPY store.store_sales_fact FROM STDIN DELIMITER '|';"
While running, the above command displays a progress bar that looks like this:
0:00:39 [12.6M/s] [=============================>              ] 50% ETA 00:00:40
Output Formatting Examples
The first example shows how to spread a command over several lines of input. Notice the
changing prompt:
testdb=> CREATE TABLE my_table (
testdb(> first integer not null default 0,
testdb(> second char(10));
CREATE TABLE
Assume you have filled the table with data and want to take a look at it:
testdb=> SELECT * FROM my_table;
 first | second
-------+--------
     1 | one
     2 | two
     3 | three
     4 | four
(4 rows)
You can display tables in different ways by using the \pset command:
testdb=> \pset border 2
Border style is 2.
testdb=> SELECT * FROM my_table;
+-------+--------+
| first | second |
+-------+--------+
|     1 | one    |
|     2 | two    |
|     3 | three  |
|     4 | four   |
+-------+--------+
(4 rows)
testdb=> \pset border 0
Border style is 0.
testdb=> SELECT * FROM my_table;
first second
----- ------
    1 one
    2 two
    3 three
    4 four
(4 rows)
testdb=> \pset border 1
Border style is 1.
testdb=> \pset format unaligned
Output format is unaligned.
testdb=> \pset fieldsep ','
Field separator is ",".
testdb=> \pset tuples_only
Showing only tuples.
testdb=> SELECT second, first FROM my_table;
one,1
two,2
three,3
four,4
Alternatively, use the short commands:
testdb=> \a \t \x
Output format is aligned.
Tuples only is off.
Expanded display is on.
testdb=> SELECT * FROM my_table;
first  | 1
second | one
-------+--------
first  | 2
second | two
-------+--------
first  | 3
second | three
-------+--------
first  | 4
second | four
Writing Queries
Queries are database operations that retrieve data from one or more tables or views. In HP
Vertica, the top-level SELECT statement is the query, and a query nested within another SQL
statement is called a subquery.
HP Vertica is designed to run the same SQL standard queries that run on other databases.
However, there are some differences between HP Vertica queries and queries used in other
relational database management systems.
The HP Vertica transaction model is different from the SQL standard in a way that has a profound
effect on query performance. You can:
 Run a query on a static snapshot of the database from any specific date and time. Doing so
  avoids holding locks or blocking other database operations.
 Use a subset of the standard SQL isolation levels and access modes (read/write or read-only)
  for a user session.
In HP Vertica, the primary structure of a SQL query is its statement. Each statement ends with a
semicolon, and you can write multiple queries separated by semicolons; for example:
=> CREATE TABLE t1( ..., date_col date NOT NULL, ...);
=> CREATE TABLE t2(..., state VARCHAR NOT NULL, ...);
Multiple Instances of Dimension Tables in the FROM Clause
The same dimension table can appear multiple times in a query's FROM clause, using different
aliases. For example:
SELECT *
FROM
fact, dimension d1, dimension d2
WHERE fact.fk = d1.pk
AND
fact.name = d2.name;
Historical (Snapshot) Queries
HP Vertica supports querying historical data for individual SELECT statements.
Syntax
[ AT EPOCH LATEST ] | [ AT TIME 'timestamp' ] SELECT ...
Parameters
AT EPOCH LATEST       Queries all committed data in the database up to, but not
                      including, the current epoch.
AT TIME 'timestamp'   Queries all committed data in the database up to the time stamp
                      specified. AT TIME 'timestamp' queries are resolved to the
                      next epoch boundary before being evaluated.
Historical queries, also known as snapshot queries, are useful because they access data in past
epochs only. Historical queries do not need to hold table locks or block write operations because
they do not return the absolute latest data.
Historical queries behave in the same manner regardless of transaction isolation level. Historical
queries observe only committed data, even excluding updates made by the current transaction,
unless those updates are to a temporary table.
Note: You do not need to use historical queries for temporary tables because temp tables do
not require locks. Their content is private to the transaction and valid only for the length of the
transaction.
Be aware that there is only one snapshot of the logical schema. This means that any changes you
make to the schema are reflected across all epochs. If, for example, you add a new column to a
table and you specify a default value for the column, all historical epochs display the new column
and its default value.
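For example, hedged sketches against a VMart table (the timestamp value is arbitrary):
=> AT EPOCH LATEST SELECT COUNT(*) FROM public.customer_dimension;
=> AT TIME '2013-09-01 12:00:00' SELECT COUNT(*) FROM public.customer_dimension;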
See Also
Transactions in the Concepts Guide
Temporary Tables
You can use the CREATE TEMPORARY TABLE statement to implement certain queries using
multiple steps:
1 Create one or more temporary tables.
2 Execute queries and store the result sets in the temporary tables.
3 Execute the main query using the temporary tables as if they were a normal part of the logical
schema.
See CREATE TEMPORARY TABLE in the SQL Reference Manual for details.
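For illustration, a sketch of that pattern using VMart tables (the temporary table name is
hypothetical, and ON COMMIT PRESERVE ROWS is assumed so the rows survive implicit commits):
=> CREATE TEMPORARY TABLE busy_stores ON COMMIT PRESERVE ROWS AS
   SELECT store_key FROM store.store_sales_fact
   GROUP BY store_key HAVING COUNT(*) > 70;
=> SELECT store_name FROM store.store_dimension
   WHERE store_key IN (SELECT store_key FROM busy_stores);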
SQL Queries
All DML (Data Manipulation Language) statements can contain queries. This section introduces
some of the query types in HP Vertica, with additional details in later sections.
Note: Many of the examples in this chapter use the VMart schema. For information about other
HP Vertica-supplied queries, see the Getting Started Guide.
-286-
Writing Queries
Simple Queries
Simple queries contain a query against one table. Minimal effort is required to process the
following query, which looks for product keys and SKU numbers in the product table:
=> SELECT product_key, sku_number FROM public.product_dimension;
 product_key | sku_number
-------------+------------
          43 | SKU-#129
          87 | SKU-#250
          42 | SKU-#125
          49 | SKU-#154
          37 | SKU-#107
          36 | SKU-#106
          86 | SKU-#248
          41 | SKU-#121
          88 | SKU-#257
          40 | SKU-#120
(10 rows)
Joins
Joins use a relational operator that combines information from two or more tables. The query's ON
clause specifies how tables are combined, such as by matching foreign keys to primary keys. In
the following example, the query requests the names of stores with more than 70 transactions
by joining the store schema's store_sales_fact and store_dimension tables on their store_key columns:
=> SELECT store_name, COUNT(*) FROM store.store_sales_fact
JOIN store.store_dimension ON store.store_sales_fact.store_key = store.store_dimension.store_key
GROUP BY store_name HAVING COUNT(*) > 70 ORDER BY store_name;
 store_name | count
------------+-------
 Store49    |    72
 Store83    |    78
(2 rows)
For more detailed information, see Joins (page 315). See also the Multicolumn subqueries
section in Subquery Examples.
Cross Joins
Also known as the Cartesian product, a cross join is the result of joining every record in one table
with every record in another table. A cross join occurs when there is no join key between tables to
restrict records. The following query, for example, returns all instances of vendor and store names
in the vendor and store tables:
=> SELECT vendor_name, store_name FROM public.vendor_dimension
CROSS JOIN store.store_dimension;
    vendor_name     | store_name
--------------------+------------
 Deal Warehouse     | Store41
 Deal Warehouse     | Store12
 Deal Warehouse     | Store46
 Deal Warehouse     | Store50
 Deal Warehouse     | Store15
 Deal Warehouse     | Store48
 Deal Warehouse     | Store39
 Sundry Wholesale   | Store41
 Sundry Wholesale   | Store12
 Sundry Wholesale   | Store46
 Sundry Wholesale   | Store50
 Sundry Wholesale   | Store15
 Sundry Wholesale   | Store48
 Sundry Wholesale   | Store39
 Market Discounters | Store41
 Market Discounters | Store12
 Market Discounters | Store46
 Market Discounters | Store50
 Market Discounters | Store15
 Market Discounters | Store48
 Market Discounters | Store39
 Market Suppliers   | Store41
 Market Suppliers   | Store12
 Market Suppliers   | Store46
 Market Suppliers   | Store50
 Market Suppliers   | Store15
 Market Suppliers   | Store48
 Market Suppliers   | Store39
 ...
(4000 rows)
This example's output is truncated because this particular cross join returned several thousand
rows. See also Cross Joins (page 321).
Subqueries
A subquery is a query nested within another query. In the following example, we want a list of all
products containing the highest fat content. The inner query (subquery) returns the product
containing the highest fat content among all food products to the outer query block (containing
query). The outer query then uses that information to return the names of the products containing
the highest fat content.
=> SELECT product_description, fat_content FROM public.product_dimension
WHERE fat_content IN
(SELECT MAX(fat_content) FROM public.product_dimension
WHERE category_description = 'Food' AND department_description = 'Bakery')
LIMIT 10;
         product_description         | fat_content
-------------------------------------+-------------
 Brand #59110 hotdog buns            |          90
 Brand #58107 english muffins        |          90
 Brand #57135 english muffins        |          90
 Brand #54870 cinnamon buns          |          90
 Brand #53690 english muffins        |          90
 Brand #53096 bagels                 |          90
 Brand #50678 chocolate chip cookies |          90
 Brand #49269 wheat bread            |          90
 Brand #47156 coffee cake            |          90
 Brand #43844 corn muffins           |          90
(10 rows)
For more information, see Subqueries (page 290).
Sorting Queries
Use the ORDER BY clause to order the rows that a query returns.
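For example, a minimal sketch against a VMart table:
=> SELECT product_key, sku_number FROM public.product_dimension
   ORDER BY product_key DESC LIMIT 3;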
Special Note About Query Results
You could get different results running certain queries on one machine or another for the following
reasons:
 Partitioning on a FLOAT type could return nondeterministic results because of the precision,
  especially when the numbers are close to one another, such as results from the RADIANS()
  function, which has a very small range of output. To get deterministic results, use NUMERIC if
  you must partition by data that is not an INTEGER type.
 Most analytics (with analytic aggregations, such as MIN()/MAX()/SUM()/COUNT()/AVG(),
  as exceptions) rely on a unique order of input data to get deterministic results. If the analytic
  window-order (page 336) clause cannot resolve ties in the data, results could be different
  each time you run the query.
For example, in the following query, the analytic ORDER BY does not include the first column in
the query, promotion_key. So for a tie of AVG(RADIANS(cost_dollar_amount)),
product_version, the same promotion_key could have different positions within the
analytic partition, resulting in a different NTILE() number. Thus, DISTINCT could also have a
different result:
=> SELECT COUNT(*) FROM
(SELECT DISTINCT
SIN(FLOOR(MAX(store.store_sales_fact.promotion_key))),
NTILE(79) OVER(PARTITION BY AVG (RADIANS
(store.store_sales_fact.cost_dollar_amount ))
ORDER BY store.store_sales_fact.product_version)
FROM store.store_sales_fact
GROUP BY store.store_sales_fact.product_version,
store.store_sales_fact.sales_dollar_amount ) AS store;
 count
-------
  1425
(1 row)
If you add MAX(promotion_key) to analytic ORDER BY, the results are the same on any
machine:
=> SELECT COUNT(*) FROM (SELECT DISTINCT
MAX(store.store_sales_fact.promotion_key),
NTILE(79) OVER(PARTITION BY
MAX(store.store_sales_fact.cost_dollar_amount)
ORDER BY store.store_sales_fact.product_version,
MAX(store.store_sales_fact.promotion_key ))
-289-
Programmer's Guide
FROM store.store_sales_fact
GROUP BY store.store_sales_fact.product_version,
store.store_sales_fact.sales_dollar_amount) AS store;
Subqueries
Subqueries provide a great deal of flexibility to SQL statements by letting you perform in one step
what, otherwise, would require several steps. For example, instead of having to write separate
queries to answer multiple-part questions, you can write a subquery.
A subquery is a SELECT statement within another SELECT statement. The inner statement is the
subquery, and the outer statement is the containing statement (often referred to in HP Vertica as
the outer query block).
Like any query, a subquery returns records from a table that could consist of a single column and
record, a single column with multiple records, or multiple columns and records. Queries can be
noncorrelated or correlated (page 303). You can even use them to update or delete (page 305)
records in a table based on values that are stored in other database tables.
Notes
 Many examples in this section use the VMart example database.
 Be sure to read Subquery Restrictions (page 314).
Subqueries Used in Search Conditions
Subqueries are used as search conditions in order to filter results. They specify the conditions for
the rows returned from the containing query's select-list, a query expression, or the subquery
itself. The operation evaluates to TRUE, FALSE, or UNKNOWN (NULL).
Syntax
< search_condition > {
[ { AND | OR [ NOT ] } { < predicate > | ( < search_condition > ) } ]
} [ ,... ]
< predicate >
{ expression comparison-operator expression
... | string-expression [ NOT ] { LIKE | ILIKE | LIKEB | ILIKEB }
string-expression
... | expression IS [ NOT ] NULL
... | expression [ NOT ] IN ( subquery | expression [ ,...n ] )
... | expression comparison-operator [ ANY | SOME ] ( subquery )
... | expression comparison-operator ALL ( subquery )
... | expression OR ( subquery )
... | [ NOT ] EXISTS ( subquery )
... | [ NOT ] IN ( subquery )
}
Parameters

<search-condition>
    Specifies the search conditions for the rows returned from one of the following:
     the containing query's select-list
     a query expression
     a subquery
    If the subquery is used with an UPDATE or DELETE statement, UPDATE specifies
    the rows to update and DELETE specifies the rows to delete.

{ AND | OR | NOT }
    Keywords that specify the logical operators that combine conditions, or in
    the case of NOT, negate conditions.
     AND — Combines two conditions and evaluates to TRUE when both of the
      conditions are TRUE.
     OR — Combines two conditions and evaluates to TRUE when either condition
      is TRUE.
     NOT — Negates the Boolean expression specified by the predicate.

<predicate>
    An expression that returns TRUE, FALSE, or UNKNOWN (NULL).

expression
    Can be a column name, a constant, a function, a scalar subquery, or a
    combination of column names, constants, and functions connected by
    operators or subqueries.

comparison-operator
    Tests conditions between expressions:
     < tests the condition of one expression being less than the other.
     > tests the condition of one expression being greater than the other.
     <= tests the condition of one expression being less than or equal to the
      other expression.
     >= tests the condition of one expression being greater than or equal to
      the other expression.
     = tests the equality between two expressions.
     <=> tests equality like the = operator, but it returns TRUE instead of
      UNKNOWN if both operands are UNKNOWN and FALSE instead of UNKNOWN if one
      operand is UNKNOWN.
     <> and != test the condition of two expressions not equal to one another.

string_expression
    A character string with optional wildcard (*) characters.

[ NOT ] { LIKE | ILIKE | LIKEB | ILIKEB }
    Indicates that the character string following the predicate is to be used
    (or not used) for pattern matching.

IS [ NOT ] NULL
    Searches for values that are null or are not null.

ALL
    Used with a comparison operator and a subquery. Returns TRUE for the
    lefthand predicate if all values returned by the subquery satisfy the
    comparison operation, or FALSE if not all values satisfy the comparison or
    if the subquery returns no rows to the outer query block.

ANY | SOME
    ANY and SOME are synonyms and are used with a comparison operator and a
    subquery. Either returns TRUE for the lefthand predicate if any value
    returned by the subquery satisfies the comparison operation, or FALSE if no
    values in the subquery satisfy the comparison or if the subquery returns no
    rows to the outer query block. Otherwise, the expression is UNKNOWN.

[ NOT ] EXISTS
    Used with a subquery to test for the existence of records that the subquery
    returns.

[ NOT ] IN
    Searches for an expression on the basis of an expression's exclusion or
    inclusion from a list. The list of values is enclosed in parentheses and
    can be a subquery or a set of constants.
Logical Operators AND and OR
The AND and OR logical operators combine two conditions. AND evaluates to TRUE when both of
the conditions joined by the AND keyword are matched, and OR evaluates to TRUE when either
condition joined by OR is matched.
OR subqueries (complex expressions)
HP Vertica supports subqueries in more complex expressions using OR; for example:
 More than one subquery in the conjunct expression:
(SELECT MAX(b) FROM t1) + (SELECT MAX(b) FROM t2)
a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2)
 An OR clause in the conjunct expression involves at least one subquery:
a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2)
a IN (SELECT a from t1) OR b = 5
a = (SELECT MAX(x) FROM t2) OR b = 5
 One subquery is present but it is part of another expression:
(x IN (SELECT a FROM t1)) = (x = (SELECT MAX(x) FROM t2))
(x IN (SELECT a FROM t1)) IS NULL
How AND queries are evaluated
HP Vertica treats expressions separated by AND (conjunctive) operators individually. For example,
if the WHERE clause were:
WHERE (a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2)) AND (c IN (SELECT a FROM t1))
the query would be interpreted as two conjunct expressions:
1 (a IN (SELECT a FROM t1) OR b IN (SELECT x FROM t2))
2 (c IN (SELECT a FROM t1))
The first expression is considered a complex subquery, whereas the second expression is not.
Examples
The following list shows some of the ways you can filter complex conditions in the WHERE clause:
 OR expression between a subquery and a non-subquery condition:
=> SELECT x FROM t WHERE x > (SELECT SUM(DISTINCT x) FROM t GROUP BY y)
   OR x < 9;
 OR expression between two subqueries:
=> SELECT * FROM t WHERE x=(SELECT x FROM t) OR EXISTS(SELECT x FROM tt);
 Subquery expression:
=> SELECT * FROM t WHERE x=(SELECT x FROM t)+1 OR x<>(SELECT x FROM t)+1;
 OR expression with [NOT] EXISTS subqueries:
=> SELECT * FROM t WHERE NOT (EXISTS (SELECT x FROM t)) OR x > 9;
 OR expression with IS [NOT] NULL subqueries:
=> SELECT * FROM t WHERE (SELECT * FROM t) IS NULL OR (SELECT * FROM tt) IS NULL;
 OR expression with a Boolean column and a subquery that returns a Boolean data type:
=> SELECT * FROM t2 WHERE x = (SELECT x FROM t2) OR x;
Note: To return TRUE, the argument of OR must be a Boolean data type.
 OR expression in a CASE statement:
=> SELECT * FROM t WHERE CASE WHEN x=1 THEN x > (SELECT * FROM t)
   OR x < (SELECT * FROM t2) END;
 Analytic function, NULL-handling function, string function, math function, and so on:
=> SELECT x FROM t WHERE x > (SELECT COALESCE(x,y) FROM t GROUP BY x,y)
   OR x < 9;
 User-defined functions (assuming f() is one):
=> SELECT * FROM t WHERE x > 5 OR x = (SELECT f(x) FROM t);
 Use of parentheses at different places to restructure the queries:
=> SELECT x FROM t WHERE (x = (SELECT x FROM t) AND y = (SELECT y FROM t))
   OR (SELECT x FROM t) = 1;
 Multicolumn subqueries:
=> SELECT * FROM t WHERE (x,y) = (SELECT x,y FROM t) OR x > 5;
 Constant/NULL on lefthand side of subquery:
=> SELECT * FROM t WHERE x > 5 OR 5 = (SELECT x FROM t);
See Also
Subquery Restrictions (page 314)
In Place of an Expression
Subqueries that return a single value (unlike the list of values returned by IN subqueries) can be
used just about anywhere an expression is allowed in SQL. An expression in this context can be a
column name, a constant, a function, a scalar subquery, or a combination of column names,
constants, and functions connected by operators or subqueries.
For example:
=> SELECT c1 FROM t1 WHERE c1 = ANY (SELECT c1 FROM t2) ORDER BY c1;
=> SELECT c1 FROM t1 WHERE COALESCE((t1.c1 > ANY (SELECT c1 FROM t2)), TRUE);
=> SELECT c1 FROM t1 GROUP BY c1 HAVING
   COALESCE((t1.c1 <> ALL (SELECT c1 FROM t2)), TRUE);
Multi-column expressions are also supported:
=> SELECT c1 FROM t1 WHERE (t1.c1, t1.c2) = ALL (SELECT c1, c2 FROM t2);
=> SELECT c1 FROM t1 WHERE (t1.c1, t1.c2) <> ANY (SELECT c1, c2 FROM t2);
HP Vertica returns an error on queries where more than one row would be returned by any
subquery used as an expression:
=> SELECT c1 FROM t1 WHERE c1 = (SELECT c1 FROM t2) ORDER BY c1;
ERROR: more than one row returned by a subquery used as an expression
See Also
Subquery Restrictions (page 314)
Comparison Operators
HP Vertica supports Boolean subquery expressions in the WHERE clause with any of the
following operators: (>, <, >=, <=, =, <>, <=>).
WHERE clause subqueries filter results and take the following form:
SELECT <column, ...>
FROM <table>
WHERE <condition> (SELECT <column, ...> FROM <table> WHERE <condition>);
These conditions are available for all data types where comparison makes sense. All comparison
operators are binary operators that return values of TRUE, FALSE, or UNKNOWN (NULL).
Expressions that correlate to exactly one outer table in the outer query block are supported, and
these correlated expressions can use comparison operators.
The following subquery scenarios are supported:
SELECT * FROM T1 WHERE T1.x = (SELECT MAX(c1) FROM T2);
SELECT * FROM T1 WHERE T1.x >= (SELECT MAX(c1) FROM T2 WHERE T1.y = T2.c2);
SELECT * FROM T1 WHERE T1.x <= (SELECT MAX(c1) FROM T2 WHERE T1.y = T2.c2);
See Also
Subquery Restrictions (page 314)
LIKE Pattern Matching
HP Vertica supports LIKE pattern-matching conditions in subqueries, which take the following form:
string-expression [ NOT ] { LIKE | ILIKE | LIKEB | ILIKEB } string-expression
The following command searches for customers whose company name starts with "Ev" and
returns the total count:
=> SELECT COUNT(*) FROM customer_dimension WHERE customer_name LIKE
(SELECT 'Ev%' FROM customer_dimension LIMIT 1);
 count
-------
   153
(1 row)
HP Vertica also supports single-row subqueries as the pattern argument for LIKEB and ILIKEB
predicates; for example:
=> SELECT * FROM t1 WHERE t1.x LIKEB (SELECT t2.x FROM t2);
The following symbols are substitutes for the LIKE keywords:
~~      LIKE
~#      LIKEB
~~*     ILIKE
~#*     ILIKEB
!~~     NOT LIKE
!~#     NOT LIKEB
!~~*    NOT ILIKE
!~#*    NOT ILIKEB
Note: The ESCAPE keyword is not valid for the above symbols.
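As a quick illustration, the following statement (the string literals are arbitrary) shows that the keyword and symbol forms are interchangeable:
=> SELECT 'abc' LIKE 'a%', 'abc' ~~ 'a%';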
See LIKE-predicate in the SQL Reference Manual for additional examples.
ANY (SOME) and ALL
Normally, you use operators like equal and greater-than only on subqueries that return one row.
With ANY and ALL, however, comparisons can be made on subqueries that return multiple rows.
The ANY and ALL keywords let you specify whether any or all of the subquery values,
respectively, match the specified condition.
These subqueries take the following form:
expression comparison-operator { ANY | SOME } ( subquery )
expression comparison-operator ALL ( subquery )
Notes
 The keyword SOME is an alias for ANY.
 IN is equivalent to = ANY.
 NOT IN is equivalent to <> ALL.
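For example, using the t1 and t2 tables from this section, the following two statements express the same filter:
=> SELECT c1 FROM t1 WHERE c1 IN (SELECT c1 FROM t2);
=> SELECT c1 FROM t1 WHERE c1 = ANY (SELECT c1 FROM t2);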
ANY subqueries
Subqueries that use the ANY keyword yield a Boolean result when any value retrieved in the
subquery matches the value of the lefthand expression.
Expression         Returns
> ANY(1,10,100)    TRUE for any value > 1 (greater than at least one value;
                   that is, greater than the minimum value)
< ANY(1,10,100)    TRUE for any value < 100 (less than at least one value;
                   that is, less than the maximum value)
= ANY(1,10,100)    TRUE for any value = 1, 10, or 100 (equals any of the values)
ANY subquery examples
 An ANY subquery within an expression. Note that the second statement uses the SOME
keyword:
=> SELECT c1 FROM t1 WHERE COALESCE((t1.c1 > ANY (SELECT c1 FROM t2)),
TRUE);
=> SELECT c1 FROM t1 WHERE COALESCE((t1.c1 > SOME (SELECT c1 FROM t2)),
TRUE);
 ANY noncorrelated subqueries without aggregates:
=> SELECT c1 FROM t1 WHERE c1 = ANY (SELECT c1 FROM t2) ORDER BY c1;
Note that omitting the ANY keyword returns an error because more than one row would be
returned by the subquery used as an expression:
=> SELECT c1 FROM t1 WHERE c1 = (SELECT c1 FROM t2) ORDER BY c1;
 ANY noncorrelated subqueries with aggregates:
=> SELECT c1 FROM t1 WHERE c1 <> ANY (SELECT MAX(c1) FROM t2) ORDER BY
c1;
=> SELECT c1 FROM t1 GROUP BY c1 HAVING c1 <> ANY (SELECT MAX(c1) FROM
t2)
ORDER BY c1;
 ANY noncorrelated subqueries with aggregates and a GROUP BY clause:
=> SELECT c1 FROM t1 WHERE c1 <> ANY (SELECT MAX(c1) FROM t2 GROUP BY
c2)
ORDER BY c1;
 ANY noncorrelated subqueries with a GROUP BY clause:
=> SELECT c1 FROM t1 WHERE c1 <=> ANY (SELECT c1 FROM t2 GROUP BY c1)
ORDER BY c1;
 ANY correlated subqueries with no aggregates or GROUP BY clause:
=> SELECT c1 FROM t1 WHERE c1 >= ANY (SELECT c1 FROM t2 WHERE t2.c2 =
t1.c2)
ORDER BY c1;
ALL subqueries
Subqueries that use the ALL keyword yield a Boolean result when all values retrieved in the
subquery match the specified condition of the lefthand expression.
Expression         Returns
> ALL(1,10,100)    TRUE when the expression value is > 100 (greater than the
                   maximum value)
< ALL(1,10,100)    TRUE when the expression value is < 1 (less than the
                   minimum value)
ALL subquery examples
Following are some examples of ALL (subquery):
 ALL noncorrelated subqueries without aggregates:
=> SELECT c1 FROM t1 WHERE c1 >= ALL (SELECT c1 FROM t2) ORDER BY c1;
 ALL noncorrelated subqueries with aggregates:
=> SELECT c1 FROM t1 WHERE c1 = ALL (SELECT MAX(c1) FROM t2) ORDER BY c1;
=> SELECT c1 FROM t1 GROUP BY c1 HAVING c1 <> ALL (SELECT MAX(c1) FROM t2)
   ORDER BY c1;
 ALL noncorrelated subqueries with aggregates and a GROUP BY clause:
=> SELECT c1 FROM t1 WHERE c1 <= ALL (SELECT MAX(c1) FROM t2 GROUP BY c2)
   ORDER BY c1;
 ALL noncorrelated subqueries with a GROUP BY clause:
=> SELECT c1 FROM t1 WHERE c1 <> ALL (SELECT c1 FROM t2 GROUP BY c1)
   ORDER BY c1;
See Also
Subquery Restrictions (page 314)
EXISTS and NOT EXISTS
The EXISTS predicate is one of the most common predicates used to build conditions that use
noncorrelated and correlated subqueries. Use EXISTS to identify the existence of a relationship
without regard for quantity: EXISTS returns TRUE if the subquery returns any rows, and
NOT EXISTS returns TRUE if the subquery returns no rows.
[NOT] EXISTS subqueries take the following form:
expression [ NOT ] EXISTS ( subquery )
The EXISTS condition is considered to be met if the subquery returns at least one row. Since the
result depends only on whether any records are returned, and not on the contents of those
records, the output list of the subquery is normally uninteresting. A common coding convention is
to write all EXISTS tests as follows:
EXISTS (SELECT 1 WHERE ...)
In the above fragment, SELECT 1 returns the value 1 for every record in the query. If the query
returns, for example, five records, it returns the value 1 five times. The system doesn't care about
the values in those records; it just wants to know whether a row is returned.
Alternatively, the select list of a subquery used with EXISTS might consist of the asterisk (*). You
do not need to specify column names, because the query tests for the existence or nonexistence
of records that meet the conditions specified in the subquery.
EXISTS (SELECT * WHERE ...)
Notes
 If EXISTS (subquery) returns at least 1 row, the result is TRUE.
 If EXISTS (subquery) returns no rows, the result is FALSE.
 If NOT EXISTS (subquery) returns at least 1 row, the result is FALSE.
 If NOT EXISTS (subquery) returns no rows, the result is TRUE.
Examples
The following query retrieves the list of all the customers who purchased anything from any of the
stores amounting to more than 550 dollars:
=> SELECT customer_key, customer_name, customer_state
FROM public.customer_dimension WHERE EXISTS
(SELECT 1 FROM store.store_sales_fact
WHERE customer_key = public.customer_dimension.customer_key
AND sales_dollar_amount > 550)
AND customer_state = 'MA' ORDER BY customer_key;
 customer_key |   customer_name    | customer_state
--------------+--------------------+----------------
        14818 | William X. Nielson | MA
        18705 | James J. Goldberg  | MA
        30231 | Sarah N. McCabe    | MA
        48353 | Mark L. Brown      | MA
(4 rows)
Whether you use EXISTS or IN subqueries depends on which predicates you select in outer and
inner query blocks. For example, to get a list of all the orders placed by all stores on January 2,
2003 for vendors with records in the vendor table:
=> SELECT store_key, order_number, date_ordered
FROM store.store_orders_fact WHERE EXISTS
(SELECT 1 FROM public.vendor_dimension
WHERE public.vendor_dimension.vendor_key =
store.store_orders_fact.vendor_key)
AND date_ordered = '2003-01-02';
 store_key | order_number | date_ordered
-----------+--------------+--------------
        37 |         2559 | 2003-01-02
        16 |          552 | 2003-01-02
        35 |         1156 | 2003-01-02
        13 |         3885 | 2003-01-02
        25 |          554 | 2003-01-02
        21 |         2687 | 2003-01-02
        49 |         3251 | 2003-01-02
        19 |         2922 | 2003-01-02
        26 |         1329 | 2003-01-02
        40 |         1183 | 2003-01-02
(10 rows)
The above query looks only for the existence of the vendor and order date. To return a particular
value, rather than simple existence, the following query looks for orders placed by the vendor who
got the best deal on January 4, 2004:
=> SELECT store_key, order_number, date_ordered
FROM store.store_orders_fact ord, public.vendor_dimension vd
WHERE ord.vendor_key = vd.vendor_key AND vd.deal_size IN
(SELECT MAX(deal_size) FROM public.vendor_dimension)
AND date_ordered = '2004-01-04';
 store_key | order_number | date_ordered
-----------+--------------+--------------
       166 |        36008 | 2004-01-04
       113 |        66017 | 2004-01-04
       198 |        75716 | 2004-01-04
        27 |       150241 | 2004-01-04
       148 |       182207 | 2004-01-04
         9 |       188567 | 2004-01-04
        45 |       202416 | 2004-01-04
        24 |       250295 | 2004-01-04
       121 |       251417 | 2004-01-04
(9 rows)
See Also
Subquery Restrictions (page 314)
IN and NOT IN
While you cannot equate a single value to a set of values, you can check to see if a single value is
found within that set of values. Use the IN clause for multiple-record, single-column subqueries.
After the subquery returns results introduced by IN or NOT IN, the outer query uses them to return
the final result.
[NOT] IN subqueries take the following form:
{ expression [ NOT ] IN ( subquery )
| expression [ NOT ] IN ( expression ) }
There is no limit to the number of parameters passed to the IN clause of the SELECT statement;
for example:
=> SELECT * FROM tablename WHERE column IN (a, b, c, d, e, ...);
HP Vertica also supports queries where two or more outer expressions refer to different inner
expressions:
=> SELECT * FROM A WHERE (A.x,A.x) IN (SELECT B.x, B.y FROM B);
Examples
The following query uses the Vmart schema to illustrate the use of outer expressions referring to
different inner expressions:
=> SELECT product_description, product_price FROM product_dimension
WHERE (product_dimension.product_key, product_dimension.product_key) IN
(SELECT store.store_orders_fact.order_number,
store.store_orders_fact.quantity_ordered
FROM store.store_orders_fact);
     product_description     | product_price
-----------------------------+---------------
 Brand #72 box of candy      |           326
 Brand #71 vanilla ice cream |           270
(2 rows)
To find all products supplied by stores in MA, first create the inner query and run it to ensure that it
works as desired. The following query returns all stores located in MA:
=> SELECT store_key FROM store.store_dimension WHERE store_state = 'MA';
 store_key
-----------
        13
        31
(2 rows)
Then create the outer or main query that specifies all distinct products that were sold in stores
located in MA. This statement combines the inner and outer queries using the IN predicate:
=> SELECT DISTINCT s.product_key, p.product_description
FROM store.store_sales_fact s, public.product_dimension p
WHERE s.product_key = p.product_key
AND s.product_version = p.product_version
AND s.store_key IN
(SELECT store_key
FROM store.store_dimension
WHERE store_state = 'MA')
ORDER BY s.product_key;
 product_key |         product_description
-------------+---------------------------------------
           1 | Brand #1 white bread
           1 | Brand #4 vegetable soup
           3 | Brand #9 wheelchair
           5 | Brand #15 cheddar cheese
           5 | Brand #19 bleach
           7 | Brand #22 canned green beans
           7 | Brand #23 canned tomatoes
           8 | Brand #24 champagne
           8 | Brand #25 chicken nuggets
          11 | Brand #32 sausage
...
(281 rows)
When using NOT IN, the subquery returns a list of zero or more values in the outer query where
the comparison column does not match any of the values returned from the subquery. Using the
previous example, NOT IN returns all the products that are not supplied from MA.
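For example, the following sketch changes only the IN predicate of the previous statement to NOT IN:
=> SELECT DISTINCT s.product_key, p.product_description
   FROM store.store_sales_fact s, public.product_dimension p
   WHERE s.product_key = p.product_key
   AND s.product_version = p.product_version
   AND s.store_key NOT IN
      (SELECT store_key
       FROM store.store_dimension
       WHERE store_state = 'MA')
   ORDER BY s.product_key;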
Notes
HP Vertica supports multicolumn NOT IN subqueries in which the columns are not marked NOT
NULL. Previously, marking the columns NOT NULL was a requirement; now, if one of the columns
is found to contain a NULL value during query execution, HP Vertica returns a run-time error.
Similarly, IN subqueries nested within another expression are not supported if any of the column
values are NULL. For example, if in the following statement column x from either table contained a
NULL value, HP Vertica would return a run-time error:
=> SELECT * FROM t1 WHERE (x IN (SELECT x FROM t2)) IS FALSE;
ERROR: NULL value found in a column used by a subquery
See Also
Subquery Restrictions (page 314)
IN-predicate in the SQL Reference Manual
Subqueries in the SELECT List
Subqueries can occur in the select list of the containing query. The results from the following
statement are ordered by the first column (customer_name). You could also write ORDER BY 2 and
specify that the results be ordered by the select-list subquery.
=> SELECT c.customer_name, (SELECT AVG(annual_income) FROM customer_dimension
WHERE deal_size = c.deal_size) AVG_SAL_DEAL FROM customer_dimension c
ORDER BY 1;
 customer_name | AVG_SAL_DEAL
---------------+--------------
 Goldstar      |       603429
 Metatech      |       628086
 Metadata      |       666728
 Foodstar      |       695962
 Verihope      |       715683
 Veridata      |       868252
 Bettercare    |       879156
 Foodgen       |       958954
 Virtacom      |       991551
 Inicorp       |      1098835
...
Notes
 Scalar subqueries in the select-list return a single row/column value. These subqueries use
Boolean comparison operators: =, >, <, <>, <=, >=.
If the query is correlated, it returns NULL if the correlation results in 0 rows. If the query returns
more than one row, the query errors out at run time and HP Vertica displays an error message
that the scalar subquery must only return 1 row.
 Subquery expressions such as [NOT] IN, [NOT] EXISTS, ANY/SOME, or ALL always return a
single Boolean value that evaluates to TRUE, FALSE, or UNKNOWN; the subquery itself can
have many rows. Most of these queries can be correlated or noncorrelated.
Note: ALL subqueries cannot be correlated.
 Subqueries in the ORDER BY and GROUP BY clauses are supported; for example, the
following statement says to order by the first column, which is the select-list subquery:
SELECT (SELECT MAX(x) FROM t2 WHERE y=t1.b) FROM t1 ORDER BY 1;
See Also
Subquery Restrictions (page 314)
WITH Clauses in SELECT
While not strictly subqueries, WITH clauses are concomitant queries within a larger, primary query.
HP Vertica evaluates each WITH clause only once while executing the primary query. You can
reference the results of any evaluated WITH clause during the primary query transaction, as if the
results existed in a temporary table using the WITH query name. Each WITH clause query must
have a unique name. Attempting to use same-name aliases for WITH clause query names causes
an error.
You can also use the results of a previously evaluated WITH clause in any subsequent WITH
clause or SELECT statement. Combining WITH clause results in another WITH query lets you
build successively on the results of earlier WITH clauses (there is no defined limit on the number
of evaluated WITH clauses). WITH clauses do not support INSERT, DELETE, and UPDATE
statements, and you cannot use them recursively.
Using WITH Clauses
This example illustrates the use of two WITH clauses evaluated before the primary query,
regional_sales and top_regions. Following are the evaluation and usage steps:
1 The regional_sales clause evaluates its select statement from table orders. After
evaluation, each of the columns selected in this WITH clause can be queried or used as part of
a later WITH clause.
2 The top_regions clause is evaluated using values from results obtained in the
regional_sales clause.
3 The primary query is then evaluated using the results obtained in the top_regions clause,
which are accessed as if top_regions were a temporary table.
-- Begin WITH clauses
-- First WITH clause: regional_sales
WITH
regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region),
-- Second WITH clause: top_regions
top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM (total_sales)/10 FROM regional_sales) )
-- End defining WITH clause statement
-- Begin main primary query
SELECT region,
product,
SUM(quantity) AS product_units,
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
See Also
Subquery Restrictions (page 314)
WITH Clause in the SQL Reference Manual
Noncorrelated and Correlated Subqueries
A class of queries is evaluated by running the subquery once and then substituting the resulting
value or values into the WHERE clause of the outer query. These self-contained queries are called
noncorrelated (simple) subqueries; you can run them by themselves and inspect the results
independent of their containing statements. A correlated subquery, however, is dependent on its
containing statement, from which it references one or more columns.
See the following table for examples of the two subquery types:
Noncorrelated (simple):

SELECT name, street, city, state
FROM addresses AS ADD
WHERE state IN
  (SELECT state FROM states);

The subquery (SELECT state FROM states) is independent of the containing query. It is
evaluated first, and its results are passed to the outer query block.

Correlated:

SELECT name, street, city, state
FROM addresses AS ADD
WHERE EXISTS
  (SELECT * FROM states AS ST
   WHERE ST.state = ADD.state);

The subquery needs values from the state column in the containing query; its results are then
passed to the outer query block. The subquery is evaluated for every record of the outer block
because the column is being used in the subquery.
The difference between noncorrelated (simple) subqueries and correlated subqueries is that in
simple subqueries, the containing (outer) query only has to act on the results from the
subquery (inner query). In a correlated subquery, the outer query block provides values for the
subquery to use in its evaluation.
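As an informal sketch, the correlated EXISTS query above can be rewritten as a join. Assuming each state appears only once in the states table (an assumption about that table, not something the example guarantees), the following join returns the same rows:
=> SELECT name, street, city, state
   FROM addresses AS A JOIN states AS ST ON A.state = ST.state;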
Notes
 You can use an outer join to obtain the same effect as a correlated subquery.
 Arbitrary uncorrelated queries are permitted in the WHERE clause as single-row expressions;
for example:
=> SELECT COUNT(*) FROM SubQ1 WHERE SubQ1.a = (SELECT y from SubQ2);
 Noncorrelated queries in the HAVING clause as single-row expressions are permitted; for
example:
=> SELECT COUNT(*) FROM SubQ1 GROUP BY SubQ1.a
HAVING SubQ1.a = (SubQ1.a & (SELECT y from SubQ2));
See Also
Subquery Restrictions (page 314)
Flattening FROM Clause Subqueries and Views
FROM clause subqueries are always evaluated before their containing query. For example, in the
following statement, the HP Vertica optimizer must evaluate all records in table t1 before it can
evaluate the records in t0:
=> SELECT * FROM (SELECT a, MAX(a) AS max FROM (SELECT * FROM t1) AS t0 GROUP BY a);
In an optimization called subquery flattening, some FROM clause subqueries are flattened into the
containing query, improving the performance of subquery-containing queries.
Using the above query, HP Vertica can internally flatten it as follows:
=> SELECT * FROM (SELECT a, MAX(a) FROM t1 GROUP BY a) AS t0;
Both queries return the same results, but the flattened query runs more quickly.
The optimizer will choose the flattening plan if the subquery or view does not contain the following:
 Aggregates
 Analytics
 An outer join (left, right, or full)
 A GROUP BY, ORDER BY, or HAVING clause
 The DISTINCT keyword
 A LIMIT or OFFSET clause
 A UNION, EXCEPT, or INTERSECT clause
 An EXISTS subquery
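For example, a subquery that contains a LIMIT clause is not flattened. In the following sketch (reusing table t1 from the earlier example), the inner query must be fully evaluated before the outer predicate is applied:
=> SELECT a FROM (SELECT a FROM t1 ORDER BY a LIMIT 10) AS t0 WHERE a > 5;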
To see if a FROM clause subquery has been flattened, inspect the query plan, as described in
EXPLAIN in the SQL Reference Manual. Typically, the number of value expression nodes
(ValExpNode) decreases after flattening.
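For example, you might run EXPLAIN on both forms of the earlier query and compare the resulting plans; the statements below are a sketch of that workflow:
=> EXPLAIN SELECT * FROM (SELECT a, MAX(a) AS max FROM (SELECT * FROM t1) AS t0 GROUP BY a);
=> EXPLAIN SELECT * FROM (SELECT a, MAX(a) FROM t1 GROUP BY a) AS t0;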
Flattening views
When you specify a view in a FROM clause query, HP Vertica first replaces the view name with the
view definition query, creating further opportunities for subquery flattening. This process is called
view flattening and works the same way as subquery flattening. See Implementing Views in the
Administrator's Guide for additional details about views.
Examples
If you have a predicate that applies to a view or subquery, the flattening operation can enable
optimization by applying the predicate earlier. In this example, without flattening, HP Vertica must
first evaluate the entire subquery, and only then can the predicate WHERE x > 10 be applied. With
the flattened subquery, HP Vertica applies the predicate before evaluating the subquery, reducing
the amount of work for the query because only the records WHERE x > 10 are returned to the
containing query.
Assume that view v1 is defined as follows:
=> CREATE VIEW v1 AS SELECT * FROM a;
You enter the following query:
=> SELECT * FROM v1 JOIN b ON x=y WHERE x > 10;
HP Vertica internally transforms the above query as follows:
=> SELECT * FROM (SELECT * FROM a) AS t1 JOIN b ON x=y WHERE x > 10;
And the flattening mechanism gives you the following:
=> SELECT * FROM a JOIN b ON x=y WHERE x > 10;
The following example shows how HP Vertica transforms FROM clause subqueries within a WHERE
clause IN subquery:
 Original query:
SELECT * FROM a WHERE b IN (SELECT b FROM (SELECT * FROM t2) AS D WHERE x=1);
 Flattened query:
SELECT * FROM a WHERE b IN (SELECT b FROM t2 WHERE x=1);
See Also
Subquery Restrictions (page 314)
Subqueries in UPDATE and DELETE Statements
You can nest subqueries within UPDATE and DELETE statements.
UPDATE subqueries
If you want to update records in a table based on values that are stored in other database tables,
you can nest a subquery within an UPDATE statement. See also UPDATE in the SQL Reference
Manual.
Syntax
UPDATE [[db-name.]schema.]table
SET column = { expression | DEFAULT } [, ...]
[ FROM from-list ]
[ WHERE Clause ]
Semantics
UPDATE changes the values of the specified columns in all rows that satisfy the
condition. Only the columns to be modified need to be specified in the SET
clause. Columns that are not explicitly modified retain their previous values.
Outputs
On successful completion, an update operation returns a count, which
represents the number of rows updated. A count of 0 is not an error; it means
that no rows matched the condition.
Performance tip
To be eligible for DELETE optimization, all target table columns referenced in a DELETE or UPDATE
statement's WHERE clause must be in the projection definition.
For example, the following simple schema has two tables and three projections:
CREATE TABLE tb1 (a INT, b INT, c INT, d INT);
CREATE TABLE tb2 (g INT, h INT, i INT, j INT);
The first projection references all columns in tb1 and sorts on column a:
CREATE PROJECTION tb1_p AS SELECT a, b, c, d FROM tb1 ORDER BY a;
The buddy projection references and sorts on column a in tb1:
CREATE PROJECTION tb1_p_2 AS SELECT a FROM tb1 ORDER BY a;
This projection references all columns in tb2 and sorts on column i:
CREATE PROJECTION tb2_p AS SELECT g, h, i, j FROM tb2 ORDER BY i;
Consider the following DML statement, which references tb1.a in its WHERE clause. Since both
projections on tb1 contain column a, both are eligible for the optimized DELETE:
DELETE FROM tb1 WHERE tb1.a IN (SELECT tb2.i FROM tb2);
Restrictions
Optimized DELETEs are not supported under the following conditions:
 With pre-join projections on nodes that are down
 With replicated and pre-join projections if subqueries reference the target table. For example,
the following syntax is not supported:
DELETE FROM tb1 WHERE tb1.a IN (SELECT e FROM tb2, tb2 WHERE tb2.e = tb1.e);
 With subqueries that do not return multiple rows. For example, the following syntax is not
supported:
DELETE FROM tb1 WHERE tb1.a = (SELECT k from tb2);
Notes and restrictions
 The table specified in the UPDATE list cannot also appear in the FROM list (no self joins); for
example, the following is not allowed:
=> BEGIN;
=> UPDATE result_table
SET address='new' || r2.address
FROM result_table r2
WHERE r2.cust_id = result_table.cust_id + 10;
ERROR: Self joins in UPDATE statements are not allowed
DETAIL: Target relation result_table also appears in the FROM list
 If more than one row in a table to be updated matches the WHERE predicate, HP Vertica
returns an error specifying which row had more than one match.
UPDATE example
The following series of commands illustrate the use of subqueries in UPDATE statements; they all
use the following simple schema:
=> CREATE TABLE result_table(
cust_id INTEGER,
address VARCHAR(2000));
Enter some customer data:
=> COPY result_table FROM stdin delimiter ',' DIRECT;
20, Lincoln Street
30, Booth Hill Road
30, Beach Avenue
40, Mt. Vernon Street
50, Hillside Avenue
\.
Query the table you just created:
=> SELECT * FROM result_table;
 cust_id |      address
---------+--------------------
      20 | Lincoln Street
      30 | Beach Avenue
      30 | Booth Hill Road
      40 | Mt. Vernon Street
      50 | Hillside Avenue
(5 rows)
Create a second table called new_addresses:
=> CREATE TABLE new_addresses(
new_cust_id integer,
new_address VARCHAR(200));
Enter some customer data.
Note: The following COPY statement creates an entry for a customer ID with a value of 60,
which does not have a matching value in the result_table table:
=> COPY new_addresses FROM stdin delimiter ',' DIRECT;
20, Infinite Loop
30, Loop Infinite
60, New Addresses
\.
Query the new_addresses table:
=> SELECT * FROM new_addresses;
 new_cust_id |  new_address
-------------+----------------
          20 | Infinite Loop
          30 | Loop Infinite
          60 | New Addresses
(3 rows)
Commit the changes:
=> COMMIT;
In the following example, a noncorrelated subquery (page 303) is used to change the address
record in result_table to 'New Address' when the query finds a customer ID match in both
tables:
=> UPDATE result_table
SET address='New Address'
WHERE cust_id IN (SELECT new_cust_id FROM new_addresses);
The output returns the expected count indicating that three rows were updated:
OUTPUT
--------
      3
(1 row)
Now query the result_table table to see the changes for matching customer ID 20 and 30.
Addresses for customer ID 40 and 50 are not updated:
=> SELECT * FROM result_table;
 cust_id |      address
---------+--------------------
      20 | New Address
      30 | New Address
      30 | New Address
      40 | Mt. Vernon Street
      50 | Hillside Avenue
(5 rows)
To preserve your original data, issue the ROLLBACK command:
=> ROLLBACK;
In the following example, a correlated subquery is used to replace all address records in
result_table with the new_address record from the new_addresses table when the query
finds a match on the customer ID in both tables:
=> UPDATE result_table
SET address=new_addresses.new_address
FROM new_addresses
WHERE cust_id = new_addresses.new_cust_id;
Again, the output returns the expected count indicating that three rows were updated:
OUTPUT
--------
      3
(1 row)
Now query the result_table table to see the changes for customer ID 20 and 30. Addresses
for customer ID 40 and 50 are not updated, and customer ID 60 is omitted because there is no
match:
=> SELECT * FROM result_table;
 cust_id |      address
---------+--------------------
      20 | Infinite Loop
      30 | Loop Infinite
      30 | Loop Infinite
      40 | Mt. Vernon Street
      50 | Hillside Avenue
(5 rows)
DELETE subqueries
If you want to delete records in a table based on values that are stored in other database tables,
you can nest a subquery within a DELETE statement. See also DELETE in the SQL Reference
Manual.
Syntax
DELETE FROM [[db-name.]schema.]table
WHERE Clause
Semantics
The DELETE operation deletes rows that satisfy the WHERE clause from the
specified table. If the WHERE clause is absent, all table rows are deleted. The
result is valid, even though the statement can leave an empty table.
Outputs
On successful completion, a DELETE operation returns a count, which
represents the number of rows deleted. A count of 0 is not an error; it means that
no rows matched the condition.
DELETE example
The following series of commands illustrate the use of subqueries in DELETE statements; they all
use the following simple schema:
=> CREATE TABLE t (a INTEGER);
=> CREATE TABLE t2 (a INTEGER);
=> INSERT INTO t VALUES (1);
=> INSERT INTO t VALUES (2);
=> INSERT INTO t2 VALUES (1);
=> COMMIT;
The following command deletes the expected row from table t:
=> DELETE FROM t WHERE t.a IN (SELECT t2.a FROM t2);
OUTPUT
--------
      1
(1 row)
Notice that table t now has only one row, instead of two:
=> SELECT * FROM t;
 a
---
 2
(1 row)
To preserve the data for this example, issue the ROLLBACK command:
=> ROLLBACK;
The following command deletes the expected two rows:
=> DELETE FROM t WHERE EXISTS (SELECT * FROM t2);
OUTPUT
--------
      2
(1 row)
Now table t contains no rows:
=> SELECT * FROM t;
 a
---
(0 rows)
Roll back to the previous state and verify that you still have two rows:
=> ROLLBACK;
SELECT * FROM t;
 a
---
 1
 2
(2 rows)
The following command uses a correlated subquery to delete all rows in table t where t.a
matches a value of t2.a:
=> DELETE FROM t WHERE EXISTS (SELECT * FROM t2 WHERE t.a = t2.a);
OUTPUT
--------
      1
(1 row)
Query the table to verify the row was deleted:
=> SELECT * FROM t;
 a
---
 2
(1 row)
Roll back to the previous state and query the table again:
=> ROLLBACK;
=> SELECT * FROM t;
 a
---
 1
 2
(2 rows)
See Also
Subquery Restrictions (page 314)
Subquery Examples
This topic illustrates some of the subqueries you can write. The examples use the Vmart example
database.
Single-row subqueries
Single-row subqueries are used with single-row comparison operators (=, >, <, >=, <=, <>, and
<=>) and return exactly one row.
For example, the following query retrieves the name and hire date of the employee with the
earliest hire date in the Vmart database:
=> SELECT employee_key, employee_first_name, employee_last_name, hire_date
FROM employee_dimension
WHERE hire_date = (SELECT MIN(hire_date) FROM employee_dimension);
 employee_key | employee_first_name | employee_last_name | hire_date
--------------+---------------------+--------------------+------------
         2292 | Mary                | Bauer              | 1956-01-11
(1 row)
Multiple-row subqueries
Multiple-row subqueries return multiple records.
For example, the following IN clause subquery returns the names of the employees making the
highest salary in each of the six regions:
=> SELECT employee_first_name, employee_last_name, annual_salary,
employee_region
FROM employee_dimension WHERE annual_salary IN
(SELECT MAX(annual_salary) FROM employee_dimension GROUP BY employee_region)
ORDER BY annual_salary DESC;
 employee_first_name | employee_last_name | annual_salary | employee_region
---------------------+--------------------+---------------+-----------------
 Alexandra           | Sanchez            |        992363 | West
 Mark                | Vogel              |        983634 | South
 Tiffany             | Vu                 |        977716 | SouthWest
 Barbara             | Lewis              |        957949 | MidWest
 Sally               | Gauthier           |        927335 | East
 Wendy               | Nielson            |        777037 | NorthWest
(6 rows)
Multicolumn subqueries
Multicolumn subqueries return one or more columns. Sometimes a subquery's result set is
evaluated in the containing query in column-to-column and row-to-row comparisons.
Note: Multicolumn subqueries can use the <>, !=, and = operators but not the <, >, <=, >=
operators.
You can substitute some multicolumn subqueries with a join, with the reverse being true as well.
For example, the following two queries ask for the sales transactions of all products sold online to
customers located in Massachusetts and return the same result set. The only difference is the first
query is written as a join and the second is written as a subquery.
Join query:
=> SELECT *
FROM online_sales.online_sales_fact
INNER JOIN public.customer_dimension
USING (customer_key)
WHERE customer_state = 'MA';
Subquery:
=> SELECT *
FROM online_sales.online_sales_fact
WHERE customer_key IN
(SELECT customer_key
FROM public.customer_dimension
WHERE customer_state = 'MA');
The following query returns all employees in each region whose salary is above the average:
=> SELECT e.employee_first_name, e.employee_last_name, e.annual_salary,
e.employee_region, s.average
FROM employee_dimension e,
(SELECT employee_region, AVG(annual_salary) AS average
FROM employee_dimension GROUP BY employee_region) AS s
WHERE e.employee_region = s.employee_region AND e.annual_salary > s.average
ORDER BY annual_salary DESC;
 employee_first_name | employee_last_name | annual_salary | employee_region |     average
---------------------+--------------------+---------------+-----------------+------------------
 Doug                | Overstreet         |        995533 | East            | 61192.786013986
 Matt                | Gauthier           |        988807 | South           | 57337.8638902996
 Lauren              | Nguyen             |        968625 | West            | 56848.4274914089
 Jack                | Campbell           |        963914 | West            | 56848.4274914089
 William             | Martin             |        943477 | NorthWest       | 58928.2276119403
 Luigi               | Campbell           |        939255 | MidWest         | 59614.9170454545
 Sarah               | Brown              |        901619 | South           | 57337.8638902996
 Craig               | Goldberg           |        895836 | East            | 61192.786013986
 Sam                 | Vu                 |        889841 | MidWest         | 59614.9170454545
 Luigi               | Sanchez            |        885078 | MidWest         | 59614.9170454545
 Michael             | Weaver             |        882685 | South           | 57337.8638902996
 Doug                | Pavlov             |        881443 | SouthWest       | 57187.2510548523
 Ruth                | McNulty            |        874897 | East            | 61192.786013986
 Luigi               | Dobisz             |        868213 | West            | 56848.4274914089
 Laura               | Lang               |        865829 | East            | 61192.786013986
...
You can also use the EXCEPT, INTERSECT, and UNION [ALL] keywords in FROM, WHERE, and
HAVING clauses.
The following subquery returns information about all Connecticut-based customers who bought
items through either stores or online sales channel and whose purchases amounted to more than
500 dollars:
=> SELECT DISTINCT customer_key, customer_name FROM public.customer_dimension
WHERE customer_key IN (SELECT customer_key FROM store.store_sales_fact
WHERE sales_dollar_amount > 500
UNION ALL
SELECT customer_key FROM online_sales.online_sales_fact
WHERE sales_dollar_amount > 500)
AND customer_state = 'CT';
 customer_key |  customer_name
--------------+-------------------
          200 | Carla Y. Kramer
          733 | Mary Z. Vogel
          931 | Lauren X. Roy
         1533 | James C. Vu
         2948 | Infocare
         4909 | Matt Z. Winkler
         5311 | John Z. Goldberg
         5520 | Laura M. Martin
         5623 | Daniel R. Kramer
         6759 | Daniel Q. Nguyen
...
HAVING clause subqueries
A HAVING clause is used in conjunction with the GROUP BY clause to filter the select-list records
that a GROUP BY returns. HAVING clause subqueries must use Boolean comparison operators:
=, >, <, <>, <=, >= and take the following form:
SELECT <column, ...>
FROM <table>
GROUP BY <expression>
HAVING <expression>
(SELECT <column, ...>
FROM <table>
HAVING <expression>);
For example, the following statement uses the VMart database and returns the number of
customers who purchased lowfat products. Note that the GROUP BY clause is required because
the query uses an aggregate (COUNT).
=> SELECT s.product_key, COUNT(s.customer_key) FROM store.store_sales_fact s
GROUP BY s.product_key HAVING s.product_key IN
(SELECT product_key FROM product_dimension WHERE diet_type = 'Low Fat');
The subquery first returns the product keys for all low-fat products, and the outer query then
counts the total number of customers who purchased those products.
 product_key | count
-------------+-------
          15 |     2
          41 |     1
          66 |     1
         106 |     1
         118 |     1
         169 |     1
         181 |     2
         184 |     2
         186 |     2
         211 |     1
         229 |     1
         267 |     1
         289 |     1
         334 |     2
         336 |     1
(15 rows)
Subquery Restrictions
The following list summarizes subquery restrictions in HP Vertica.
 Subqueries are not allowed in the defining query of a CREATE PROJECTION statement.
 Subqueries can be used in the select-list, but GROUP BY or aggregate functions are not
allowed in the query if the subquery is not part of the GROUP BY clause in the containing
query. For example, the following two statements return error messages:
=> SELECT y, (SELECT MAX(a) FROM t1) FROM t2 GROUP BY y;
ERROR: subqueries in the SELECT or ORDER BY are not supported if the
subquery is not part of the GROUP BY
=> SELECT MAX(y), (SELECT MAX(a) FROM t1) FROM t2;
ERROR: subqueries in the SELECT or ORDER BY are not supported if the
query has aggregates and the subquery is not part of the GROUP BY
 Subqueries are supported within UPDATE statements with the following exceptions:
   You cannot use SET column = {expression} to specify a subquery.
   The table specified in the UPDATE list cannot also appear in the FROM list (no self joins).
 FROM clause subqueries require an alias, but tables do not. If a table has no alias, the query
must refer to columns inside it as <table>.<column>; however, if the column names are
uniquely identified among all tables used by the query, preceding the column with a table
name is not enforced.
 If the ORDER BY clause is inside a FROM clause subquery, rather than in the containing
query, the query could return unexpected sort results. This is because HP Vertica data comes
from multiple nodes, so sort order cannot be guaranteed unless an ORDER BY clause is
specified in the outer query block. This behavior is compliant with the SQL standard, but it
might differ from other databases.
 Multicolumn subqueries cannot use the <, >, <=, >= comparison operators. They can use the
<>, !=, and = operators.
 WHERE and HAVING clause subqueries must use Boolean comparison operators: =, >, <, <>,
<=, >=. Those subqueries can be noncorrelated or correlated.
 [NOT] IN and ANY subqueries nested within another expression are not supported if any of the
column values are NULL. In the following statement, for example, if column x from either table
t1 or t2 contains a NULL value, HP Vertica returns a run-time error:
=> SELECT * FROM t1 WHERE (x IN (SELECT x FROM t2)) IS FALSE;
ERROR: NULL value found in a column used by a subquery
 HP Vertica returns a run-time error on scalar subqueries that return more than one row.
 Aggregates and GROUP BY clauses are allowed in subqueries, as long as those subqueries
are not correlated.
 Correlated expressions under ALL and [NOT] IN are not supported.
 Correlated expressions under OR are not supported.
 Multiple correlations are allowed only for subqueries that are joined with a comparison
predicate (<, >, <=, >=, =, <>, <=>). IN/NOT IN and EXISTS/NOT EXISTS predicates within
correlated subqueries are not allowed:
=> SELECT t2.x, t2.y, t2.z FROM t2 WHERE t2.z NOT IN
   (SELECT t1.z FROM t1 WHERE t1.x = t2.x);
ERROR: Correlated subquery with NOT IN is not supported
 Up to one level of correlated subqueries is allowed in the WHERE clause if the subquery
references columns in the immediate outer query block. For example, the following query is
not supported because the t2.x = t3.x predicate can refer only to table t1 in its immediate
outer query; t3.x is two levels out, making the expression multiply correlated:
=> SELECT t3.x, t3.y, t3.z FROM t3 WHERE t3.z IN (
   SELECT t1.z FROM t1 WHERE EXISTS (
   SELECT 'x' FROM t2 WHERE t2.x = t3.x) AND t1.x = t3.x);
ERROR: More than one level correlated subqueries are not supported
The query is supported if it is rewritten as follows:
=> SELECT t3.x, t3.y, t3.z FROM t3 WHERE t3.z IN
   (SELECT t1.z FROM t1 WHERE EXISTS
   (SELECT 'x' FROM t2 WHERE t2.x = t1.x)
   AND t1.x = t3.x);
Joins
Queries can combine records from multiple tables, or multiple instances of the same table. A
query that combines records from one or more tables is called a join. Joins are allowed in a
SELECT statement, as well as inside a subquery.
HP Vertica supports the following join types:
 Inner (including natural, cross) joins
 Left, right, and full outer joins
 Optimizations for equality and range join predicates
 Hash, merge, and sort-merge join algorithms
There are three basic algorithms that perform a join operation: hash, merge, and nested loop:
 A hash join is used to join large data sets. The smaller joined table is used to build a hash table
in memory on the join column. The HP Vertica optimizer then scans the larger table and
probes the hash table to look for matches. The optimizer chooses a hash join when projections
are not sorted on the join columns.
 If both inputs are pre-sorted, the optimizer can skip the sort operation and choose a merge join.
The term sort-merge join refers to the case when one or both inputs must be sorted before the
merge join. In this case, HP Vertica sorts the inner input, but only if the outer input is already
sorted on the join keys.
 HP Vertica does not support nested loop joins.
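For example, to give the optimizer the option of a merge join on the fact/dim tables used in the equi-join examples below, you could create projections that sort on the join keys. This is a sketch only; the projection names and column lists are illustrative assumptions:
=> CREATE PROJECTION fact_by_x AS SELECT x FROM fact ORDER BY x;
=> CREATE PROJECTION dim_by_x AS SELECT x FROM dim ORDER BY x;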
The ANSI Join Syntax
Before the ANSI SQL-92 standard introduced the new join syntax, relations (tables, views, etc.)
were named in the FROM clause, separated by commas. Join conditions were specified in the
WHERE clause:
=> SELECT * FROM T1, T2 WHERE T1.id = T2.id;
The ANSI SQL-92 standard provided more specific join syntax, with join conditions named in the
ON clause:
=> SELECT * FROM T1
[ INNER | LEFT OUTER | RIGHT OUTER | FULL OUTER | NATURAL | CROSS ] JOIN T2
ON T1.id = T2.id
See SQL-99 ANSI syntax at BNF Grammar for SQL-99
(http://savage.net.au/SQL/sql-99.bnf.html) for additional details.
Although some users continue to use the older join syntax, HP encourages you to use the SQL-92
join syntax whenever possible because of its many advantages:
 SQL-92 outer join syntax is portable across databases; the older syntax was not consistent
between databases. (HP Vertica does not support proprietary outer join syntax such as '+'
that can be used in some databases.)
 SQL-92 syntax provides greater control over whether predicates are to be evaluated during or
after outer joins. This was also not consistent between databases when using the older
syntax. See "Join Conditions vs. Filter Conditions" below.
 SQL-92 syntax eliminates ambiguity in the order of evaluating the joins, in cases where more
than two tables are joined with outer joins.
 Union joins can be expressed using the SQL-92 syntax, but not in the older syntax.
Note: HP Vertica does not currently support union joins.
Join Conditions vs. Filter Conditions
If you do not use the SQL-92 syntax, join conditions (predicates that are evaluated during the join)
are difficult to distinguish from filter conditions (predicates that are evaluated after the join), and in
some cases cannot be expressed at all. With SQL-92, join conditions and filter conditions are
separated into two different clauses, the ON clause and the WHERE clause, respectively, making
queries easier to understand.
 The ON clause contains relational operators (for example, <, <=, >, >=, <>, =, <=>) or other
predicates that specify which records from the left and right input relations to combine, such as
by matching foreign keys to primary keys. ON can be used with inner, left outer, right outer, and
full outer joins. Cross joins and union joins do not use an ON clause.
Inner joins return all pairings of rows from the left and right relations for which the ON clause
evaluates to TRUE. In a left join, all rows from the left relation in the join are present in the
result; any row of the left relation that does not match any rows in the right relation is still
present in the result but with nulls in any columns taken from the right relation. Similarly, a right
join preserves all rows from the right relation, and a full join retains all rows from both relations.
 The WHERE clause is evaluated after the join is performed. It filters records returned by the
FROM clause, eliminating any records that do not satisfy the WHERE clause condition.
HP Vertica automatically converts outer joins to inner joins in cases where it is correct to do so,
allowing the optimizer to choose among a wider set of query plans and leading to better
performance.
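The distinction matters most with outer joins. In the following sketch, which reuses the T1/T2 tables from the syntax examples above (the val column is borrowed from a later example), the first query keeps every T1 row and evaluates the predicate as part of the join, while the second evaluates the predicate after the join and therefore filters out T1 rows whose T2 columns are null:
=> SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.id = T2.id AND T2.val > 5;
=> SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.id = T2.id WHERE T2.val > 5;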
Inner Joins
An inner join combines records from two tables based on a join predicate and requires that each
record in the first table has a matching record in the second table. Inner joins, thus, return only
those records from both joined tables that satisfy the join condition. Records that contain no
matches are not preserved.
Inner joins take the following form:
SELECT <column list>
FROM <left joined table>
[INNER] JOIN <right joined table>
ON <join condition>
Notes
 Inner joins are commutative and associative, which means you can specify the tables in any
order you want, and the results do not change.
 If you omit the INNER keyword, the join is still an inner join, the most commonly used type of
join.
 Join conditions that follow the ON keyword generally can contain many predicates connected
with Boolean AND, OR, or NOT predicates.
 For best performance, do not join on any LONG VARBINARY and LONG VARCHAR columns.
 You can also use inner join syntax to specify joins for pre-join projections. See Pre-join
Projections and Join Predicates (page 326).
 Some SQL-related books and online tutorials refer to a left-joined table as the outer table and a
right-joined table as the inner table. The HP documentation often uses the left/right table
concept.
Example
In the following example, an inner join produces only the set of records with matching id values in
both T1 and T2 (the joined columns must have the same data type); all other data is excluded.
=> SELECT * FROM T1 INNER JOIN T2 ON (T1.id = T2.id);
If a company, for example, wants to know the dates vendors in Utah sold inventory:
=> SELECT v.vendor_name, d.date FROM vendor_dimension v
INNER JOIN date_dimension d ON v.vendor_key = d.date_key
WHERE vendor_state = 'UT';
   vendor_name    |    date
------------------+------------
 Frozen Warehouse | 2003-01-07
 Delicious Farm   | 2003-01-26
(2 rows)
To clarify, if the vendor dimension table contained a third row that has no corresponding date
when a vendor sold inventory, then that row would not be included in the result set. Similarly, if on
some date there was no inventory sold by any vendor, those rows would be left out of the result
set. If you want to include all rows from one table or the other regardless of whether a match
exists, you can specify an outer join (page 322).
See Also
Joins Notes and Restrictions (page 327)
Equi-joins and Non Equi-Joins
HP Vertica supports any arbitrary join expression with both matching and non-matching column
values; for example:
SELECT * FROM fact JOIN dim ON fact.x = dim.x;
SELECT * FROM fact JOIN dim ON fact.x > dim.y;
SELECT * FROM fact JOIN dim ON fact.x <= dim.y;
SELECT * FROM fact JOIN dim ON fact.x <> dim.y;
SELECT * FROM fact JOIN dim ON fact.x <=> dim.y;
Note: The = and <=> operators generally run the fastest.
Equi-joins are based on equality (matching column values). This equality is indicated with an equal
sign (=), which functions as the comparison operator in the ON clause using SQL-92 syntax or the
WHERE clause using older join syntax.
The first example below uses SQL-92 syntax and the ON clause to join the online sales table with
the call center table using the call center key; the query then returns the sale date key that equals
the value 156:
=> SELECT sale_date_key, cc_open_date FROM online_sales.online_sales_fact
INNER JOIN online_sales.call_center_dimension
ON (online_sales.online_sales_fact.call_center_key =
online_sales.call_center_dimension.call_center_key
AND sale_date_key = 156);
 sale_date_key | cc_open_date
---------------+--------------
           156 | 2005-08-12
(1 row)
The second example uses older join syntax and the WHERE clause to join the same tables to get
the same results:
=> SELECT sale_date_key, cc_open_date
FROM online_sales.online_sales_fact, online_sales.call_center_dimension
WHERE online_sales.online_sales_fact.call_center_key =
online_sales.call_center_dimension.call_center_key
AND sale_date_key = 156;
 sale_date_key | cc_open_date
---------------+--------------
           156 | 2005-08-12
(1 row)
HP Vertica also permits tables with compound (multiple-column) primary and foreign keys. For
example, to create a pair of tables with multi-column keys:
=> CREATE TABLE dimension(pk1 INTEGER NOT NULL, pk2 INTEGER NOT NULL);
=> ALTER TABLE dimension ADD PRIMARY KEY (pk1, pk2);
=> CREATE TABLE fact (fk1 INTEGER NOT NULL, fk2 INTEGER NOT NULL);
=> ALTER TABLE fact ADD FOREIGN KEY (fk1, fk2) REFERENCES dimension (pk1, pk2);
To join tables using compound keys, you must connect two join predicates with a Boolean AND
operator. For example:
=> SELECT * FROM fact f JOIN dimension d ON f.fk1 = d.pk1 AND f.fk2 = d.pk2;
You can write queries with expressions that contain the <=> operator for NULL=NULL joins.
=> SELECT * FROM fact JOIN dim ON fact.x <=> dim.y;
The <=> operator performs an equality comparison like the = operator, but it returns true, instead
of NULL, if both operands are NULL, and false, instead of NULL, if one operand is NULL.
=> SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 t        | t        | f
(1 row)
Compare the <=> operator to the = operator:
=> SELECT 1 = 1, NULL = NULL, 1 = NULL;
 ?column? | ?column? | ?column?
----------+----------+----------
 t        |          |
(1 row)
Note: Writing NULL=NULL joins on primary key/foreign key combinations is not an optimal
choice because PK/FK columns are usually defined as NOT NULL.
When composing joins, it helps to know in advance which columns contain null values. An
employee's hire date, for example, would not be a good choice because it is unlikely that hire
dates would be omitted. An hourly rate column, however, might work if some employees are paid
hourly and some are salaried. If you are unsure whether a column in a given table contains null
values, issue the command:
=> SELECT COUNT(*) FROM tablename WHERE columnname IS NULL;
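For example, to check a hypothetical hourly-rate column for nulls (employee_dimension is part of
the sample schema, but hourly_rate here is an illustrative column name, not a documented one):
=> SELECT COUNT(*) FROM employee_dimension WHERE hourly_rate IS NULL;
A nonzero count suggests the column is a candidate for a NULL=NULL join.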
Natural Joins
A natural join is just a join with an implicit join predicate. Natural joins can be inner, left outer, right
outer, or full outer joins and take the following form:
SELECT <column list> FROM <left-joined table>
NATURAL [ INNER | LEFT OUTER | RIGHT OUTER | FULL OUTER ] JOIN <right-joined table>
Natural joins are, by default, natural inner joins; however, there can also be natural left, right, or
full outer joins. The primary difference between an inner join and a natural join is that inner joins
have an explicit join condition, whereas the natural join's conditions are formed by matching all
pairs of columns in the tables that have the same name and compatible data types. This makes
natural joins equi-joins, because the join conditions compare the common columns for equality.
(If the data types are incompatible, HP Vertica returns an error.)
Note: The data type coercion chart lists the data types that can be cast to other data types. If
one data type can be cast to the other, those two data types are compatible.
The following query is a simple natural join between tables T1 and T2 when the T2 column val is
greater than 5:
=> SELECT * FROM T1 NATURAL JOIN T2 WHERE T2.val > 5;
The following example is the equivalent of a natural join between the store_sales_fact table
and the product_dimension table, which share the columns product_key and
product_version:
=> SELECT product_description, store.store_sales_fact.*
FROM store.store_sales_fact, public.product_dimension
WHERE store.store_sales_fact.product_key =
public.product_dimension.product_key
AND store.store_sales_fact.product_version =
public.product_dimension.product_version;
The following three queries return the same result, expressed as a basic query, an inner join, and a
natural join. The table expressions are equivalent only if store_key is the only common attribute of
the store_sales_fact table and the store_dimension table; because both tables have a
column named store_key, the natural join also has a store_sales_fact.store_key =
store_dimension.store_key join condition. Because the results are the same in all three
instances, they are shown for the first (basic) query only:
=> SELECT store_name FROM store.store_sales_fact, store.store_dimension
WHERE store.store_sales_fact.store_key = store.store_dimension.store_key
AND store.store_dimension.store_state = 'MA' ORDER BY store_name;
 store_name
------------
 Store11
 Store128
 Store178
 Store66
 Store8
 Store90
(6 rows)
The query written as an inner join:
=> SELECT store_name FROM store.store_sales_fact
INNER JOIN store.store_dimension
ON (store.store_sales_fact.store_key = store.store_dimension.store_key)
WHERE store.store_dimension.store_state = 'MA' ORDER BY store_name;
In the case of the natural join, the join predicate is formed implicitly by comparing all of the columns
in both tables that have the same name. The result set contains only one column for each pair of
identically named columns.
=> SELECT store_name FROM store.store_sales_fact
NATURAL JOIN store.store_dimension
WHERE store.store_dimension.store_state = 'MA' ORDER BY store_name;
Cross Joins
Cross joins are the simplest joins to write, but they are not usually the fastest to run because they
consist of all possible combinations of two tables’ records. Cross joins contain no join condition
and return what is known as a Cartesian product, where the number of rows in the result set is
equal to the number of rows in the first table multiplied by the number of rows in the second table.
The following query returns all possible combinations from the promotion table and the store
sales table:
=> SELECT * FROM promotion_dimension CROSS JOIN store.store_sales_fact;
This example returns over 600 million records; cross join results can be extremely large and
difficult to manage. Cross joins can be useful, however, such as when you want to return a
single-row result set.
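One such case is pairing every row with a deliberately single-row subquery, for example, to
compare each sale against a grand total. The following is a minimal sketch, assuming the
sales_dollar_amount column of the store.store_sales_fact table; because the subquery
returns exactly one row, the result has the same number of rows as the fact table:
=> SELECT f.sales_dollar_amount,
          f.sales_dollar_amount / t.total AS fraction_of_total
   FROM store.store_sales_fact f
   CROSS JOIN (SELECT SUM(sales_dollar_amount) AS total
               FROM store.store_sales_fact) t;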
Tip: Filter out unwanted records in a cross join by using WHERE clause join predicates:
=> SELECT * FROM promotion_dimension p
CROSS JOIN store.store_sales_fact f
WHERE p.promotion_key LIKE f.promotion_key;
For details on what qualifies as a join predicate, see Pre-join Projections and
Join Predicates (page 326).
HP recommends that you do not write implicit cross joins (comma-separated tables in the FROM
clause). These queries could imply accidental omission of a join predicate.
The following statement is an example of an implicit cross join:
=> SELECT * FROM promotion_dimension , store.store_sales_fact;
If your intent is to run a cross join, write an explicit cross join query using the CROSS JOIN keywords,
such as in the following statement:
=> SELECT * FROM promotion_dimension CROSS JOIN store.store_sales_fact;
Examples
The following example creates two small tables and their superprojections and then runs a cross
join on the tables:
=> CREATE TABLE employee(employee_id INT, employee_fname VARCHAR(50));
=> CREATE TABLE department(dept_id INT, dept_name VARCHAR(50));
=> INSERT INTO employee VALUES (1, 'Andrew');
=> INSERT INTO employee VALUES (2, 'Priya');
=> INSERT INTO employee VALUES (3, 'Michelle');
=> INSERT INTO department VALUES (1, 'Engineering');
=> INSERT INTO department VALUES (2, 'QA');
=> SELECT * FROM employee CROSS JOIN department;
In the result set, the cross join retrieves records from the first table and then creates a new row for
every row in the second table. It then does the same for the next record in the first table, and so on.
 employee_id | employee_fname | dept_id | dept_name
-------------+----------------+---------+-------------
           1 | Andrew         |       1 | Engineering
           2 | Priya          |       1 | Engineering
           3 | Michelle       |       1 | Engineering
           1 | Andrew         |       2 | QA
           2 | Priya          |       2 | QA
           3 | Michelle       |       2 | QA
(6 rows)
Outer Joins
Outer joins extend the functionality of inner joins by letting you preserve rows of one or both tables
that do not have matching rows in the non-preserved table. Outer joins take the following form:
SELECT <column list>
FROM <left-joined table>
[ LEFT | RIGHT | FULL ] OUTER JOIN <right-joined table>
ON <join condition>
Note: Omitting the keyword OUTER from your statements does not affect results of left and right
joins. LEFT OUTER JOIN and LEFT JOIN perform the same operation and return the same
results.
Left Outer Joins
A left outer join returns a complete set of records from the left-joined (preserved) table T1, with
matched records, where available, in the right-joined (non-preserved) table T2. Where HP Vertica
finds no match, it extends the right side column (T2) with null values.
=> SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.x = T2.x;
To exclude the non-matched values from T2, write the same left outer join, but filter out the records
you don't want from the right side by using a WHERE clause:
=> SELECT * FROM T1 LEFT OUTER JOIN T2
ON T1.x = T2.x WHERE T2.x IS NOT NULL;
The following example uses a left outer join to enrich telephone call detail records with an
incomplete numbers dimension. It then filters out results that are known not to be from
Massachusetts:
=> SELECT COUNT(*) FROM calls LEFT OUTER JOIN numbers
ON calls.to_phone = numbers.phone WHERE NVL(numbers.state, '') <> 'MA';
Right Outer Joins
A right outer join returns a complete set of records from the right-joined (preserved) table, as well
as matched values from the left-joined (non-preserved) table. If HP Vertica finds no matching
records in the left-joined table (T1), NULL values appear in the T1 columns for the unmatched
rows. A right join is, therefore, similar to a left join, except that the treatment of the tables is
reversed.
=> SELECT * FROM T1 RIGHT OUTER JOIN T2 ON T1.x = T2.x;
The above query is equivalent to the following query; T1 RIGHT OUTER JOIN T2 is the same as
T2 LEFT OUTER JOIN T1.
=> SELECT * FROM T2 LEFT OUTER JOIN T1 ON T2.x = T1.x;
The following example identifies customers who have not placed an order:
=> SELECT customers.customer_id FROM orders RIGHT OUTER JOIN customers
ON orders.customer_id = customers.customer_id
GROUP BY customers.customer_id HAVING COUNT(orders.customer_id) = 0;
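An equivalent way to express this anti-join keeps the customers table on the preserved side of a
left outer join and filters for the unmatched rows; this sketch should return the same set of
customers:
=> SELECT customers.customer_id FROM customers LEFT OUTER JOIN orders
   ON orders.customer_id = customers.customer_id
   WHERE orders.customer_id IS NULL;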
Full Outer Joins
A full outer join returns results for both left and right outer joins. The joined table contains all
records from both tables, including nulls (missing matches) from either side of the join. This is
useful if you want to see, for example, each employee who is assigned to a particular department
and each department that has an employee, but you also want to see all the employees who are
not assigned to a particular department, as well as any department that has no employees:
=> SELECT employee_last_name, hire_date FROM employee_dimension emp
FULL OUTER JOIN department dept ON emp.employee_key = dept.department_key;
Notes
HP Vertica also supports joins where the outer (preserved) table or subquery is replicated on more
than one node and the inner (non-preserved) table or subquery is segmented across more than
one node. For example, in the following query, the fact table, which is almost always segmented,
appears on the non-preserved side of the join, and it is allowed:
=> SELECT sales_dollar_amount, transaction_type, customer_name
FROM store.store_sales_fact f RIGHT JOIN customer_dimension d
ON f.customer_key = d.customer_key;
 sales_dollar_amount | transaction_type | customer_name
---------------------+------------------+---------------
                 252 | purchase         | Inistar
                 363 | purchase         | Inistar
                 510 | purchase         | Inistar
                -276 | return           | Foodcorp
                 252 | purchase         | Foodcorp
                 195 | purchase         | Foodcorp
                 290 | purchase         | Foodcorp
                 222 | purchase         | Foodcorp
                     |                  | Foodgen
                     |                  | Goldcare
(10 rows)
Range Joins
HP Vertica provides performance optimizations for <, <=, >, >=, and BETWEEN predicates in join
ON clauses. These optimizations are particularly useful when a column from one table is restricted
to be in a range specified by two columns of another table.
Key Ranges
Multiple, consecutive key values can map to the same dimension values. Consider, for example, a
table of IPv4 addresses and their owners. Because large subnets (ranges) of IP addresses could
belong to the same owner, this dimension can be represented as:
=> CREATE TABLE ip_owners(
ip_start INTEGER,
ip_end INTEGER,
owner_id INTEGER);
=> CREATE TABLE clicks(
ip_owners INTEGER,
dest_ip INTEGER);
A query that associates a click stream with its destination can use a join similar to the following,
which takes advantage of the range optimization:
=> SELECT owner_id, COUNT(*) FROM clicks JOIN ip_owners
ON clicks.dest_ip BETWEEN ip_start AND ip_end
GROUP BY owner_id;
Slowly-changing Dimensions
Sometimes there are multiple dimension ranges, each relevant over a different time period. For
example, stocks might undergo splits (and reverse splits), and the price or volume of two trades
might not be directly comparable without taking this into account. A "split factor" can be defined,
which accounts for these events through time:
=> CREATE TABLE splits(
     symbol VARCHAR(10),
     start DATE,
     "end" DATE,
     split_factor FLOAT);
A join with an optimized range predicate can then be used to match each trade with the effective
split factor:
=> SELECT trades.symbol, SUM(trades.volume * splits.split_factor)
FROM trades JOIN splits
ON trades.symbol = splits.symbol AND trades.tdate BETWEEN splits.start AND
splits."end"
GROUP BY trades.symbol;
Notes
 Operators <, <=, >, >=, or BETWEEN must appear as top-level conjunctive predicates for range
  join optimization to be effective, as shown in the following examples:
  The following example query is optimized because BETWEEN is the only predicate:
  => SELECT COUNT(*) FROM fact JOIN dim
     ON fact.point BETWEEN dim.start AND dim.end;
  This next example uses comparison operators as top-level predicates (within AND):
  => SELECT COUNT(*) FROM fact JOIN dim
     ON fact.point > dim.start AND fact.point < dim.end;
  The following is optimized because BETWEEN is a top-level predicate (within AND):
  => SELECT COUNT(*) FROM fact JOIN dim
     ON (fact.point BETWEEN dim.start AND dim.end) AND fact.c <> dim.c;
  The following query is not optimized because OR is the top-level predicate (disjunctive):
  => SELECT COUNT(*) FROM fact JOIN dim
     ON (fact.point BETWEEN dim.start AND dim.end) OR dim.end IS NULL;
 Expressions are optimized in range join queries in many cases.
 If range columns can have NULL values indicating that they are open-ended, it is possible to
  use range join optimizations by replacing nulls with very large or very small values:
  => SELECT COUNT(*) FROM fact JOIN dim
     ON fact.point BETWEEN NVL(dim.start, -1) AND NVL(dim.end, 1000000000000);
 If there is more than one set of ranging predicates in the same ON clause, the order in which
  the predicates are specified might impact the effectiveness of the optimization:
  => SELECT COUNT(*) FROM fact JOIN dim
     ON fact.point1 BETWEEN dim.start1 AND dim.end1
     AND fact.point2 BETWEEN dim.start2 AND dim.end2;
  The optimizer chooses the first range to optimize, so write your queries so that the range you
  most want optimized appears first in the statement.
 The use of the range join optimization is not directly affected by any characteristics of the
  physical schema; no schema tuning is required to benefit from the optimization.
 The range join optimization can be applied to joins without any other predicates, and to HASH
  or MERGE joins.
 To determine if an optimization is in use, search for RANGE in the EXPLAIN plan. For example:
  => EXPLAIN SELECT owner_id, COUNT(*) FROM clicks JOIN ip_owners
     ON clicks.dest_ip BETWEEN ip_start AND ip_end GROUP BY owner_id;
Pre-join Projections and Join Predicates
HP Vertica can use pre-join projections when queries contain equi-joins between tables that
contain all foreign key-primary key (FK-PK) columns in the equality predicates.
If you use pre-join projections in queries, the join in the input query becomes an inner join due to
the FK-PK constraints, so the second predicate in the example that follows (AND f.id2 = d.id2)
is redundant. HP Vertica runs queries using pre-join projections only if the query contains a
superset of the join predicates in the pre-join projection. In the following example, as long as the
pre-join projection contains f.id = d.id, the pre-join can be used, even with the presence of
f.id2 = d.id2.
=> SELECT * FROM fact f JOIN dim d ON
f.id = d.id AND f.id2 = d.id2;
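As a hypothetical contrast (not one of the original examples), if the pre-join projection were
instead defined with both f.id = d.id and f.id2 = d.id2 as its join predicates, the following
query would not contain a superset of those predicates, so HP Vertica could not use the pre-join
projection to answer it:
=> SELECT * FROM fact f JOIN dim d ON f.id = d.id;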
Note: HP Vertica uses a maximum of one pre-join projection per query. More than one
pre-join projection might appear in a query plan, but at most, one will have been used to replace
the join that would be computed with the precomputed pre-join. Any other pre-join projections
are used as regular projections to supply records from a particular table.
Examples
The following is an example of a pre-join projection schema with a single-column constraint called
customer_key. The first sequence of statements creates a customer table in the public
schema and a store_sales table in the store schema. The dimension table has one primary
key, and the fact table has a foreign key that references the dimension table's primary key.
=> CREATE TABLE public.customer_dimension (
customer_key integer,
annual_income integer,
largest_bill_amount integer);
=> CREATE TABLE store.store_sales_fact (
customer_key integer,
sales_quantity integer,
sales_dollar_amount integer);
=> ALTER TABLE public.customer_dimension
ADD CONSTRAINT pk_customer_dimension PRIMARY KEY (customer_key);
=> ALTER TABLE store.store_sales_fact
ADD CONSTRAINT fk_store_sales_fact FOREIGN KEY (customer_key)
REFERENCES public.customer_dimension (customer_key);
=> CREATE PROJECTION p1 (
customer_key,
annual_income,
largest_bill_amount)
AS SELECT * FROM public.customer_dimension UNSEGMENTED ALL NODES;
=> CREATE PROJECTION p2 (
customer_key,
sales_quantity,
sales_dollar_amount)
AS SELECT * FROM store.store_sales_fact UNSEGMENTED ALL NODES;
The following command creates the pre-join projection:
=> CREATE PROJECTION pp (
cust_customer_key,
cust_annual_income,
cust_largest_bill_amount,
fact_customer_key,
fact_sales_quantity,
fact_sales_dollar_amount)
AS SELECT * FROM public.customer_dimension cust, store.store_sales_fact fact
WHERE cust.customer_key = fact.customer_key ORDER BY cust.customer_key;
The pre-join projection contains columns from both tables and has a join predicate between
customer_dimension and store_sales_fact along the FK-PK (foreign key-primary key)
constraints defined on the tables.
The following query uses a pre-join projection because the join predicates match the pre-join
projection's predicates exactly:
=> SELECT COUNT(*) FROM public.customer_dimension INNER JOIN
store.store_sales_fact
ON public.customer_dimension.customer_key =
store.store_sales_fact.customer_key;
 count
-------
 10000
(1 row)
Join Notes and Restrictions
The following list summarizes the notes and restrictions for joins in HP Vertica:
 Inner joins are commutative and associative, which means you can specify the tables in
  any order you want, and the results do not change.
 If you omit the INNER keyword, the join is still an inner join, the most commonly used type of
  join.
 Join conditions that follow the ON keyword generally can contain many predicates connected
  with Boolean AND, OR, or NOT predicates.
 For best performance, do not join on any LONG VARBINARY or LONG VARCHAR columns.
 You can also use inner join syntax to specify joins for pre-join projections. See Pre-join
  Projections and Join Predicates (page 326).
 HP Vertica supports any arbitrary join expression with both matching and non-matching
  column values; for example:
  => SELECT * FROM fact JOIN dim ON fact.x = dim.x;
  => SELECT * FROM fact JOIN dim ON fact.x > dim.y;
  => SELECT * FROM fact JOIN dim ON fact.x <= dim.y;
  => SELECT * FROM fact JOIN dim ON fact.x <> dim.y;
  => SELECT * FROM fact JOIN dim ON fact.x <=> dim.y;
 HP Vertica permits joins between tables with compound (multiple-column) primary and foreign
  keys, as long as you connect the two join predicates with a Boolean AND operator.
 You can write queries with expressions that contain the <=> operator for NULL=NULL joins.
  => SELECT * FROM fact JOIN dim ON fact.x <=> dim.y;
  The <=> operator performs an equality comparison like the = operator, but it returns true,
  instead of NULL, if both operands are NULL, and false, instead of NULL, if one operand is
  NULL.
 HP recommends that you do not write implicit cross joins (such as tables named in the FROM
  clause separated by commas). Such queries could imply accidental omission of a join
  predicate. If your intent is to run a cross join, write explicit CROSS JOIN syntax.
 HP Vertica supports joins where the outer (preserved) table or subquery is replicated on more
  than one node and the inner (non-preserved) table or subquery is segmented across more
  than one node.
 HP Vertica uses a maximum of one pre-join projection per query. More than one pre-join
  projection might appear in a query plan, but at most one will have been used to replace the
  join that would be computed with the precomputed pre-join. Any other pre-join projections are
  used as regular projections to supply records from a particular table.
Using SQL Analytics
HP Vertica analytics are SQL functions based on the ANSI SQL-99 standard. These functions
handle complex analysis and reporting tasks such as:
 Rank the longest-standing customers in a particular state
 Calculate the moving average of retail volume over a specified time
 Find the highest score among all students in the same grade
 Compare the current sales bonus each salesperson received against his or her previous
  bonus
Analytic functions return aggregate results but they do not group the result set. They return the
group value multiple times, once per record.
You can sort these group values, or partitions, using a window ORDER BY clause, but the order
affects only the function result set, not the entire query result set. This ordering concept is
described more fully later.
Notes
HP Vertica supports a full list of analytic functions, including:
 FIRST_VALUE(arguments): Allows the selection of the first value of a table or partition without
  having to use a self-join
 MEDIAN(arguments): Returns the middle value from a set of values
 NTILE(value): Equally divides the data set into a {value} number of subsets (buckets)
 RANK(): Assigns a rank to each row returned from the query with respect to the other ordered
  rows
 STDDEV(arguments): Computes the statistical sample standard deviation of the current row
  with respect to a group of rows
 AVG(arguments): Computes an average of an expression in a group of rows
For additional details, see Analytic Functions in the SQL Reference Manual.
How Analytic Functions Work
Analytic functions take the following form:
analytic_function ( arguments ) OVER( analytic_clause )
Evaluation order
Analytic functions conform to the following phases of execution:
1 Take the input rows.
Analytic functions are computed after WHERE, GROUP BY, HAVING clause operations, and
joins are performed on the query.
2 Group input rows according to the PARTITION BY clause.
The analytic PARTITION BY clause (called the window_partition_clause) is different from table
partition expressions. See Partitioning Tables in the Administrator's Guide for details.
3 Order rows within groups (partitions) according to ORDER BY clause.
The analytic ORDER BY clause (called the window_order_clause (page 336)) is different
from the SQL ORDER BY clause. If the query has a final ORDER BY clause (outside the
OVER() clause), the final results are ordered according to the SQL ORDER BY clause, not the
window_order_clause. See NULL Placement by Analytic Functions (page 406) and
Designing Tables to Minimize Run-Time Sorting of NULL Values in Analytic Functions
(page 407) in the Programmer's Guide for additional information about sort computation.
4 Compute some function for each row.
Analytic functions:
 Require the OVER() clause. However, depending on the function, the window_frame_clause
  and window_order_clause might not apply. For example, when used with analytic aggregate
  functions like SUM(x), you can use the OVER() clause without supplying any of the windowing
  clauses; in this case, the aggregate returns the same aggregated value for each row of the
  result set.
 Are allowed only in the SELECT and ORDER BY clauses.
 Can be used in a subquery or in the parent query but cannot be nested; for example, the
  following query is not allowed:
  => SELECT MEDIAN(RANK() OVER(ORDER BY sal) OVER());
 WHERE, GROUP BY, and HAVING operators are technically not part of the analytic function;
  however, they determine on which rows the analytic functions operate.
Note: Several examples throughout this section refer to the allsales table schema defined
below.
Example: Calculation of Median Value
A median is a numerical value that separates the higher half of a sample from the lower half. For
example, you can retrieve the median of a finite list of numbers by arranging all observations from
lowest value to highest value and then picking the middle one.
If there is an even number of observations, then there is no single middle value; the median is then
defined to be the mean (average) of the two middle values.
allsales table schema
CREATE TABLE allsales(state VARCHAR(20), name VARCHAR(20), sales INT);
INSERT INTO allsales VALUES('MA', 'A', 60);
INSERT INTO allsales VALUES('NY', 'B', 20);
INSERT INTO allsales VALUES('NY', 'C', 15);
INSERT INTO allsales VALUES('MA', 'D', 20);
INSERT INTO allsales VALUES('MA', 'E', 50);
INSERT INTO allsales VALUES('NY', 'F', 40);
INSERT INTO allsales VALUES('MA', 'G', 10);
COMMIT;
The following analytic query returns the median value from the allsales table. Note that the median
value is reported for every row in the result set:
=> SELECT name, sales, MEDIAN(sales) OVER () AS
MEDIAN FROM allsales;
 name | sales | median
------+-------+--------
 G    |    10 |     20
 C    |    15 |     20
 D    |    20 |     20
 B    |    20 |     20
 F    |    40 |     20
 E    |    50 |     20
 A    |    60 |     20
(7 rows)
Additional MEDIAN() examples are in Window Partitioning (page 334).
See Also
Analytic Functions in the SQL Reference Manual
Analytic Functions versus Aggregate Functions
Like aggregate functions, analytic functions return aggregate results, but analytics do not group
the result set. Instead, they return the group value multiple times with each record, allowing further
analysis. Analytic queries also generally run faster and use fewer resources than aggregate
queries.
Analytic functions                             Aggregate functions
Return the same number of rows as the input    Return a single summary value
The groups of rows on which an analytic        The groups of rows on which an aggregate
function operates are defined by window        function operates are defined by the SQL
partitioning and window frame clauses          GROUP BY clause
Example
This example illustrates the difference between aggregate functions and their analytic
counterparts, using the employees table defined in the following sample schema:
CREATE TABLE employees(emp_no INT, dept_no INT);
INSERT INTO employees VALUES(1, 10);
INSERT INTO employees VALUES(2, 30);
INSERT INTO employees VALUES(3, 30);
INSERT INTO employees VALUES(4, 10);
INSERT INTO employees VALUES(5, 30);
INSERT INTO employees VALUES(6, 20);
INSERT INTO employees VALUES(7, 20);
INSERT INTO employees VALUES(8, 20);
INSERT INTO employees VALUES(9, 20);
INSERT INTO employees VALUES(10, 20);
INSERT INTO employees VALUES(11, 20);
COMMIT;
Table employees:
SELECT * FROM employees ORDER BY emp_no;
 emp_no | dept_no
--------+---------
      1 |      10
      2 |      30
      3 |      30
      4 |      10
      5 |      30
      6 |      20
      7 |      20
      8 |      20
      9 |      20
     10 |      20
     11 |      20
(11 rows)
Both queries below ask how many employees are in each department:
Aggregate query/result:
SELECT dept_no, COUNT(*) AS emp_count
FROM employees
GROUP BY dept_no ORDER BY 1;
 dept_no | emp_count
---------+-----------
      10 |         2
      20 |         6
      30 |         3
(3 rows)
The aggregate function COUNT() returns one row per department for the number of employees
in that department.
Analytic query/result:
SELECT emp_no, dept_no, COUNT(*)
OVER(PARTITION BY dept_no
ORDER BY emp_no)
AS emp_count FROM employees;
 emp_no | dept_no | emp_count
--------+---------+-----------
      1 |      10 |         1
      4 |      10 |         2
------------------------------
      6 |      20 |         1
      7 |      20 |         2
      8 |      20 |         3
      9 |      20 |         4
     10 |      20 |         5
     11 |      20 |         6
------------------------------
      2 |      30 |         1
      3 |      30 |         2
      5 |      30 |         3
(11 rows)
The analytic function COUNT() returns a count of the number of employees in each department,
as well as which employee is in each department. Within each partition, the results are sorted on
the emp_no column, which is specified in the OVER ORDER BY clause.
If you wanted to add the employee number to the above aggregate query, you would add the
emp_no column to the GROUP BY clause. In the results, you would get emp_count=1 for each
row, unless the data contained employees with the same emp_no value. For example:
SELECT dept_no, emp_no, COUNT(*)
AS emp_count
FROM employees
GROUP BY dept_no, emp_no ORDER BY 1, 2;
 dept_no | emp_no | emp_count
---------+--------+-----------
      10 |      1 |         1
      10 |      4 |         1
      20 |      6 |         1
      20 |      7 |         1
      20 |      8 |         1
      20 |      9 |         1
      20 |     10 |         1
      20 |     11 |         1
      30 |      2 |         1
      30 |      3 |         1
      30 |      5 |         1
(11 rows)
See Also
Analytic Query Examples (page 354)
The Window OVER() Clause
The OVER() clause contains what is called a window. The window defines partitioning, ordering,
and framing for an analytic function: important elements that determine what data the analytic
function takes as input with respect to the current row. The analytic function then operates on the
query result set, that is, the rows that are returned after the FROM, WHERE, GROUP BY, and
HAVING clauses have been evaluated.
You can also use the OVER() clause with certain analytic functions to define a moving window of
data for every row within a partition.
When used with analytic aggregate functions, OVER() does not require any of the windowing
clauses; in this case, the aggregate returns the same aggregated value for each row of the result
set.
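For example, the following sketch, which uses the allsales table defined later in this chapter,
returns the same grand total on every row of the result set:
=> SELECT name, sales, SUM(sales) OVER () AS total_sales FROM allsales;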
The OVER() clause must follow the analytic function, as in the following syntax:
ANALYTIC_FUNCTION ( arguments )
OVER( window_partition_clause
window_order_clause
window_frame_clause )
Window Partitioning
Window partitioning is optional. When specified, the window_partition_clause divides the rows in
the input based on user-provided expressions, such as aggregation functions like SUM(x).
Window partitioning is similar to the GROUP BY clause except that it returns only one result row
per input row. If you omit the window_partition_clause, all input rows are treated as a single
partition.
The analytic function is computed per partition and starts over again (resets) at the beginning of
each subsequent partition. The window_partition_clause is specified within the OVER()
clause.
Syntax
OVER( window_partition_clause
window_order_clause
window_frame_clause )
Examples
The examples in this topic use the allsales schema defined in How Analytic Functions Work
(page 329).
Median of sales within each state
The following query uses the analytic window_partition_clause to calculate the median of sales
within each state. The analytic function is computed per partition and starts over again at the
beginning of the next partition.
=> SELECT state, name, sales, MEDIAN(sales)
OVER (PARTITION BY state) AS median from allsales;
Results are grouped into partitions for MA (35) and NY (20) under the median column.
 state | name | sales | median
-------+------+-------+--------
 NY    | C    |    15 |     20
 NY    | B    |    20 |     20
 NY    | F    |    40 |     20
-------------------------------
 MA    | G    |    10 |     35
 MA    | D    |    20 |     35
 MA    | E    |    50 |     35
 MA    | A    |    60 |     35
(7 rows)
Median of sales among all states
This query calculates the median of total sales among states. When you use OVER() with no
parameters, there is one partition, the entire input:
=> SELECT state, sum(sales), median(SUM(sales))
OVER () AS median FROM allsales GROUP BY state;
 state | sum | median
-------+-----+--------
 NY    |  75 | 107.5
 MA    | 140 | 107.5
(2 rows)
Sales larger than median (evaluation order)
Remember that analytic functions are evaluated after all other clauses except the query's final
SQL ORDER BY clause. So if you were to write a query like the following, which asks for all rows
with sales larger than the median, HP Vertica would return an error because the WHERE clause is
applied before the analytic function and column m does not yet exist:
=> SELECT name, sales, MEDIAN(sales) OVER () AS m
FROM allsales WHERE sales > m;
ERROR 2624: Column "m" does not exist
You can work around this by computing the median in a subquery and applying the 'WHERE sales > m' predicate in the outer query:
=> SELECT * FROM
(SELECT name, sales, MEDIAN(sales) OVER () AS m FROM allsales) sq
WHERE sales > m;
 name | sales | m
------+-------+----
 F    |    40 | 20
 E    |    50 | 20
 A    |    60 | 20
(3 rows)
For additional examples, see Analytic Query Examples (page 354).
Window Ordering
Window ordering sorts the rows specified by the OVER() clause and specifies whether data is
sorted in ascending or descending order as well as the placement of null values; for example:
ORDER BY expr_list [ ASC | DESC ] [ NULLS { FIRST | LAST | AUTO } ]. The ordering of the data
affects the results.
Using ORDER BY in an OVER clause changes the default window to RANGE UNBOUNDED
PRECEDING AND CURRENT ROW, which is described in Window Framing (page 337).
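That default frame turns an aggregate into a running calculation. As a minimal sketch using the
allsales table, the following query produces a per-state running sum; because the default frame is
RANGE-based, rows with equal sales values (peers) share the same running total:
=> SELECT state, name, sales,
       SUM(sales) OVER (PARTITION BY state ORDER BY sales) AS running_sum
   FROM allsales;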
The following table shows the default null placement; the implicit clauses appear in the right-hand
column:
Ordering                    Null placement
ORDER BY column1            ORDER BY column1 ASC NULLS LAST
ORDER BY column1 ASC        ORDER BY column1 ASC NULLS LAST
ORDER BY column1 DESC       ORDER BY column1 DESC NULLS FIRST
Because the window_order_clause is different from a query's final ORDER BY clause, window
ordering might not guarantee the final result order; it specifies only the order within a window result
set, supplying the ordered set of rows to the window_frame_clause (if present), to the analytic
function, or to both. Use the SQL ORDER BY clause to guarantee ordering of the final result set.
(See also NULL Placement by Analytic Functions (page 406) and Designing Tables to
Minimize Run-Time Sorting of NULL Values in Analytic Functions (page 407).)
Syntax
OVER( window_partition_clause
window_order_clause
window_frame_clause )
Examples
The examples below use the allsales table schema, defined in How Analytic Functions
Work (page 329).
Example 1
In this example, the query orders the sales inside each state partition:
SELECT state, sales, name, RANK()
OVER (PARTITION BY state
ORDER BY sales) AS RANK
FROM allsales;
 state | sales | name | RANK
-------+-------+------+------
 MA    |    10 | G    |    1
 MA    |    20 | D    |    2
 MA    |    50 | E    |    3
 MA    |    60 | A    |    4
-----------------------------
 NY    |    15 | C    |    1
 NY    |    20 | B    |    2
 NY    |    40 | F    |    3
(7 rows)
Example 2
In this example, the final ORDER BY clause sorts the results by name:
SELECT state, sales, name, RANK()
OVER (PARTITION BY state
ORDER BY sales) AS RANK
FROM allsales ORDER BY name;
 state | sales | name | RANK
-------+-------+------+------
 MA    |    60 | A    |    4
 NY    |    20 | B    |    2
 NY    |    15 | C    |    1
 MA    |    20 | D    |    2
 MA    |    50 | E    |    3
 NY    |    40 | F    |    3
 MA    |    10 | G    |    1
(7 rows)
Window Framing
Window framing represents a unique construct, called a moving window. It defines which values in
the partition are evaluated relative to the current row. You specify a window frame type by using
the RANGE (page 341) or ROWS (page 338) keywords, described in the next topics. Both
RANGE and ROWS are relative to the CURRENT ROW, which is the row for which the analytic
function is currently computing results. As the current row advances, the window boundaries are
recomputed (move) along with it, determining which rows fall into the current window.
An analytic function with a window frame specification is computed for each row based on the
rows that fall into the window relative to that row. If you omit the window_frame_clause, the default
window is RANGE UNBOUNDED PRECEDING AND CURRENT ROW.
Syntax
OVER( window_partition_clause
window_order_clause
window_frame_clause )
See window_frame_clause in the SQL Reference Manual for more detailed syntax.
Schema for examples
The window framing examples that follow use the following emp table schema:
CREATE TABLE emp(deptno INT, sal INT, empno INT);
INSERT INTO emp VALUES(10,101,1);
INSERT INTO emp VALUES(10,104,4);
INSERT INTO emp VALUES(20,100,11);
INSERT INTO emp VALUES(20,109,7);
INSERT INTO emp VALUES(20,109,6);
INSERT INTO emp VALUES(20,109,8);
INSERT INTO emp VALUES(20,110,10);
INSERT INTO emp VALUES(20,110,9);
INSERT INTO emp VALUES(30,102,2);
INSERT INTO emp VALUES(30,103,3);
INSERT INTO emp VALUES(30,105,5);
COMMIT;
Windows with a Physical Offset (ROWS)
ROWS specifies the window as a physical offset. Using ROWS defines the start and end points of
a window by the number of rows before or after the current row. The offset value must be an
INTEGER data type.
Note: The value returned by an analytic function with a physical offset could produce
nondeterministic results unless the ordering expression results in a unique ordering. You might
have to specify multiple columns in the window_order_clause to achieve this unique ordering.
Examples
The examples on this page use the emp table schema defined in Window Framing (page 337):
 The red line represents the partition
 The blue box represents the current row
 The green box represents the analytic window relative to the current row.
The following example uses the ROWS-based window for the COUNT() analytic function to return
the department number, salary, and employee number with a count. The
window_frame_clause specifies the rows between the current row and two preceding. Using
ROWS in the window_frame_clause specifies the window as a physical offset and defines the
start- and end-point of a window by the number of rows before and after the current row.
SELECT deptno, sal, empno, COUNT(*)
OVER (PARTITION BY deptno ORDER BY sal
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
AS count FROM emp;
Notice that the partition includes department 20, and the current row and window are the same
because there are no rows that precede the current row within that partition, even though the
query specifies 2 preceding. The value in the count column (1) represents the number of rows in
the current window.
As the current row moves, the window spans from 1 preceding to the current row, which is as far
as it can go within the constraints of the window_frame_clause. COUNT() returns the number
of rows in the window, even if 2 preceding is specified. In the count column, (2) includes the
current row and the row above, which is the maximum the statement of 2 preceding allows within
the current partition.
The current row moves again, and the window can now span 2 preceding the current row within
the partition. The count (3) includes the number of rows in the partition (2 above + current), which
is the maximum the statement of 2 preceding allows within the current partition.
When the current row moves, the window also moves to maintain 2 preceding and current row.
The count of 3 repeats because it represents the number of rows in the window, which has not
changed:
The current row advances again, and the window is defined by the same window, so the count
does not change.
In this example, the current row advances once more. Notice the current row has reached the end
of the deptno partition.
Windows with a Logical Offset (RANGE)
The RANGE keyword specifies the window as a logical offset, such as time. The range value must
match the window_order_clause data type, which can be NUMERIC, DATE/TIME, FLOAT or
INTEGER.
Note: The value returned by an analytic function with a logical offset is always deterministic,
which means that the function returns the same result every time it is called if given the same
input value.
During the analytic computation, rows are excluded or included based on the logical offset (the
RANGE value) relative to the current row, which is always the reference point.
The ORDER BY column (window_order_clause) is the column whose value is used to
compute the window span.
Only one window_order_clause column is allowed, and the data type must be NUMERIC,
DATE/TIME, FLOAT or INTEGER, unless the window specifies one of following frames:
 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
 RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
INTERVAL Year to Month can be used in an analytic RANGE window when the ORDER BY
column type is TIMESTAMP/TIMESTAMP WITH TIMEZONE, or DATE. TIME/TIME WITH
TIMEZONE are not supported.
INTERVAL Day to Second can be used when the ORDER BY column type is
TIMESTAMP/TIMESTAMP WITH TIMEZONE, DATE, and TIME/TIME WITH TIMEZONE.
Examples
The examples on this page use the emp table schema defined in Window Framing (page 337):
 The red line represents the partition
 The blue box represents the current row
 The green box represents the analytic window relative to the current row.
In the following query, RANGE specifies the window as a logical offset (value-based). The
ORDER BY column is the column on which the range is applied.
SELECT deptno, sal, empno, COUNT(*)
OVER (PARTITION BY deptno ORDER BY sal
RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
AS COUNT FROM emp;
The partition includes department 20, and the current row and window are the same because
there are no rows that precede the current row within that partition:
In the next example, the ORDER BY column value is 109, so 109 - 2 = 107. The window would
include all rows whose ORDER BY column values are between 107 and 109 inclusively.
Here, the current row advances, and 107-109 are still inclusive.
Though the current row advances again, the window is the same.
The current row advances so that the ORDER BY column value becomes 110 (before it was 109).
Now the window would include all rows whose ORDER BY column values were between 108 and
110, inclusive, because 110 - 2 = 108.
As the current row advances, the window still includes rows for 108-110, inclusive.
Remember that a window frame can also be time based, such as the following query:
SELECT ts, bid, avg(bid) OVER
(ORDER BY ts RANGE BETWEEN INTERVAL '40 seconds' PRECEDING AND CURRENT ROW)
FROM ticks WHERE stock = 'VERT'
GROUP BY bid, ts ORDER BY ts;
Reporting Aggregates
Some of the analytic functions that take the window_frame_clause are the reporting aggregates.
These functions let you compare a partition's aggregate values with detail rows, taking the place of
correlated subqueries or joins.
 AVG()
 COUNT()
 MAX() and MIN()
 SUM()
 STDDEV(), STDDEV_POP(), and STDDEV_SAMP()
 VARIANCE(), VAR_POP(), and VAR_SAMP()
If you use a window aggregate with an empty OVER() clause, the analytic function is used as a
reporting function, where the entire input is treated as a single partition.
About standard deviation and variance functions
With standard deviation functions, a low standard deviation indicates that the data points tend to
be very close to the mean, whereas high standard deviation indicates that the data points are
spread out over a large range of values.
Standard deviation is often graphed, and a normal distribution of values produces the classic bell
curve.
Variance functions measure how far a set of numbers is spread out.
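For example, the following sketch uses STDDEV() as a reporting aggregate over the emp table
(defined in Window Framing (page 337)), so each detail row also carries its department's sample
standard deviation of salary:
=> SELECT deptno, sal, empno,
       STDDEV(sal) OVER (PARTITION BY deptno) AS sal_stddev
   FROM emp;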
Examples
Think of the window for reporting aggregates as a window defined as UNBOUNDED PRECEDING
and UNBOUNDED FOLLOWING. The omission of a window_order_clause makes all rows in the
partition also the window (reporting aggregates).
SELECT deptno, sal, empno, COUNT(sal)
OVER (PARTITION BY deptno) AS COUNT FROM emp;
 deptno | sal | empno | count
--------+-----+-------+-------
     10 | 101 |     1 |     2
     10 | 104 |     4 |     2
------------------------------
     20 | 110 |    10 |     6
     20 | 110 |     9 |     6
     20 | 109 |     7 |     6
     20 | 109 |     6 |     6
     20 | 109 |     8 |     6
     20 | 100 |    11 |     6
------------------------------
     30 | 105 |     5 |     3
     30 | 103 |     3 |     3
     30 | 102 |     2 |     3
(11 rows)
If the OVER() clause in the above query contained a window_order_clause (for example,
ORDER BY sal), it would become a moving window (window aggregate) query with a default
window of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
SELECT deptno, sal, empno, COUNT(sal)
OVER (PARTITION BY deptno ORDER BY sal) AS COUNT FROM emp;
 deptno | sal | empno | count
--------+-----+-------+-------
     10 | 101 |     1 |     1
     10 | 104 |     4 |     2
------------------------------
     20 | 100 |    11 |     1
     20 | 109 |     7 |     4
     20 | 109 |     6 |     4
     20 | 109 |     8 |     4
     20 | 110 |    10 |     6
     20 | 110 |     9 |     6
------------------------------
     30 | 102 |     2 |     1
     30 | 103 |     3 |     2
     30 | 105 |     5 |     3
(11 rows)
What about LAST_VALUE()?
You might wonder why you couldn't just use the LAST_VALUE() analytic function. For example,
for each employee, get the highest salary in the department:
SELECT deptno, sal, empno,
LAST_VALUE(empno) OVER (PARTITION BY deptno ORDER BY sal) AS lv FROM emp;
Due to default window semantics, LAST_VALUE does not always return the last value of a
partition. If you omit the window_frame_clause from the analytic clause, LAST_VALUE operates
on this default window. Results, therefore, can seem non-intuitive because the function does not
return the bottom of the current partition. It returns the bottom of the window, which continues to
change along with the current input row being processed.
Remember the default window:
OVER (PARTITION BY deptno ORDER BY sal)
is the same as:
OVER(PARTITION BY deptno ORDER BY sal
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
If you want to return the last value of a partition, use UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING.
SELECT deptno, sal, empno, LAST_VALUE(empno)
OVER (PARTITION BY deptno ORDER BY sal
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv
FROM emp;
Vertica recommends that you use LAST_VALUE with the window_order_clause to produce
deterministic results.
In the following example, empno 6, 7, and 8 have the same salary, so they are in adjacent rows.
empno 8 appears first in this case but the order is not guaranteed.
Notice in the output above, the last value is 7, which is the last row from the partition deptno = 20.
If the rows have a different order, then the function returns a different value:
Now the last value is 6, which is the last row from the partition deptno = 20. The solution is to add
a unique key to the sort order. Even if the order of the query changes, the result will always be the
same, and so deterministic.
SELECT deptno, sal, empno,
LAST_VALUE(empno) OVER (PARTITION BY deptno ORDER BY sal, empno
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lv
FROM emp;
Notice how the rows are now ordered by empno, and the last value stays at 8, no matter how the
query results are ordered.
Naming Windows
You can use the WINDOW clause to name one or more windows and avoid typing long OVER()
clause syntax. The WINDOW clause takes the following form:
WINDOW window_name AS ( window_definition_clause );
The window_definition_clause is described in detail in the SQL Reference Manual.
Example
In the following example, RANK() and DENSE_RANK() use the partitioning and ordering
specifications in the window definition for a window named w:
=> SELECT RANK() OVER w , DENSE_RANK() OVER w
FROM employee_dimension
WINDOW w AS (PARTITION BY employee_region ORDER by annual_salary);
Though analytic functions can reference a named window to inherit the window_partition_clause,
you can define your own window_order_clause; for example:
=> SELECT RANK() OVER(w ORDER BY annual_salary ASC) ,
DENSE_RANK() OVER(w ORDER BY annual_salary DESC)
FROM employee_dimension
WINDOW w AS (PARTITION BY employee_region);
Notes:
 Each window defined in the window_definition_clause must have a unique name.
 The window_partition_clause is defined in the named window specification, not in the
  OVER() clause.
 The OVER() clause can specify its own window_order_clause only if the
  window_definition_clause did not already define it. For example, if the second example
  above is rewritten as follows, the system returns an error:
  => SELECT RANK() OVER(w ORDER BY annual_salary ASC) ,
     DENSE_RANK() OVER(w ORDER BY annual_salary DESC)
     FROM employee_dimension
     WINDOW w AS (PARTITION BY employee_region ORDER BY annual_salary);
  ERROR: cannot override ORDER BY clause of window "w"
 A window definition cannot contain a window_frame_clause.
 You can reference window names within their scope only. For example, because named
  window w1 below is defined before w2, w2 is within the scope of w1:
  => SELECT RANK() OVER(w1 ORDER BY sal DESC),
     RANK() OVER w2
     FROM EMP
     WINDOW w1 AS (PARTITION BY deptno), w2 AS (w1 ORDER BY sal);
Analytic Query Examples
Calculating a Median Value
A median is described as the numerical value separating the higher half of a sample from the
lower half. The median of a finite list of numbers can be found by arranging all the observations
from lowest value to highest value and picking the middle one.
If there is an even number of observations, then there is no single middle value. The median is
then defined to be the mean (average) of the two middle values.
The examples that follow use the allsales table schema, defined in How Analytic Functions
Work (page 329).
Table allsales:
=> SELECT * FROM allsales;
 state | name | sales
-------+------+-------
 MA    | A    |    60
 NY    | B    |    20
 NY    | C    |    15
 MA    | D    |    20
 MA    | E    |    50
 NY    | F    |    40
 MA    | G    |    10
(7 rows)
The following query calculates the median of sales from the allsales table. Note that when you use
OVER() with no parameters, the query returns the same aggregated value for each row of the
result set:
=> SELECT name, sales, MEDIAN(sales) OVER() AS median FROM allsales;
 name | sales | median
------+-------+--------
 G    |    10 |     20
 C    |    15 |     20
 D    |    20 |     20
 B    |    20 |     20
 F    |    40 |     20
 E    |    50 |     20
 A    |    60 |     20
(7 rows)
Without analytics, you would have to write an overly complex query to get the median sales;
performance would suffer, and the query would return only one row:
=> SELECT sales MEDIAN FROM
   (
    SELECT a1.name, a1.sales, COUNT(a1.sales) Rank
    FROM allsales a1, allsales a2
    WHERE a1.sales < a2.sales OR
          (a1.sales=a2.sales AND a1.name <= a2.name)
    GROUP BY a1.name, a1.sales
    ORDER BY a1.sales DESC
   ) a3
   WHERE Rank =
     (SELECT (COUNT(*)+1) / 2 FROM allsales);
 MEDIAN
--------
     20
(1 row)
Getting Price Differential for Two Stocks
The following subquery selects out two stocks of interest. The outer query uses the
LAST_VALUE() and OVER() components of analytics, with IGNORE NULLS.
Schema
DROP TABLE ticks CASCADE;
CREATE TABLE ticks (ts TIMESTAMP, stock VARCHAR(10), bid FLOAT);
INSERT INTO ticks VALUES('2011-07-12 10:23:54', 'abc', 10.12);
INSERT INTO ticks VALUES('2011-07-12 10:23:58', 'abc', 10.34);
INSERT INTO ticks VALUES('2011-07-12 10:23:59', 'abc', 10.75);
INSERT INTO ticks VALUES('2011-07-12 10:25:15', 'abc', 11.98);
INSERT INTO ticks VALUES('2011-07-12 10:25:16', 'abc');
INSERT INTO ticks VALUES('2011-07-12 10:25:22', 'xyz', 45.16);
INSERT INTO ticks VALUES('2011-07-12 10:25:27', 'xyz', 49.33);
INSERT INTO ticks VALUES('2011-07-12 10:31:12', 'xyz', 65.25);
INSERT INTO ticks VALUES('2011-07-12 10:31:15', 'xyz');
COMMIT;
ticks table
SELECT * FROM ticks;
         ts          | stock |  bid
---------------------+-------+-------
 2011-07-12 10:23:59 | abc   | 10.75
 2011-07-12 10:25:22 | xyz   | 45.16
 2011-07-12 10:23:58 | abc   | 10.34
 2011-07-12 10:25:27 | xyz   | 49.33
 2011-07-12 10:23:54 | abc   | 10.12
 2011-07-12 10:31:15 | xyz   |
 2011-07-12 10:25:15 | abc   | 11.98
 2011-07-12 10:25:16 | abc   |
 2011-07-12 10:31:12 | xyz   | 65.25
(9 rows)
Query
SELECT ts, stock, bid, last_value(price1 IGNORE NULLS)
OVER(ORDER BY ts) - last_value(price2 IGNORE NULLS)
OVER(ORDER BY ts)
as price_diff
FROM
(SELECT ts, stock, bid,
CASE WHEN stock = 'abc' THEN bid ELSE NULL END AS price1,
CASE WHEN stock = 'xyz' then bid ELSE NULL END AS price2
FROM ticks
WHERE stock IN ('abc','xyz')
) v1
ORDER BY ts;
         ts          | stock |  bid  | price_diff
---------------------+-------+-------+------------
 2011-07-12 10:23:54 | abc   | 10.12 |
 2011-07-12 10:23:58 | abc   | 10.34 |
 2011-07-12 10:23:59 | abc   | 10.75 |
 2011-07-12 10:25:15 | abc   | 11.98 |
 2011-07-12 10:25:16 | abc   |       |
 2011-07-12 10:25:22 | xyz   | 45.16 |     -33.18
 2011-07-12 10:25:27 | xyz   | 49.33 |     -37.35
 2011-07-12 10:31:12 | xyz   | 65.25 |     -53.27
 2011-07-12 10:31:15 | xyz   |       |     -53.27
(9 rows)
Calculating the Moving Average
This example calculates a 40-second moving average of bids for one stock.
Note: This examples uses the ticks table schema defined in Getting Price Differential for
Two Stocks (page 355).
Query
SELECT ts, bid, AVG(bid)
OVER(ORDER BY ts
RANGE BETWEEN INTERVAL '40 seconds'
PRECEDING AND CURRENT ROW)
FROM ticks
WHERE stock = 'abc'
GROUP BY bid, ts
ORDER BY ts;
         ts          |  bid  |     ?column?
---------------------+-------+------------------
 2011-07-12 10:23:54 | 10.12 |            10.12
 2011-07-12 10:23:58 | 10.34 |            10.23
 2011-07-12 10:23:59 | 10.75 | 10.4033333333333
 2011-07-12 10:25:15 | 11.98 |            11.98
 2011-07-12 10:25:16 |       |            11.98
(5 rows)
Avoiding GROUPBY HASH with Projection Design
If your query contains a GROUP BY clause, HP Vertica computes the result with either the
GROUPBY PIPELINED or GROUPBY HASH algorithm.
Both algorithms compute the same results and have similar performance when the query
produces a small number of distinct groups (typically a thousand per node in the cluster). For
queries that contain a large number of groups, GROUPBY PIPELINED uses less memory and can
be faster but is only used when the input data is pre-sorted on the GROUP BY columns.
To improve the performance of a query that has a large number of distinct groups that is currently
using the GROUP BY HASH algorithm, you can enable the use of the GROUPBY PIPELINED
algorithm, as this section describes.
To determine which algorithm your query is using, run the EXPLAIN statement on the query.
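For example, the following sketch checks the algorithm chosen for a grouping query against the
sortopt table defined below; look for GROUPBY PIPELINED or GROUPBY HASH in the plan
output:
=> EXPLAIN SELECT a, b, COUNT(*) FROM sortopt GROUP BY a, b;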
The three conditions described in this section refer to the following schema.
CREATE TABLE sortopt (
a INT NOT NULL,
b INT NOT NULL,
c INT,
d INT
);
CREATE PROJECTION sortopt_p (
a_proj,
b_proj,
c_proj,
d_proj )
AS SELECT * FROM sortopt
ORDER BY a,b,c
UNSEGMENTED ALL NODES;
INSERT INTO sortopt VALUES(5,2,13,84);
INSERT INTO sortopt VALUES(14,22,8,115);
INSERT INTO sortopt VALUES(79,9,401,33);
Condition #1
All columns in the query's GROUP BY clause must be included in the projection's sort columns. If
even one column in the GROUP BY clause is excluded from the projection's ORDER BY clause, HP
Vertica uses GROUPBY HASH instead of GROUPBY PIPELINED:
Given a projection sort order ORDER BY a, b, c:
GROUP BY a
GROUP BY a,b
GROUP BY b,a
GROUP BY a,b,c
GROUP BY c,a,b
  The query optimizer uses GROUPBY PIPELINED because columns a, b, and c are included
  in the projection sort columns.
GROUP BY a,b,c,d
  The query optimizer uses GROUPBY HASH because column d is not part of the projection
  sort columns.
Condition #2
If the number of columns in the query's GROUP BY clause is less than the number of columns in the
projection's ORDER BY clause, columns in the query's GROUP BY clause must occur first in the
projection's ORDER BY clause.
Given a projection sort order ORDER BY a, b, c:
GROUP BY a
GROUP BY a,b
GROUP BY b,a
GROUP BY a,b,c
  The query optimizer uses GROUPBY PIPELINED because columns a, b, c are included in the
  projection sort columns.
GROUP BY c,a,b
GROUP BY a,c
  The query optimizer uses GROUPBY HASH because columns a and c do not occur first in the
  projection sort columns.
Condition #3
If the columns in a query's GROUP BY clause do not appear first in the projection's ORDER BY
clause, then any early-appearing projection sort columns that are missing in the query's GROUP BY
clause must be present as single-column constant equality predicates in the query's WHERE
clause.
Given a projection sort order ORDER BY a, b, c:
SELECT a FROM tab WHERE a = 10 GROUP BY b
  The query optimizer uses GROUPBY PIPELINED because all columns preceding b in the
  projection sort order appear as constant equality predicates.
SELECT a FROM tab WHERE a = 10 GROUP BY a, b
  The query optimizer uses GROUPBY PIPELINED even if redundant grouping column a is
  present.
SELECT a FROM tab WHERE a = 10 GROUP BY b, c
  The query optimizer uses GROUPBY PIPELINED because all columns preceding b and c in
  the projection sort order appear as constant equality predicates.
SELECT a FROM tab WHERE a = 10 GROUP BY c, b
  The query optimizer uses GROUPBY PIPELINED because all columns preceding b and c in
  the projection sort order appear as constant equality predicates.
SELECT a FROM tab WHERE a = 10 GROUP BY c
  The query optimizer uses GROUPBY HASH because all columns preceding c in the projection
  sort order do not appear as constant equality predicates.
Getting Latest bid and ask Results
The following example query fills in missing (null) values to create a full book order showing latest
bid and ask price and size, by vendor id. Original rows have values for (typically) one price and
one size, so use last_value with "ignore nulls" to find the most recent non-null value for the other
pair each time there is an entry for the ID. Sequenceno provides a unique total ordering.
Schema:
CREATE TABLE bookorders(
vendorid VARCHAR(100),
date TIMESTAMP,
sequenceno INT,
askprice FLOAT,
asksize INT,
bidprice FLOAT,
bidsize INT);
INSERT INTO bookorders VALUES('3325XPK','2011-07-12 10:23:54', 1, 10.12, 55, 10.23, 59);
INSERT INTO bookorders VALUES('3345XPZ','2011-07-12 10:23:55', 2, 10.55, 58, 10.75, 57);
INSERT INTO bookorders VALUES('445XPKF','2011-07-12 10:23:56', 3, 10.22, 43, 54);
INSERT INTO bookorders VALUES('445XPKF','2011-07-12 10:23:57', 3, 10.22, 59, 10.25, 61);
INSERT INTO bookorders VALUES('3425XPY','2011-07-12 10:23:58', 4, 11.87, 66, 11.90, 66);
INSERT INTO bookorders VALUES('3727XVK','2011-07-12 10:23:59', 5, 11.66, 51, 11.67, 62);
INSERT INTO bookorders VALUES('5325XYZ','2011-07-12 10:24:01', 6, 15.05, 44, 15.10, 59);
INSERT INTO bookorders VALUES('3675XVS','2011-07-12 10:24:05', 7, 15.43, 47, 58);
INSERT INTO bookorders VALUES('8972VUG','2011-07-12 10:25:15', 8, 14.95, 52, 15.11, 57);
COMMIT;
Query:
SELECT
  sequenceno Seq,
  date "Time",
  vendorid ID,
  LAST_VALUE (bidprice IGNORE NULLS)
    OVER (PARTITION BY vendorid ORDER BY sequenceno
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AS "Bid Price",
  LAST_VALUE (bidsize IGNORE NULLS)
    OVER (PARTITION BY vendorid ORDER BY sequenceno
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AS "Bid Size",
  LAST_VALUE (askprice IGNORE NULLS)
    OVER (PARTITION BY vendorid ORDER BY sequenceno
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AS "Ask Price",
  LAST_VALUE (asksize IGNORE NULLS)
    OVER (PARTITION BY vendorid ORDER BY sequenceno
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AS "Ask Size"
FROM bookorders
ORDER BY sequenceno;
 Seq |        Time         |   ID    | Bid Price | Bid Size | Ask Price | Ask Size
-----+---------------------+---------+-----------+----------+-----------+----------
   1 | 2011-07-12 10:23:54 | 3325XPK |     10.23 |       59 |     10.12 |       55
   2 | 2011-07-12 10:23:55 | 3345XPZ |     10.75 |       57 |     10.55 |       58
   3 | 2011-07-12 10:23:56 | 445XPKF |        54 |       61 |     10.22 |       43
   3 | 2011-07-12 10:23:57 | 445XPKF |     10.25 |       61 |     10.22 |       59
   4 | 2011-07-12 10:23:58 | 3425XPY |      11.9 |       66 |     11.87 |       66
   5 | 2011-07-12 10:23:59 | 3727XVK |     11.67 |       62 |     11.66 |       51
   6 | 2011-07-12 10:24:01 | 5325XYZ |      15.1 |       59 |     15.05 |       44
   7 | 2011-07-12 10:24:05 | 3675XVS |        58 |          |     15.43 |       47
   8 | 2011-07-12 10:25:15 | 8972VUG |     15.11 |       57 |     14.95 |       52
(9 rows)
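If you need only the most recent book entry per vendor, one approach (a sketch, not part of the original example) is to rank rows within each vendor by sequenceno and keep the top row:

=> SELECT vendorid, sequenceno, askprice, asksize, bidprice, bidsize
   FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY vendorid
                                      ORDER BY sequenceno DESC) AS rn
         FROM bookorders) sq
   WHERE rn = 1;

Note that this returns the raw (possibly null) column values; to keep the interpolated values, apply the same ranking to the output of the LAST_VALUE query above.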
Event-based Windows
Event-based windows let you break time series data into windows that border on significant events
within the data. This is especially relevant in financial data where analysis often focuses on
specific events as triggers to other activity.
There are two event-based window functions in HP Vertica. These functions are an HP Vertica
extension and are not part of the SQL-99 standard:
•   CONDITIONAL_CHANGE_EVENT() assigns an event window number to each row, starting
    from 0, and increments by 1 when the result of evaluating the argument expression on the
    current row differs from that on the previous row. This function is similar to the analytic
    function ROW_NUMBER, which assigns a unique number, sequentially, starting from 1, to
    each row within a partition.
•   CONDITIONAL_TRUE_EVENT() assigns an event window number to each row, starting from
    0, and increments the number by 1 when the result of the boolean argument expression
    evaluates true.
The CONDITIONAL_CHANGE_EVENT and CONDITIONAL_TRUE_EVENT functions do not
allow window framing (page 337).
Example Schema
The examples in this topic use the following schema:
CREATE TABLE TickStore3 (
ts TIMESTAMP,
symbol VARCHAR(8),
bid FLOAT
);
CREATE PROJECTION TickStore3_p (ts, symbol, bid) AS
SELECT * FROM TickStore3
ORDER BY ts, symbol, bid UNSEGMENTED ALL NODES;
INSERT INTO TickStore3 VALUES ('2009-01-01 03:00:00', 'XYZ', 10.0);
INSERT INTO TickStore3 VALUES ('2009-01-01 03:00:03', 'XYZ', 11.0);
INSERT INTO TickStore3 VALUES ('2009-01-01 03:00:06', 'XYZ', 10.5);
INSERT INTO TickStore3 VALUES ('2009-01-01 03:00:09', 'XYZ', 11.0);
COMMIT;
Using the CONDITIONAL_CHANGE_EVENT Function
The analytical function CONDITIONAL_CHANGE_EVENT returns a sequence of integers
indicating event window numbers, starting from 0. The function increments the event window
number when the result of evaluating the function expression on the current row differs from the
previous value.
In the following example, the first query returns all records from the TickStore3 table. The second
query uses the CONDITIONAL_CHANGE_EVENT function on the bid column. Since each bid
row value is different from the previous value, the function increments the window ID from 0 to 3:
=> SELECT ts, symbol, bid FROM Tickstore3 ORDER BY ts;

          ts         | symbol | bid
---------------------+--------+------
 2009-01-01 03:00:00 | XYZ    |   10
 2009-01-01 03:00:03 | XYZ    |   11
 2009-01-01 03:00:06 | XYZ    | 10.5
 2009-01-01 03:00:09 | XYZ    |   11
(4 rows)

=> SELECT ts, symbol, bid,
          CONDITIONAL_CHANGE_EVENT(bid) OVER (ORDER BY ts) AS cce
   FROM Tickstore3;

          ts         | symbol | bid  | cce
---------------------+--------+------+-----
 2009-01-01 03:00:00 | XYZ    |   10 |   0
 2009-01-01 03:00:03 | XYZ    |   11 |   1
 2009-01-01 03:00:06 | XYZ    | 10.5 |   2
 2009-01-01 03:00:09 | XYZ    |   11 |   3
(4 rows)
The following figure is a graphical illustration of the change in the bid price. Each value is different from its previous one, so the window ID increments for each time slice. The window ID starts at 0 and increments at every change from the previous value.

In the next example, the bid price changes from $10 to $11 in the second row, but then stays the same. The CONDITIONAL_CHANGE_EVENT function increments the event window ID in row 2, but not subsequently:
=> SELECT ts, symbol, bid FROM Tickstore3 ORDER BY ts;

          ts         | symbol | bid
---------------------+--------+------
 2009-01-01 03:00:00 | XYZ    |   10
 2009-01-01 03:00:03 | XYZ    |   11
 2009-01-01 03:00:06 | XYZ    |   11
 2009-01-01 03:00:09 | XYZ    |   11
(4 rows)

=> SELECT ts, symbol, bid,
          CONDITIONAL_CHANGE_EVENT(bid) OVER (ORDER BY ts) AS cce
   FROM Tickstore3;

          ts         | symbol | bid | cce
---------------------+--------+-----+-----
 2009-01-01 03:00:00 | XYZ    |  10 |   0
 2009-01-01 03:00:03 | XYZ    |  11 |   1
 2009-01-01 03:00:06 | XYZ    |  11 |   1
 2009-01-01 03:00:09 | XYZ    |  11 |   1
(4 rows)
The following figure is a graphical illustration of the change in the bid price at 3:00:03 only. The
price stays the same at 3:00:06 and 3:00:09, so the window ID remains at 1 for each time slice
after the change:
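If the table contained multiple symbols, you could restart the event window numbering for each symbol by adding a window partition clause. The following is a minimal sketch, assuming rows for additional symbols were loaded into TickStore3:

=> SELECT ts, symbol, bid,
          CONDITIONAL_CHANGE_EVENT(bid)
          OVER (PARTITION BY symbol ORDER BY ts) AS cce
   FROM Tickstore3;

With PARTITION BY symbol, the event window ID starts at 0 again on each symbol's first row.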
Using the CONDITIONAL_TRUE_EVENT Function
Like CONDITIONAL_CHANGE_EVENT, the analytic function CONDITIONAL_TRUE_EVENT
also returns a sequence of integers indicating event window numbers, starting from 0. The
difference between the two functions is that the CONDITIONAL_TRUE_EVENT function
increments the window ID each time its expression evaluates to true, while
CONDITIONAL_CHANGE_EVENT increments on a comparison expression with the previous
value.
In the following example, the first query returns all records from the TickStore3 table. The second
query uses the CONDITIONAL_TRUE_EVENT function to test whether the current bid is greater
than a given value (10.6). Each time the expression tests true, the function increments the
window ID. The first time the function increments the window ID is on row 2, when the value is 11.
The expression tests false for the next row (value is not greater than 10.6), so the function does
not increment the event window ID. In the final row, the expression is true for the given condition,
and the function increments the window:
=> SELECT ts, symbol, bid FROM Tickstore3 ORDER BY ts;

          ts         | symbol | bid
---------------------+--------+------
 2009-01-01 03:00:00 | XYZ    |   10
 2009-01-01 03:00:03 | XYZ    |   11
 2009-01-01 03:00:06 | XYZ    | 10.5
 2009-01-01 03:00:09 | XYZ    |   11
(4 rows)

=> SELECT ts, symbol, bid,
          CONDITIONAL_TRUE_EVENT(bid > 10.6) OVER (ORDER BY ts) AS cte
   FROM Tickstore3;

          ts         | symbol | bid  | cte
---------------------+--------+------+-----
 2009-01-01 03:00:00 | XYZ    |   10 |   0
 2009-01-01 03:00:03 | XYZ    |   11 |   1
 2009-01-01 03:00:06 | XYZ    | 10.5 |   1
 2009-01-01 03:00:09 | XYZ    |   11 |   2
(4 rows)
The following figure is a graphical illustration that shows the bid values and window ID changes.
Because the bid value is greater than $10.6 on only the second and fourth time slices (3:00:03 and
3:00:09), the window ID returns <0,1,1,2>:
In the following example, the first query returns all records from the TickStore3 table, ordered by
the tickstore values (ts). The second query uses the CONDITIONAL_TRUE_EVENT function to
increment the window ID each time the bid value is greater than 10.6. The first time the function
increments the event window ID is on row 2, where the value is 11. The window ID then
increments each time after that, because the expression (bid > 10.6) tests true for each time slice:
=> SELECT ts, symbol, bid FROM Tickstore3 ORDER BY ts;

          ts         | symbol | bid
---------------------+--------+------
 2009-01-01 03:00:00 | XYZ    |   10
 2009-01-01 03:00:03 | XYZ    |   11
 2009-01-01 03:00:06 | XYZ    |   11
 2009-01-01 03:00:09 | XYZ    |   11
(4 rows)

=> SELECT ts, symbol, bid,
          CONDITIONAL_TRUE_EVENT(bid > 10.6) OVER (ORDER BY ts) AS cte
   FROM Tickstore3;

          ts         | symbol | bid | cte
---------------------+--------+-----+-----
 2009-01-01 03:00:00 | XYZ    |  10 |   0
 2009-01-01 03:00:03 | XYZ    |  11 |   1
 2009-01-01 03:00:06 | XYZ    |  11 |   2
 2009-01-01 03:00:09 | XYZ    |  11 |   3
(4 rows)
The following figure is a graphical illustration that shows the bid values and window ID changes. The bid value exceeds 10.6 on the second time slice (3:00:03) and remains above that threshold for the remaining two time slices. The function increments the event window ID each time because the expression tests true:
Advanced use of event-based windows
In event-based window functions, the condition expression accesses values from the current row only. To access a previous value, you can use a more powerful event-based window that allows the window event condition to include previous data points. For example, the LAG(x, n) analytic function retrieves the value of column x in the nth-to-last input record. In this case, LAG() shares the OVER() clause specifications of the CONDITIONAL_CHANGE_EVENT or CONDITIONAL_TRUE_EVENT function expression.
In the following example, the first query returns all records from the TickStore3 table. The second query uses the CONDITIONAL_TRUE_EVENT function with the LAG() function in its boolean expression. In this case, CONDITIONAL_TRUE_EVENT increments the event window ID each time the bid value on the current row is less than the previous value. The function first increments the window ID on the third time slice, where the expression tests true because the current value (10.5) is less than the previous value. The window ID is not incremented in the last row because the final value is greater than the value in the previous row:
=> SELECT ts, symbol, bid FROM Tickstore3 ORDER BY ts;

          ts         | symbol | bid
---------------------+--------+------
 2009-01-01 03:00:00 | XYZ    |   10
 2009-01-01 03:00:03 | XYZ    |   11
 2009-01-01 03:00:06 | XYZ    | 10.5
 2009-01-01 03:00:09 | XYZ    |   11
(4 rows)

=> SELECT ts, symbol, bid,
          CONDITIONAL_TRUE_EVENT(bid < LAG(bid)) OVER (ORDER BY ts) AS cte
   FROM Tickstore3;

          ts         | symbol | bid  | cte
---------------------+--------+------+-----
 2009-01-01 03:00:00 | XYZ    |   10 |   0
 2009-01-01 03:00:03 | XYZ    |   11 |   0
 2009-01-01 03:00:06 | XYZ    | 10.5 |   1
 2009-01-01 03:00:09 | XYZ    |   11 |   1
(4 rows)
The following figure illustrates the second query above. When the bid price is less than the
previous value, the window ID gets incremented, which occurs only in the third time slice
(3:00:06):
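The offset passed to LAG need not be 1. For example, to increment the event window each time the current bid is lower than the bid two rows earlier, you could write the following sketch (not part of the original example); rows for which LAG(bid, 2) is null evaluate to false, so the window ID does not increment there:

=> SELECT ts, symbol, bid,
          CONDITIONAL_TRUE_EVENT(bid < LAG(bid, 2))
          OVER (ORDER BY ts) AS cte
   FROM Tickstore3;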
See Also
Sessionization with Event-based Windows (page 365)
Using Time Series Analytics (page 368)
CONDITIONAL_CHANGE_EVENT(), CONDITIONAL_TRUE_EVENT() and LAG() in the SQL
Reference Manual
Sessionization with Event-based Windows
Sessionization, a special case of event-based windows, is a feature often used to analyze click
streams, such as identifying web browsing sessions from recorded web clicks.
In HP Vertica, given an input clickstream table, where each row records a Web page click made by
a particular user (or IP address), the sessionization computation attempts to identify Web
browsing sessions from the recorded clicks by grouping the clicks from each user based on the
time-intervals between the clicks. If two clicks from the same user are made too far apart in time,
as defined by a time-out threshold, the clicks are treated as though they are from two different
browsing sessions.
Example Schema
The examples in this topic use the following WebClicks schema to represent a simple clickstream
table:
CREATE TABLE WebClicks(userId INT, timestamp TIMESTAMP);
INSERT INTO WebClicks VALUES (1, '2009-12-08 3:00:00 pm');
INSERT INTO WebClicks VALUES (1, '2009-12-08 3:00:25 pm');
INSERT INTO WebClicks VALUES (1, '2009-12-08 3:00:45 pm');
INSERT INTO WebClicks VALUES (1, '2009-12-08 3:01:45 pm');
INSERT INTO WebClicks VALUES (2, '2009-12-08 3:02:45 pm');
INSERT INTO WebClicks VALUES (2, '2009-12-08 3:02:55 pm');
INSERT INTO WebClicks VALUES (2, '2009-12-08 3:03:55 pm');
COMMIT;
The input table WebClicks contains the following rows:
=> SELECT * FROM WebClicks;

 userId |      timestamp
--------+---------------------
      1 | 2009-12-08 15:00:00
      1 | 2009-12-08 15:00:25
      1 | 2009-12-08 15:00:45
      1 | 2009-12-08 15:01:45
      2 | 2009-12-08 15:02:45
      2 | 2009-12-08 15:02:55
      2 | 2009-12-08 15:03:55
(7 rows)
In the following query, the sessionization computation uses LAG() to compute the difference between the current and previous timestamp values. The condition evaluates to true, and the window ID increments, when the difference is greater than 30 seconds.
=> SELECT userId, timestamp,
CONDITIONAL_TRUE_EVENT(timestamp - LAG(timestamp) > '30 seconds')
OVER(PARTITION BY userId ORDER BY timestamp) AS session FROM WebClicks;
 userId |      timestamp      | session
--------+---------------------+---------
      1 | 2009-12-08 15:00:00 |       0
      1 | 2009-12-08 15:00:25 |       0
      1 | 2009-12-08 15:00:45 |       0
      1 | 2009-12-08 15:01:45 |       1
      2 | 2009-12-08 15:02:45 |       0
      2 | 2009-12-08 15:02:55 |       0
      2 | 2009-12-08 15:03:55 |       1
(7 rows)
In the output, the session column contains the window ID from the CONDITIONAL_TRUE_EVENT function. The condition evaluates to true on row 4 (timestamp 15:01:45), so the window ID increments to 1. The window ID resets to 0 on the following row because that row starts a new partition (user ID 2), and the condition does not evaluate to true again until the last row in the output.
You might want to give users different time-out thresholds. For example, one user might have a
slower network connection or could be multi-tasking, while another user might have a faster
connection and is focused on a single Web site, doing a single task.
To compute an adaptive time-out threshold based on the last 2 clicks, use
CONDITIONAL_TRUE_EVENT with LAG to return the average time between the last 2 clicks with
a grace period of 3 seconds:
SELECT userId, timestamp,
CONDITIONAL_TRUE_EVENT(timestamp - LAG(timestamp) >
(LAG(timestamp, 1) - LAG(timestamp, 3)) / 2 + '3 seconds')
OVER(PARTITION BY userId ORDER BY timestamp) AS session
FROM WebClicks;
 userId |      timestamp      | session
--------+---------------------+---------
      2 | 2009-12-08 15:02:45 |       0
      2 | 2009-12-08 15:02:55 |       0
      2 | 2009-12-08 15:03:55 |       0
      1 | 2009-12-08 15:00:00 |       0
      1 | 2009-12-08 15:00:25 |       0
      1 | 2009-12-08 15:00:45 |       0
      1 | 2009-12-08 15:01:45 |       1
(7 rows)
Note: You cannot define a moving window in time series data. For example, when the query evaluates the first row and there is no previous data, the current row is used. With a lag of 2, no results are returned until the third row.
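Once each click is labeled with a session ID, you can aggregate over the sessions. The following sketch (not from the original example) counts the number of browsing sessions per user by wrapping the sessionization query in a subquery:

=> SELECT userId, COUNT(DISTINCT session) AS num_sessions
   FROM (SELECT userId,
                CONDITIONAL_TRUE_EVENT(timestamp - LAG(timestamp) > '30 seconds')
                OVER (PARTITION BY userId ORDER BY timestamp) AS session
         FROM WebClicks) sq
   GROUP BY userId;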
See Also
Event-based Windows (page 360)
Using Time Series Analytics
Time series analytics evaluate the values of a given set of variables over time and group those
values into a window (based on a time interval) for analysis and aggregation.
Common scenarios are changes over time, such as stock market trades and performance, as well
as charting trend lines over data.
Because both time and the state of data within a time series are continuous, it can be challenging to evaluate SQL queries over time. Input records usually occur at non-uniform intervals, which means they might have gaps. HP Vertica provides gap-filling functionality, which fills in missing data points, and an interpolation scheme, which is a method of constructing new data points within the range of a discrete set of known data points. HP Vertica interpolates the non-time series columns in the data (such as analytic function results computed over time slices) and adds the missing data points to the output. Gap filling and interpolation are described in detail in this section.
You can also use event-based windows (page 360) to break time series data into windows that
border on significant events within the data. This is especially relevant in financial data where
analysis might focus on specific events as triggers to other activity. Sessionization (page 365), a
special case of event-based windows, is a feature often used to analyze click streams, such as
identifying web browsing sessions from recorded web clicks.
HP Vertica provides additional support for time series analytics with the following SQL extensions,
which you can read about in the SQL Reference Manual.
•   The SELECT..TIMESERIES clause supports gap-filling and interpolation (GFI) computation.
•   TS_FIRST_VALUE and TS_LAST_VALUE are time series aggregate functions that return the
    value at the start or end of a time slice, respectively, which is determined by the interpolation
    scheme.
•   TIME_SLICE is a (SQL extension) date/time function that aggregates data by different
    fixed-time intervals and returns a rounded-up input TIMESTAMP value to a value that
    corresponds with the start or end of the time slice interval.
See Also
Event-based Windows (page 360)
Sessionization with Event-based Windows (page 365)
Gap Filling and Interpolation (GFI)
The examples and graphics that explain the concepts in this topic use the following simple
schema:
CREATE TABLE TickStore (ts TIMESTAMP, symbol VARCHAR(8), bid FLOAT);
INSERT INTO TickStore VALUES ('2009-01-01 03:00:00', 'XYZ', 10.0);
INSERT INTO TickStore VALUES ('2009-01-01 03:00:05', 'XYZ', 10.5);
COMMIT;
In HP Vertica, time series data is represented by a sequence of rows that conforms to a particular
table schema, where one of the columns stores the time information.
Both time and the state of data within a time series are continuous. This means that evaluating
SQL queries over time can be challenging because input records usually occur at non-uniform
intervals and could contain gaps. Consider, for example, the following table, which contains two
input rows five seconds apart, at 3:00:00 and 3:00:05.
=> SELECT * FROM TickStore;

          ts         | symbol | bid
---------------------+--------+------
 2009-01-01 03:00:00 | XYZ    |   10
 2009-01-01 03:00:05 | XYZ    | 10.5
(2 rows)
Given those two inputs, how would you determine a bid price that fell between the two points, such
as at 3:00:03 PM?
The TIME_SLICE function, which normalizes timestamps into corresponding time slices, might
seem like a logical candidate; however, TIME_SLICE does not solve the problem of missing inputs
(time slices) in the data. Instead, HP Vertica provides gap-filling and interpolation (GFI)
functionality to fill in missing data points and add new (missing) data points within a range of
known data points to the output using time series aggregate functions and the SQL TIMESERIES
clause.
But first, we'll illustrate the components that make up gap filling and interpolation in HP Vertica,
starting with Constant Interpolation (page 370).
The images in the following topics use the following legend:
•   The x-axis represents the timestamp (ts) column.
•   The y-axis represents the bid column.
•   The vertical blue lines delimit the time slices.
•   The red dots represent the input records in the table, $10.0 and $10.5.
•   The blue stars represent the output values, including interpolated values.
Constant Interpolation
Given known input timestamps at 03:00:00 and 03:00:05 in the sample TickStore schema (page
369), how might you determine the bid price at 03:00:03?
A common interpolation scheme used on financial data is to set the bid price to the last seen value
so far. This scheme is referred to as constant interpolation, in which HP Vertica computes a new
value based on the previous input records.
Note: Constant is HP Vertica's default interpolation scheme. Another interpolation scheme,
linear (page 372), is discussed in an upcoming topic.
Returning to the problem query, here is the table output, which shows a 5-second lag between
bids at 03:00:00 and 03:00:05:
=> SELECT * FROM TickStore;

          ts         | symbol | bid
---------------------+--------+------
 2009-01-01 03:00:00 | XYZ    |   10
 2009-01-01 03:00:05 | XYZ    | 10.5
(2 rows)
Using constant interpolation, the interpolated bid price of XYZ remains at $10.0 at 3:00:03, which
falls between the two known data inputs (3:00:00 PM and 3:00:05). At 3:00:05, the value changes
to $10.5. The known data points are represented by a red dot, and the interpolated value at
3:00:03 is represented by the blue star.
In order to write a query that makes the input rows more uniform, you first need to understand the
TIMESERIES clause and time series aggregate functions (page 370).
The TIMESERIES Clause and Aggregates
The SELECT..TIMESERIES clause and time series aggregates help solve the problem of gaps in
input records by normalizing the data into 3-second time slices and interpolating the bid price
when it finds gaps.
The TIMESERIES clause
The TIMESERIES clause is an important component of time series analytics computation. It
performs gap filling and interpolation (GFI) to generate time slices missing from the input records.
The clause applies to the timestamp columns/expressions in the data, and takes the following
form:
TIMESERIES slice_time AS 'length_and_time_unit_expression' OVER (
... [ window_partition_clause [ , ... ] ]
... ORDER BY time_expression )
... [ ORDER BY table_column [ , ... ] ]
Note: The TIMESERIES clause requires an ORDER BY operation on the timestamp column.
Time series aggregate (TSA) functions
Timeseries Aggregate (TSA) functions evaluate the values of a given set of variables over time
and group those values into a window for analysis and aggregation.
TSA functions process the data that belongs to each time slice. One output row is produced per
time slice or per partition per time slice if a partition expression is present.
The following table shows 3-second time slices where:

•   The first two rows fall within the first time slice, which runs from 3:00:00 to 3:00:02. These
    are the input rows for the TSA function's output for the time slice starting at 3:00:00.
•   The second two rows fall within the second time slice, which runs from 3:00:03 to 3:00:05.
    These are the input rows for the TSA function's output for the time slice starting at 3:00:03.

The result is the start of each time slice.
Example
The following statement uses both the TIMESERIES clause and the TS_FIRST_VALUE TSA
function to process the data that belongs to each 3-second time slice. The query returns the
values of the bid column, as determined by the specified constant interpolation scheme:
=> SELECT slice_time, TS_FIRST_VALUE(bid, 'CONST') bid FROM TickStore
TIMESERIES slice_time AS '3 seconds' OVER(PARTITION by symbol ORDER BY ts);
Now the original data inputs (at left) look like the output on the right because HP Vertica
interpolated the last known value and filled in the missing data point, returning 10 at 3:00:03:
Original query:

      slice_time     | bid
---------------------+------
 2009-01-01 03:00:00 |   10
 2009-01-01 03:00:03 | 10.5
(2 rows)

Interpolated value:

      slice_time     | bid
---------------------+-----
 2009-01-01 03:00:00 |  10
 2009-01-01 03:00:03 |  10
(2 rows)
Linear Interpolation
Instead of interpolating data points based on the last seen value (constant interpolation (page
370)), linear interpolation is where HP Vertica interpolates values in a linear slope based on the
specified time slice.
The query that follows uses linear interpolation to place the input records in 2-second time slices
and return the first bid value for each symbol/time slice combination (the value at the start of the
time slice):
=> SELECT slice_time, TS_FIRST_VALUE(bid, 'LINEAR') bid FROM Tickstore
TIMESERIES slice_time AS '2 seconds' OVER(PARTITION BY symbol ORDER BY ts);
      slice_time     | bid
---------------------+------
 2009-01-01 03:00:00 |   10
 2009-01-01 03:00:02 | 10.2
 2009-01-01 03:00:04 | 10.4
(3 rows)
The following figure illustrates the previous query results, showing the 2-second time gaps (3:00:02 and 3:00:04) in which no input record occurs. Note that the interpolated bid price of XYZ changes to 10.2 at 3:00:02, 10.3 at 3:00:03, and 10.4 at 3:00:04, all of which fall between the two known data inputs (3:00:00 and 3:00:05). At 3:00:05, the value would change to 10.5.
Note: The known data points above are represented by a red dot, and the interpolated values
are represented by blue stars.
The following is a side-by-side comparison of the constant (CONST) and linear (LINEAR) interpolation schemes.
GFI Examples
This topic illustrates some of the queries you can write using the constant and linear interpolation
schemes.
Constant interpolation
The first query uses TS_FIRST_VALUE() and the TIMESERIES clause to place the input records
in 3-second time slices and return the first bid value for each symbol/time slice combination (the
value at the start of the time slice).
Note: The TIMESERIES clause requires an ORDER BY operation on the TIMESTAMP
column.
=> SELECT slice_time, symbol, TS_FIRST_VALUE(bid) AS first_bid FROM TickStore
TIMESERIES slice_time AS '3 seconds' OVER (PARTITION BY symbol ORDER BY ts);
Because the bid price of stock XYZ is 10.0 at 3:00:03, the first_bid value of the second time slice, which starts at 3:00:03, is still 10.0 (instead of 10.5) because the input value of 10.5 does not occur until 3:00:05. In this case, the interpolated value is inferred from the last value seen on stock XYZ for time 3:00:03:
      slice_time     | symbol | first_bid
---------------------+--------+-----------
 2009-01-01 03:00:00 | XYZ    |        10
 2009-01-01 03:00:03 | XYZ    |        10
(2 rows)
The next example places the input records in 2-second time slices to return the first bid value for
each symbol/time slice combination:
=> SELECT slice_time, symbol, TS_FIRST_VALUE(bid) AS first_bid FROM TickStore
TIMESERIES slice_time AS '2 seconds' OVER (PARTITION BY symbol ORDER BY ts);
The result now contains three records in 2-second increments, all of which occur between the first input row at 03:00:00 and the second input row at 3:00:05. Note that the second and third output records correspond to time slices where there is no input record:
      slice_time     | symbol | first_bid
---------------------+--------+-----------
 2009-01-01 03:00:00 | XYZ    |        10
 2009-01-01 03:00:02 | XYZ    |        10
 2009-01-01 03:00:04 | XYZ    |        10
(3 rows)
Using the same table schema, the next query uses TS_LAST_VALUE(), with the TIMESERIES
clause to return the last values of each time slice (the values at the end of the time slices).
Note: Time series aggregate functions process the data that belongs to each time slice. One
output row is produced per time slice or per partition per time slice if a partition expression is
present.
=> SELECT slice_time, symbol, TS_LAST_VALUE(bid) AS last_bid FROM TickStore
TIMESERIES slice_time AS '2 seconds' OVER (PARTITION BY symbol ORDER BY ts);
Notice that the last value output row is 10.5 because the value 10.5 at time 3:00:05 was the last
point inside the 2-second time slice that started at 3:00:04:
      slice_time     | symbol | last_bid
---------------------+--------+----------
 2009-01-01 03:00:00 | XYZ    |       10
 2009-01-01 03:00:02 | XYZ    |       10
 2009-01-01 03:00:04 | XYZ    |     10.5
(3 rows)
Remember that because constant interpolation is the default, the same results are returned if you
write the query using the CONST parameter as follows:
=> SELECT slice_time, symbol, TS_LAST_VALUE(bid, 'CONST') AS last_bid FROM
TickStore
TIMESERIES slice_time AS '2 seconds' OVER (PARTITION BY symbol ORDER BY ts);
Linear interpolation
Based on the same input records described in the constant interpolation examples, which specify
2-second time slices, the result of TS_LAST_VALUE with linear interpolation is as follows:
=> SELECT slice_time, symbol, TS_LAST_VALUE(bid, 'linear') AS last_bid
FROM TickStore
TIMESERIES slice_time AS '2 seconds' OVER (PARTITION BY symbol ORDER BY ts);
In the results, no last_bid value is returned for the last row because the query specified
TS_LAST_VALUE, and there is no data point after the 3:00:04 time slice to interpolate.
      slice_time     | symbol | last_bid
---------------------+--------+----------
 2009-01-01 03:00:00 | XYZ    |     10.2
 2009-01-01 03:00:02 | XYZ    |     10.4
 2009-01-01 03:00:04 | XYZ    |
(3 rows)
Using multiple time series aggregate functions
Multiple time series aggregate functions can exist in the same query. They share the same gap-filling policy, as defined in the TIMESERIES clause; however, each time series aggregate function can specify its own interpolation policy. In the following example, there are two constant and one linear interpolation schemes, but all three functions use a three-second time slice:
=> SELECT slice_time, symbol,
TS_FIRST_VALUE(bid, 'const') fv_c,
TS_FIRST_VALUE(bid, 'linear') fv_l,
TS_LAST_VALUE(bid, 'const') lv_c
FROM TickStore
TIMESERIES slice_time AS '3 seconds' OVER(PARTITION BY symbol ORDER BY ts);
In the following output, the original output is compared to output returned by multiple time series
aggregate functions.
    ts    | symbol | bid
----------+--------+------
 03:00:00 | XYZ    |   10
 03:00:05 | XYZ    | 10.5
(2 rows)

      slice_time     | symbol | fv_c | fv_l | lv_c
---------------------+--------+------+------+------
 2009-01-01 03:00:00 | XYZ    |   10 |   10 |   10
 2009-01-01 03:00:03 | XYZ    |   10 | 10.3 | 10.5
(2 rows)
Using the analytic LAST_VALUE() function
Here's an example using LAST_VALUE(), so you can see the difference between it and the GFI
syntax.
=> SELECT *, LAST_VALUE(bid) OVER(PARTITION by symbol ORDER BY ts)
AS "last bid" FROM TickStore;
Notice that there is no gap filling or interpolation in the output values.
          ts         | symbol | bid  | last bid
---------------------+--------+------+----------
 2009-01-01 03:00:00 | XYZ    |   10 |       10
 2009-01-01 03:00:05 | XYZ    | 10.5 |     10.5
(2 rows)
Using slice_time
In a TIMESERIES query, you cannot use the column slice_time in the WHERE clause
because the WHERE clause is evaluated before the TIMESERIES clause, and the slice_time
column is not generated until the TIMESERIES clause is evaluated. For example, HP Vertica does
not support the following query:
=> SELECT symbol, slice_time, TS_FIRST_VALUE(bid IGNORE NULLS) AS fv
FROM TickStore
WHERE slice_time = '2009-01-01 03:00:00'
TIMESERIES slice_time as '2 seconds' OVER (PARTITION BY symbol ORDER BY ts);
ERROR: Time Series timestamp alias/Time Series Aggregate Functions not allowed
in WHERE clause
Instead, you could write a subquery and put the predicate on slice_time in the outer query:
=> SELECT * FROM (
     SELECT symbol, slice_time,
            TS_FIRST_VALUE(bid IGNORE NULLS) AS fv
     FROM TickStore
     TIMESERIES slice_time AS '2 seconds'
     OVER (PARTITION BY symbol ORDER BY ts) ) sq
   WHERE slice_time = '2009-01-01 03:00:00';
 symbol |      slice_time     | fv
--------+---------------------+----
 XYZ    | 2009-01-01 03:00:00 | 10
(1 row)
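If your HP Vertica version supports WITH clauses, the same restriction can be expressed somewhat more readably by naming the subquery. This is a sketch of the same query as above, not an additional example from the original text:

=> WITH sq AS (
     SELECT symbol, slice_time,
            TS_FIRST_VALUE(bid IGNORE NULLS) AS fv
     FROM TickStore
     TIMESERIES slice_time AS '2 seconds'
     OVER (PARTITION BY symbol ORDER BY ts))
   SELECT * FROM sq WHERE slice_time = '2009-01-01 03:00:00';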
Creating a dense time series
The TIMESERIES clause provides a convenient way to create a dense time series for use in an
outer join with fact data. The results represent every time point, rather than just the time points for
which data exists.
The examples that follow use the same TickStore schema described in Gap Filling and
Interpolation (GFI) (page 369), along with the addition of a new inner table for the purpose of
creating a join:
=> CREATE TABLE inner_table (
ts TIMESTAMP,
bid FLOAT
);
=> CREATE PROJECTION inner_p (ts, bid) as SELECT * FROM inner_table
ORDER BY ts, bid UNSEGMENTED ALL NODES;
=> INSERT INTO inner_table VALUES ('2009-01-01 03:00:02', 1);
=> INSERT INTO inner_table VALUES ('2009-01-01 03:00:04', 2);
=> COMMIT;
You can create a simple union between the start and end range of the timeframe of interest in
order to return every time point. This example uses a 1-second time slice:
=> SELECT ts FROM (
SELECT '2009-01-01 03:00:00'::TIMESTAMP AS time FROM TickStore
UNION
SELECT '2009-01-01 03:00:05'::TIMESTAMP FROM TickStore) t
TIMESERIES ts AS '1 seconds' OVER(ORDER BY time);
          ts
---------------------
 2009-01-01 03:00:00
 2009-01-01 03:00:01
 2009-01-01 03:00:02
 2009-01-01 03:00:03
 2009-01-01 03:00:04
 2009-01-01 03:00:05
(6 rows)
The next query creates a union between the start and end range of the timeframe using
500-millisecond time slices:
=> SELECT ts FROM (
SELECT '2009-01-01 03:00:00'::TIMESTAMP AS time
FROM TickStore
UNION
SELECT '2009-01-01 03:00:05'::TIMESTAMP FROM TickStore) t
TIMESERIES ts AS '500 milliseconds' OVER(ORDER BY time);
           ts
-----------------------
 2009-01-01 03:00:00
 2009-01-01 03:00:00.5
 2009-01-01 03:00:01
 2009-01-01 03:00:01.5
 2009-01-01 03:00:02
 2009-01-01 03:00:02.5
 2009-01-01 03:00:03
 2009-01-01 03:00:03.5
 2009-01-01 03:00:04
 2009-01-01 03:00:04.5
 2009-01-01 03:00:05
(11 rows)
The following query creates a union between the start- and end-range of the timeframe of interest
using 1-second time slices:
=> SELECT * FROM (
SELECT ts FROM (
SELECT '2009-01-01 03:00:00'::timestamp AS time FROM TickStore
UNION
SELECT '2009-01-01 03:00:05'::timestamp FROM TickStore) t
TIMESERIES ts AS '1 seconds' OVER(ORDER BY time) ) AS outer_table
LEFT OUTER JOIN inner_table ON outer_table.ts = inner_table.ts;
The join returns a complete set of records from the left-joined table with the matched records in the right-joined table. Where the query finds no match, it extends the right-side columns with null values:
          ts         |          ts         | bid
---------------------+---------------------+-----
 2009-01-01 03:00:00 |                     |
 2009-01-01 03:00:01 |                     |
 2009-01-01 03:00:02 | 2009-01-01 03:00:02 |   1
 2009-01-01 03:00:03 |                     |
 2009-01-01 03:00:04 | 2009-01-01 03:00:04 |   2
 2009-01-01 03:00:05 |                     |
(6 rows)
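If you prefer a sentinel value instead of nulls on the non-matching rows, you can wrap the right-side column in COALESCE. This is a sketch built on the same tables; 0 is an arbitrary placeholder value:

=> SELECT outer_table.ts, COALESCE(inner_table.bid, 0) AS bid
   FROM (SELECT ts FROM (
           SELECT '2009-01-01 03:00:00'::TIMESTAMP AS time FROM TickStore
           UNION
           SELECT '2009-01-01 03:00:05'::TIMESTAMP FROM TickStore) t
         TIMESERIES ts AS '1 seconds' OVER(ORDER BY time)) AS outer_table
   LEFT OUTER JOIN inner_table ON outer_table.ts = inner_table.ts;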
When Time Series Data Contains Null Values
Null values are not common inputs for gap-filling and interpolation (GFI) computation, but if null values do exist, you can use time series aggregate functions (TS_FIRST_VALUE/TS_LAST_VALUE) with the IGNORE NULLS argument to affect the output of the interpolated values. The TSA functions are treated similarly to their analytic counterparts (FIRST_VALUE/LAST_VALUE): if the timestamp itself is null, HP Vertica filters out those rows before gap filling and interpolation occurs.
The three figures below illustrate the points that follow on how HP Vertica handles time series data that contains null values:

Figure 1: Interpolated bid values when the input has no NULL values.
Figure 2: CONST-interpolated bid values when the input has NULL values.
Figure 3: LINEAR-interpolated bid values when the input has NULL values.
Constant interpolation with null values
Figure 1 illustrates a default (constant) interpolation result on four input rows where none of the
inputs contains a NULL value. Figure 2 shows the same input rows with the addition of another
input record whose bid value is NULL, and whose timestamp (ts) value is 3:00:03.
For constant interpolation, the bid value starting at 3:00:03 is null until the next non-null bid value
appears in time. In Figure 2, the presence of the null row makes the interpolated bid value null in
the time interval denoted by the shaded region. As a result, if TS_FIRST_VALUE(bid) is
evaluated with constant interpolation on the time slice that begins at 3:00:02, its output is non-null.
However, TS_FIRST_VALUE(bid) on the next time slice produces null. If the last value of the 3:00:02 time slice is null, the first value for the next time slice (3:00:04) is null. However, if you use a TSA function with IGNORE NULLS, then the value at 3:00:04 is the same value as it was at 3:00:02.
To illustrate, insert a new row into the TickStore table at 03:00:03 with a null bid value. HP Vertica outputs a row for the 03:00:02 time slice with a null value but no row for the 03:00:03 input:
=> INSERT INTO tickstore VALUES('2009-01-01 03:00:03', 'XYZ', NULL);
=> SELECT slice_time, symbol, TS_LAST_VALUE(bid) AS last_bid FROM TickStore
-> TIMESERIES slice_time AS '2 seconds' OVER (PARTITION BY symbol ORDER BY ts);
      slice_time     | symbol | last_bid
---------------------+--------+----------
 2009-01-01 03:00:00 | XYZ    |       10
 2009-01-01 03:00:02 | XYZ    |
 2009-01-01 03:00:04 | XYZ    |     10.5
(3 rows)
If you specify IGNORE NULLS, HP Vertica fills in the missing data point using a constant
interpolation scheme. Here, the bid price at 03:00:02 is interpolated to the last known input record
for bid, which was $10 at 03:00:00:
=> SELECT slice_time, symbol, TS_LAST_VALUE(bid IGNORE NULLS) AS last_bid FROM
TickStore
-> TIMESERIES slice_time AS '2 seconds' OVER (PARTITION BY symbol ORDER BY ts);
      slice_time     | symbol | last_bid
---------------------+--------+----------
 2009-01-01 03:00:00 | XYZ    |       10
 2009-01-01 03:00:02 | XYZ    |       10
 2009-01-01 03:00:04 | XYZ    |     10.5
(3 rows)
Now if you were to insert a row where the timestamp column contained a null value, HP Vertica
would filter out that row before gap filling and interpolation occurred.
=> INSERT INTO tickstore VALUES(NULL, 'XYZ', 11.2);
=> SELECT slice_time, symbol, TS_LAST_VALUE(bid) AS last_bid FROM TickStore
-> TIMESERIES slice_time AS '2 seconds' OVER (PARTITION BY symbol ORDER BY ts);
Notice there is no output for the 11.2 bid row:
      slice_time     | symbol | last_bid
---------------------+--------+----------
 2009-01-01 03:00:00 | XYZ    |       10
 2009-01-01 03:00:02 | XYZ    |
 2009-01-01 03:00:04 | XYZ    |     10.5
(3 rows)
Linear interpolation with null values
For linear interpolation, the interpolated bid value becomes null in the time interval, which is
represented by the shaded region in Figure 3. In the presence of an input null value at 3:00:03, HP
Vertica cannot linearly interpolate the bid value around that time point.
HP Vertica takes the closest non-null value on either side of the time slice and uses that value. For example, if you use a linear interpolation scheme and do not specify IGNORE NULLS, and your data has one real value and one null, the result is null. If the value on either side is null, the result is also null. Therefore, evaluating TS_FIRST_VALUE(bid) with linear interpolation on the time slice that begins at 3:00:02 produces null, and TS_FIRST_VALUE(bid) on the next time slice remains null.
=> SELECT slice_time, symbol, TS_FIRST_VALUE(bid, 'linear') AS fv_l FROM TickStore
-> TIMESERIES slice_time AS '2 seconds' OVER (PARTITION BY symbol ORDER BY ts);
      slice_time     | symbol | fv_l
---------------------+--------+------
 2009-01-01 03:00:00 | XYZ    |   10
 2009-01-01 03:00:02 | XYZ    |
 2009-01-01 03:00:04 | XYZ    |
(3 rows)
Event Series Joins
An event series join is an HP Vertica SQL extension that enables the analysis of two series when
their measurement intervals don’t align precisely, such as with mismatched timestamps. You can
compare values from the two series directly, rather than having to normalize the series to the
same measurement interval.
Event series joins are an extension of outer joins (page 322), but instead of padding the non-preserved side with NULL values when there is no match, the event series join pads the non-preserved side with values that it interpolates from the previous value.
The difference in how you write a regular join versus an event series join is the INTERPOLATE
predicate, which is used in the ON clause. For example, the following two statements show the
differences, which are shown in greater detail in Writing Event Series Joins (page 383).
Regular full outer join
SELECT * FROM hTicks h
FULL OUTER JOIN aTicks a
ON (h.time = a.time);
Event series join
SELECT * FROM hTicks h
FULL OUTER JOIN aTicks a
ON (h.time INTERPOLATE PREVIOUS VALUE a.time);
Similar to regular joins, an event series join has inner and outer join modes, which are described in
the topics that follow.
For full syntax, including notes and restrictions, see INTERPOLATE in the SQL Reference Manual.
Sample Schema for Event Series Joins Examples
If you don't plan to run the queries and just want to look at the examples, you can skip this topic
and move straight to Writing Event Series Joins (page 383).
Schema of hTicks and aTicks tables
The examples that follow use the following hTicks and aTicks tables schemas:
CREATE TABLE hTicks (
stock VARCHAR(20),
time TIME,
price NUMERIC(8,2)
);
CREATE TABLE aTicks (
stock VARCHAR(20),
time TIME,
price NUMERIC(8,2)
);
Although TIMESTAMP is more commonly used for the event series column, the examples in this
topic use TIME to keep the output simple.
INSERT INTO hTicks VALUES ('HPQ', '12:00', 50.00);
INSERT INTO hTicks VALUES ('HPQ', '12:01', 51.00);
INSERT INTO hTicks VALUES ('HPQ', '12:05', 51.00);
INSERT INTO hTicks VALUES ('HPQ', '12:06', 52.00);
INSERT INTO aTicks VALUES ('ACME', '12:00', 340.00);
INSERT INTO aTicks VALUES ('ACME', '12:03', 340.10);
INSERT INTO aTicks VALUES ('ACME', '12:05', 340.20);
INSERT INTO aTicks VALUES ('ACME', '12:05', 333.80);
COMMIT;
Output of the two tables:

hTicks (notice there are no entry records between 12:02-12:04):

=> SELECT * FROM hTicks;

 stock |   time   | price
-------+----------+-------
 HPQ   | 12:00:00 | 50.00
 HPQ   | 12:01:00 | 51.00
 HPQ   | 12:05:00 | 51.00
 HPQ   | 12:06:00 | 52.00
(4 rows)

aTicks (notice there are no entry records at 12:01, 12:02, and at 12:04):

=> SELECT * FROM aTicks;

 stock |   time   | price
-------+----------+--------
 ACME  | 12:00:00 | 340.00
 ACME  | 12:03:00 | 340.10
 ACME  | 12:05:00 | 340.20
 ACME  | 12:05:00 | 333.80
(4 rows)
Example query showing gaps
A full outer join shows the gaps in the timestamps:
=> SELECT * FROM hTicks h FULL OUTER JOIN aTicks a ON h.time = a.time;

 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:01:00 | 51.00 |       |          |
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
 HPQ   | 12:06:00 | 52.00 |       |          |
       |          |       | ACME  | 12:03:00 | 340.10
(6 rows)
Schema of the bid and ask tables
The examples that follow use the following bid and ask table schemas:
CREATE TABLE bid(stock VARCHAR(20), time TIME, price NUMERIC(8,2));
CREATE TABLE ask(stock VARCHAR(20), time TIME, price NUMERIC(8,2));
INSERT INTO bid VALUES ('HPQ', '12:00', 100.10);
INSERT INTO bid VALUES ('HPQ', '12:01', 100.00);
INSERT INTO bid VALUES ('ACME', '12:00', 80.00);
INSERT INTO bid VALUES ('ACME', '12:03', 79.80);
INSERT INTO bid VALUES ('ACME', '12:05', 79.90);
INSERT INTO ask VALUES ('HPQ', '12:01', 101.00);
INSERT INTO ask VALUES ('ACME', '12:00', 80.00);
INSERT INTO ask VALUES ('ACME', '12:02', 75.00);
COMMIT;
Output of the two tables:

bid (notice there are no entry records for stock ACME at 12:02 and at 12:04):

=> SELECT * FROM bid;

 stock |   time   | price
-------+----------+--------
 HPQ   | 12:00:00 | 100.10
 HPQ   | 12:01:00 | 100.00
 ACME  | 12:00:00 |  80.00
 ACME  | 12:03:00 |  79.80
 ACME  | 12:05:00 |  79.90
(5 rows)

ask (notice there are no entry records for stock HPQ at 12:00 and none for ACME at 12:01):

=> SELECT * FROM ask;

 stock |   time   | price
-------+----------+--------
 HPQ   | 12:01:00 | 101.00
 ACME  | 12:00:00 |  80.00
 ACME  | 12:02:00 |  75.00
(3 rows)
Example query showing gaps
A full outer join shows the gaps in the timestamps:
=> SELECT * FROM bid b FULL OUTER JOIN ask a ON b.time = a.time;

 stock |   time   | price  | stock |   time   | price
-------+----------+--------+-------+----------+--------
 HPQ   | 12:00:00 | 100.10 | ACME  | 12:00:00 |  80.00
 HPQ   | 12:01:00 | 100.00 | HPQ   | 12:01:00 | 101.00
 ACME  | 12:00:00 |  80.00 | ACME  | 12:00:00 |  80.00
 ACME  | 12:03:00 |  79.80 |       |          |
 ACME  | 12:05:00 |  79.90 |       |          |
       |          |        | ACME  | 12:02:00 |  75.00
(6 rows)
Writing Event Series Joins
The examples in this topic contain mismatches between timestamps, just as you would find in real-life situations; for example, there could be a period of inactivity on stocks where no trade occurs, which can present challenges when you want to compare two stocks whose timestamps don't match.

The hTicks and aTicks tables
As described in the example ticks schema (page 380), hTicks is missing input rows for 12:02, 12:03, and 12:04, and aTicks is missing inputs at 12:01, 12:02, and 12:04.
hTicks:

=> SELECT * FROM hTicks;

 stock |   time   | price
-------+----------+-------
 HPQ   | 12:00:00 | 50.00
 HPQ   | 12:01:00 | 51.00
 HPQ   | 12:05:00 | 51.00
 HPQ   | 12:06:00 | 52.00
(4 rows)

aTicks:

=> SELECT * FROM aTicks;

 stock |   time   | price
-------+----------+--------
 ACME  | 12:00:00 | 340.00
 ACME  | 12:03:00 | 340.10
 ACME  | 12:05:00 | 340.20
 ACME  | 12:05:00 | 333.80
(4 rows)
Querying event series data with full outer joins
Using a traditional full outer join, this query finds a match between tables hTicks and aTicks at 12:00 and 12:05 and pads the missing data points with NULL values.

=> SELECT * FROM hTicks h FULL OUTER JOIN aTicks a ON (h.time = a.time);

 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:01:00 | 51.00 |       |          |
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
 HPQ   | 12:06:00 | 52.00 |       |          |
       |          |       | ACME  | 12:03:00 | 340.10
(6 rows)
To replace the gaps with interpolated values for those missing data points, use the
INTERPOLATE predicate to create an event series join. The join condition is restricted to the ON
clause, which evaluates the equality predicate on the timestamp columns from the two input
tables. In other words, for each row in outer table hTicks, the ON clause predicates are evaluated
for each combination of each row in the inner table aTicks.
Simply rewrite the full outer join query to use the INTERPOLATE predicate with the required PREVIOUS VALUE keywords. Note that a full outer join is the most common scenario for event series data, because it keeps all rows from both tables.

=> SELECT * FROM hTicks h FULL OUTER JOIN aTicks a
   ON (h.time INTERPOLATE PREVIOUS VALUE a.time);

HP Vertica interpolates the missing values (which appear as NULL in the full outer join) using that table's previous value:

 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:01:00 | 51.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:01:00 | 51.00 | ACME  | 12:03:00 | 340.10
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
 HPQ   | 12:06:00 | 52.00 | ACME  | 12:05:00 | 340.20
(6 rows)
Note: The output ordering above is different from the regular full outer join because in the event
series join, interpolation occurs independently for each stock (hTicks and aTicks), where the
data is partitioned and sorted based on the equality predicate. This means that interpolation
occurs within, not across, partitions.
If you review the regular full outer join output, you can see that both tables have a match in the
time column at 12:00 and 12:05, but at 12:01, there is no entry record for ACME. So the operation
interpolates a value for ACME (ACME,12:00,340) based on the previous value in the aTicks
table.
Querying event series data with left outer joins
You can also use left and right outer joins. You might, for example, decide you want to preserve
only hTicks values. So you'd write a left outer join:
=> SELECT * FROM hTicks h LEFT OUTER JOIN aTicks a
   ON (h.time INTERPOLATE PREVIOUS VALUE a.time);

 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:01:00 | 51.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
 HPQ   | 12:06:00 | 52.00 | ACME  | 12:05:00 | 340.20
(5 rows)
Here's what the same data looks like using a traditional left outer join:
=> SELECT * FROM hTicks h LEFT OUTER JOIN aTicks a ON h.time = a.time;

 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:01:00 | 51.00 |       |          |
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
 HPQ   | 12:06:00 | 52.00 |       |          |
(5 rows)
Note that a right outer join has the same behavior with the preserved table reversed.
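For example, to preserve only the aTicks rows and interpolate the hTicks side, you could write the following sketch:

=> SELECT * FROM hTicks h RIGHT OUTER JOIN aTicks a
   ON (h.time INTERPOLATE PREVIOUS VALUE a.time);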
Querying event series data with inner joins
Note that INNER event series joins behave the same way as normal ANSI SQL-99 joins, where all
gaps are omitted. Thus, there is nothing to interpolate, and the following two queries are
equivalent and return the same result set:
An event series inner join:

=> SELECT * FROM hTicks h JOIN aTicks a
   ON (h.time INTERPOLATE PREVIOUS VALUE a.time);

 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
(3 rows)

A regular inner join:

=> SELECT * FROM hTicks h INNER JOIN aTicks a ON (h.time = a.time);

 stock |   time   | price | stock |   time   | price
-------+----------+-------+-------+----------+--------
 HPQ   | 12:00:00 | 50.00 | ACME  | 12:00:00 | 340.00
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 333.80
 HPQ   | 12:05:00 | 51.00 | ACME  | 12:05:00 | 340.20
(3 rows)
The bid and ask tables
Using the example schema (page 380) for the bid and ask tables, write a full outer join to interpolate the missing data points:

=> SELECT * FROM bid b FULL OUTER JOIN ask a
   ON (b.stock = a.stock AND b.time INTERPOLATE PREVIOUS VALUE a.time);

In the output below, the first row for stock HPQ shows nulls because there is no entry record for HPQ before 12:01.

 stock |   time   | price  | stock |   time   | price
-------+----------+--------+-------+----------+--------
 ACME  | 12:00:00 |  80.00 | ACME  | 12:00:00 |  80.00
 ACME  | 12:00:00 |  80.00 | ACME  | 12:02:00 |  75.00
 ACME  | 12:03:00 |  79.80 | ACME  | 12:02:00 |  75.00
 ACME  | 12:05:00 |  79.90 | ACME  | 12:02:00 |  75.00
 HPQ   | 12:00:00 | 100.10 |       |          |
 HPQ   | 12:01:00 | 100.00 | HPQ   | 12:01:00 | 101.00
(6 rows)
Note also that the same row (ACME,12:02,75) from the ask table appears three times. The first
appearance is because no matching rows are present in the bid table for the row in ask, so
Vertica interpolates the missing value using the ACME value at 12:02 (75.00). The second
appearance occurs because the row in bid (ACME,12:05,79.9) has no matches in ask. The
row from ask that contains (ACME,12:02,75) is the closest row; thus, it is used to interpolate the
values.
If you write a regular full outer join, you can see where the mismatched timestamps occur:
=> SELECT * FROM bid b FULL OUTER JOIN ask a ON (b.time = a.time);

 stock |   time   | price  | stock |   time   | price
-------+----------+--------+-------+----------+--------
 ACME  | 12:00:00 |  80.00 | ACME  | 12:00:00 |  80.00
 ACME  | 12:03:00 |  79.80 |       |          |
 ACME  | 12:05:00 |  79.90 |       |          |
 HPQ   | 12:00:00 | 100.10 | ACME  | 12:00:00 |  80.00
 HPQ   | 12:01:00 | 100.00 | HPQ   | 12:01:00 | 101.00
       |          |        | ACME  | 12:02:00 |  75.00
(6 rows)
Event Series Pattern Matching
The SQL MATCH clause syntax (described in the SQL Reference Manual) lets you screen large
amounts of historical data in search of event patterns. You specify a pattern as a regular
expression and can then search for the pattern within a sequence of input events. MATCH
provides subclauses for analytic data partitioning and ordering, and the pattern matching occurs
on a contiguous set of rows.
Pattern matching is particularly useful for clickstream analysis where you might want to identify
users' actions based on their Web browsing behavior (page clicks). A typical online clickstream
funnel is:
Company home page -> product home page -> search -> results -> purchase online
Using the above clickstream funnel, you can search for a match on the user's sequence of web
clicks and identify that the user:
•   landed on the company home page
•   navigated to the product page
•   ran a search
•   clicked a link from the search results
•   made a purchase
Clickstream funnel schema
The examples in this topic use this clickstream funnel and the following clickstream_log table
schema:
CREATE TABLE clickstream_log (
  uid INT,             -- user ID
  sid INT,             -- browsing session ID, produced by previous sessionization computation
  ts TIME,             -- timestamp that occurred during the user's page visit
  refURL VARCHAR(20),  -- URL of the page referencing pageURL
  pageURL VARCHAR(20), -- URL of the page being visited
  action CHAR(1)       -- action the user took after visiting the page ('P' = Purchase, 'V' = View)
);
INSERT INTO clickstream_log VALUES (1,100,'12:00','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:01','website2.com/home','website2.com/floby', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:02','website2.com/floby','website2.com/shamwow', 'V');
INSERT INTO clickstream_log VALUES (1,100,'12:03','website2.com/shamwow','website2.com/buy', 'P');
INSERT INTO clickstream_log VALUES (2,100,'12:10','website1.com','website2.com/home', 'V');
INSERT INTO clickstream_log VALUES (2,100,'12:11','website2.com/home','website2.com/forks', 'V');
INSERT INTO clickstream_log VALUES (2,100,'12:13','website2.com/forks','website2.com/buy', 'P');
COMMIT;
Here's the clickstream_log table's output:
=> SELECT * FROM clickstream_log;

 uid | sid |    ts    |        refURL        |       pageURL        | action
-----+-----+----------+----------------------+----------------------+--------
   1 | 100 | 12:00:00 | website1.com         | website2.com/home    | V
   1 | 100 | 12:01:00 | website2.com/home    | website2.com/floby   | V
   1 | 100 | 12:02:00 | website2.com/floby   | website2.com/shamwow | V
   1 | 100 | 12:03:00 | website2.com/shamwow | website2.com/buy     | P
   2 | 100 | 12:10:00 | website1.com         | website2.com/home    | V
   2 | 100 | 12:11:00 | website2.com/home    | website2.com/forks   | V
   2 | 100 | 12:13:00 | website2.com/forks   | website2.com/buy     | P
(7 rows)
Example
This example uses the HP Vertica pattern matching functions to analyze users' browsing history over website2.com. It identifies patterns where the user performed the following tasks:

•   Landed on website2.com from another web site (Entry)
•   Browsed to any number of other pages (Onsite)
•   Made a purchase (Purchase)

In the following statement, pattern P (Entry Onsite* Purchase) consists of three event types: Entry, Onsite, and Purchase. When HP Vertica finds a match in the input table, the associated pattern instance must be an event of type Entry followed by 0 or more events of type Onsite, and an event of type Purchase:
SELECT uid,
       sid,
       ts,
       refurl,
       pageurl,
       action,
       event_name(),
       pattern_id(),
       match_id()
FROM clickstream_log
MATCH
  (PARTITION BY uid, sid ORDER BY ts
   DEFINE
     Entry    AS RefURL  NOT ILIKE '%website2.com%' AND PageURL ILIKE '%website2.com%',
     Onsite   AS PageURL ILIKE '%website2.com%' AND Action = 'V',
     Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'
   PATTERN
     P AS (Entry Onsite* Purchase)
   RESULTS ALL ROWS);
In the output below, the first four rows represent the pattern for user 1's browsing activity, and the following three rows show user 2's browsing habits.

 uid | sid |    ts    |        refurl        |       pageurl        | action | event_name | pattern_id | match_id
-----+-----+----------+----------------------+----------------------+--------+------------+------------+----------
   1 | 100 | 12:00:00 | website1.com         | website2.com/home    | V      | Entry      |          1 |        1
   1 | 100 | 12:01:00 | website2.com/home    | website2.com/floby   | V      | Onsite     |          1 |        2
   1 | 100 | 12:02:00 | website2.com/floby   | website2.com/shamwow | V      | Onsite     |          1 |        3
   1 | 100 | 12:03:00 | website2.com/shamwow | website2.com/buy     | P      | Purchase   |          1 |        4
   2 | 100 | 12:10:00 | website1.com         | website2.com/home    | V      | Entry      |          1 |        1
   2 | 100 | 12:11:00 | website2.com/home    | website2.com/forks   | V      | Onsite     |          1 |        2
   2 | 100 | 12:13:00 | website2.com/forks   | website2.com/buy     | P      | Purchase   |          1 |        3
(7 rows)
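Because the MATCH query can serve as a row source, you can aggregate over its results. The following sketch, which assumes MATCH output can be nested in a subquery this way, counts the number of steps in each user's matched funnel:

=> SELECT uid, MAX(mid) AS funnel_steps
   FROM (SELECT uid, match_id() AS mid
         FROM clickstream_log
         MATCH
           (PARTITION BY uid, sid ORDER BY ts
            DEFINE
              Entry    AS RefURL  NOT ILIKE '%website2.com%' AND PageURL ILIKE '%website2.com%',
              Onsite   AS PageURL ILIKE '%website2.com%' AND Action = 'V',
              Purchase AS PageURL ILIKE '%website2.com%' AND Action = 'P'
            PATTERN P AS (Entry Onsite* Purchase)
            RESULTS ALL ROWS)) sq
   GROUP BY uid;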
See Also
MATCH Clause and Pattern Matching Functions in the SQL Reference Manual
Perl Regular Expressions Documentation (http://perldoc.perl.org/perlre.html)
Optimizing Query Performance
When you submit a query to HP Vertica for processing, the HP Vertica query optimizer
automatically chooses a set of operations to compute the requested result. These operations
together are called a query plan. The choice of operations can drastically affect the run-time
performance and resource consumption needed to compute the query results. Depending on the
properties of the projections defined in your database, the query optimizer can choose faster and
more efficient operations to compute the query results.
This section describes the different operations that the optimizer uses and how you can get the
optimizer to use the most efficient operations to compute the results of your query.
Note: Database response time depends on factors such as type and size of the application
query, database design, data size and data types stored, available computational power, and
network bandwidth. Adding nodes to a database cluster does not necessarily improve the
system response time for every query, especially if the response time is already short (e.g., less
than 10 seconds) or if the response time is not hardware bound.
First Steps for Improving Query Performance
To improve the performance of your queries, take these steps to ensure that the database is
optimized for query performance:
Run Database Designer
Your first step should always be to run Database Designer. Database Designer creates a physical
schema for your database that provides optimal query performance. The first time you run
Database Designer, create a comprehensive design and make sure Database Designer has
relevant sample queries and data on which to base the design. If performance issues arise later,
consider loading additional queries that you run frequently and rerunning Database Designer to
create a query-specific design.
For more information about running Database Designer, see Using the Database Designer.
When you run Database Designer, choose the Update Statistics option. The HP Vertica query
optimizer uses statistics about the data to create a query plan. Statistics help the optimizer
determine:
 Multiple eligible projections to answer the query
 The best order in which to perform joins
 Data distribution algorithms, such as broadcast and re-segmentation
If your statistics become out of date, run the ANALYZE_STATISTICS or ANALYZE_HISTOGRAM
function to update statistics for a given schema, table, or column. For more information, see
Collecting Database Statistics.
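For example, the following call (the table name here is illustrative) refreshes statistics for a single
table; ANALYZE_STATISTICS returns 0 on success:
=> SELECT ANALYZE_STATISTICS('public.trades');
 ANALYZE_STATISTICS
--------------------
                  0
(1 row)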
Check Query Events Proactively
The QUERY_EVENTS system table identifies whether there are issues with the planning phase of
a query. In particular, the following values in the EVENT_TYPE column might indicate a problem
that needs to be addressed:
 PREDICATE OUTSIDE HISTOGRAM: The optimizer encountered a predicate that was false
for the entire histogram created by ANALYZE_STATISTICS or ANALYZE_HISTOGRAM.
 NO HISTOGRAM: The optimizer encountered a predicate on a column for which it does not
have a histogram.
 MEMORY LIMIT HIT: The optimizer used all its allocated memory creating the query plan. If
you see this value, simplify your query instead of increasing the memory allocation.
The QUERY_EVENTS table also gives a detailed description of each issue and suggests
solutions. For more information about this system table, see QUERY_EVENTS.
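For example, a query along the following lines surfaces these events; the column list shown
(EVENT_TYPE, EVENT_DESCRIPTION, SUGGESTED_ACTION) is representative of the table's output:
=> SELECT event_type, event_description, suggested_action
   FROM query_events
   WHERE event_type IN ('PREDICATE OUTSIDE HISTOGRAM',
                        'NO HISTOGRAM',
                        'MEMORY LIMIT HIT');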
Review the Query Plan
A query plan is a sequence of step-like paths that the Vertica query optimizer selects to access or
alter information in your Vertica database. There are two ways to get information about the query
plan:
 Run the EXPLAIN command. Each step (path) represents a single operation that the optimizer
uses for its execution strategy.
 Query the QUERY_PLAN_PROFILES system table. This table provides detailed execution
status for currently running queries. Output from the QUERY_PLAN_PROFILES table shows
the real-time flow of data and the time and resources consumed for each path in each query
plan.
For more information, see How to get query plan information.
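For example, to view a plan without executing the query, and to watch the paths of running
queries (the trades table and the column list here are illustrative):
=> EXPLAIN SELECT COUNT(*) FROM trades
   WHERE trade_date BETWEEN '2007-11-01' AND '2007-12-01';
=> SELECT path_id, path_line, running_time
   FROM query_plan_profiles;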
Optimizing Encoding to Improve Query Performance
You can potentially make queries faster by changing the encoding of the columns. Encoding
reduces the on-disk size of your data so that the amount of I/O required for queries is reduced,
resulting in faster execution times. Make sure that all columns and projections included in the
query are using the correct encoding for the data. To do this, take the following steps:
1 Run Database Designer to create a query-specific design. Database Designer implements the
optimum encoding and projection design.
2 After updating the query-specific design, update statistics using the ANALYZE_STATISTICS
function.
3 Run EXPLAIN with one or more of the queries you submitted to the design to make sure it is
using the new projections.
Alternatively, run DESIGNER_DESIGN_PROJECTION_ENCODINGS to re-evaluate the current
encoding and update it if necessary.
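A minimal sketch of the alternative follows; the arguments (target table, output script path, deploy
flag) are illustrative, so confirm the exact signature of
DESIGNER_DESIGN_PROJECTION_ENCODINGS in the SQL Reference Manual for your version:
=> SELECT DESIGNER_DESIGN_PROJECTION_ENCODINGS('trades', '/tmp/encodings.sql', false);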
Improving the Compression of FLOAT Columns
If you are seeing slow performance or a large storage footprint with your FLOAT data, evaluate the
data and your business needs to see if it can be contained in a NUMERIC column with a precision
of 18 digits or less. Converting a FLOAT column to a NUMERIC column can improve data
compression, reduce the on-disk size of your database, and improve the performance of queries
on that column.
When you define a NUMERIC data type, you specify the precision and the scale; NUMERIC data
are exact representations of data. FLOAT data types represent variable precision and
approximate values; they take up more space in the database.
Converting FLOAT columns to NUMERIC columns is most effective when:
 The precision of the NUMERIC column is 18 digits or less. Vertica has fine-tuned the
performance of NUMERIC data for the common case of 18 digits of precision. Vertica does not
recommend converting FLOAT columns to NUMERIC columns that require a precision of
more than 18 digits.
 The precision of the FLOAT values is bounded, and the values will all fall within a specified
precision for a NUMERIC column. One example is monetary values like product prices or
financial transaction amounts. For example, a column defined as NUMERIC(11,2) can
accommodate prices from 0 to a few million dollars, can store cents, and compresses more
efficiently than a FLOAT column.
If you try to load a value into a NUMERIC column that exceeds the specified precision, Vertica
gives an error and does not load the data. If you assign a value with more decimal digits than the
specified scale, the value is rounded to match the specified scale and stored in that column.
For more information, see Numeric Data Types.
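For example, a bounded price column can be declared as NUMERIC rather than FLOAT when the
table is created (the table and column names here are illustrative):
CREATE TABLE transactions (
    tx_id  INT,
    amount NUMERIC(11,2)  -- exact values; compresses better than FLOAT for bounded prices
);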
Using Run Length Encoding (RLE) to Improve Query Performance
If you run Database Designer and choose to optimize for loads, which minimizes database
footprint, Database Designer applies the most appropriate encodings, including RLE, to columns
in order to maximize query performance.
In an HP Vertica database, run length encoding (RLE) replaces sequences (runs) of identical data
values in a column with a set of pairs, where each pair represents the value and number of
occurrences. For example, a gender column might have 47 instances of F and 56 instances of M.
Using RLE, HP Vertica can save disk space by storing the pairs (47, F) and (56, M).
The advantage of RLE is that it reduces disk I/O and results in a smaller storage footprint for the
database. Use RLE for low-cardinality columns, where the average repetition count is greater than
10. For example, a gender column with 47 F values and 56 M values can benefit from RLE. A
gender column with 6 F values and 10 M values, where the average repetition count is 8, does not
benefit from RLE.
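As an illustrative sketch (the table and columns are hypothetical), you can also request RLE
explicitly in a projection definition and place the low-cardinality columns first in the sort order so
that identical values are stored contiguously:
CREATE PROJECTION customers_p (
    gender ENCODING RLE,  -- low cardinality: long runs of identical values
    state  ENCODING RLE,
    name )
AS SELECT gender, state, name FROM customers
ORDER BY gender, state;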
Optimizing Projections for Queries with Predicates
If your query contains one or more predicates, you can modify the projections to improve the
query's performance, as described in the following two examples.
Example 1: Queries that Use Date Ranges
This first example shows how to encode data using RLE and change the projection sort order to
improve the performance of a query that retrieves all data within a given date range.
Suppose you have a query that looks like this:
=> SELECT * FROM trades
WHERE trade_date BETWEEN '2007-11-01' AND '2007-12-01';
To optimize this query, determine whether all of the projections can perform the SELECT
operation in a timely manner. Run a SELECT COUNT(*) statement for each projection, specifying
the date range, and note the response time. For example:
=> SELECT COUNT(*) FROM [ projection_name ]
   WHERE trade_date BETWEEN '2007-11-01' AND '2007-12-01';
If one or more of the queries is slow, check the uniqueness of the trade_date column and
determine if it needs to be in the projection’s ORDER BY clause and/or can be encoded using
RLE. RLE replaces sequences of the same data values within a column by a pair that represents
the value and a count. For best results, order the columns in the projection from lowest cardinality
to highest cardinality, and use RLE to encode the data in low-cardinality columns.
Note: For an example of using sorting and RLE, see Choosing Sort Orders for Low-Cardinality
Predicates.
If the column data is unsorted, or if the average number of repeated rows is less than 10,
trade_date is too close to being unique and cannot be encoded using RLE. In this case, add a
new column to minimize the search scope.
The following example adds a new column trade_year:
1 Determine if the new column trade_year returns a manageable result set. The following
query returns the data grouped by trade_year:
=> SELECT DATE_TRUNC('year', trade_date), COUNT(*)
   FROM trades
   GROUP BY DATE_TRUNC('year', trade_date);
2 Assuming that the number of rows where trade_year = 2007 is manageable (near 8k), add a
column for trade_year to the trades table. The SELECT statement then becomes:
=> SELECT * FROM trades
WHERE trade_year = 2007
AND trade_date BETWEEN '2007-11-01' AND '2007-12-01';
As a result, you have a projection that is sorted on trade_year, which can be encoded using
RLE.
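A sketch of that change follows; the statements are illustrative, and you must populate
trade_year when loading data:
ALTER TABLE trades ADD COLUMN trade_year INT;

CREATE PROJECTION trades_p (
    trade_year ENCODING RLE,  -- low cardinality, so RLE applies
    trade_date )
AS SELECT trade_year, trade_date FROM trades
ORDER BY trade_year, trade_date;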
Example 2: Queries for Tables with a High-Cardinality Primary Key
This example demonstrates how you can modify the projection to improve the performance of
queries that select data from a table with a high-cardinality primary key.
Suppose you have the following query:
=> SELECT * FROM [table]
   WHERE pk IN (12345, 12346, 12347,...);
Because the primary key is a high-cardinality column, HP Vertica has to search a large amount of
data.
To optimize the schema for this query, create a new column named buckets and assign it the
value of the primary key divided by 10000. In this example, buckets=(int) pk/10000. Use the
buckets column to limit the search scope as follows:
=> SELECT * FROM [table]
   WHERE buckets IN (1,...)
   AND pk IN (12345, 12346, 12347,...);
Creating a lower cardinality column and adding it to the query limits the search scope and
improves the query performance. In addition, if you create a projection where buckets is first in
the sort order, the query may run even faster.
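A minimal sketch, assuming a table named tbl with primary key pk; populate buckets during
load and sort a projection on it first:
ALTER TABLE tbl ADD COLUMN buckets INT;  -- populate as pk // 10000 when loading

CREATE PROJECTION tbl_buckets_p AS
SELECT * FROM tbl
ORDER BY buckets, pk;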
Optimizing GROUP BY Queries
This section explains several ways you can design your projections to optimize the performance of
your GROUP BY queries.
Avoiding GROUPBY HASH with Projection Design
If your query contains a GROUP BY clause, HP Vertica computes the result with either the
GROUPBY PIPELINED or GROUPBY HASH algorithm.
Both algorithms compute the same results and have similar performance when the query
produces a small number of distinct groups (typically a thousand per node in the cluster). For
queries that contain a large number of groups, GROUPBY PIPELINED uses less memory and can
be faster but is only used when the input data is pre-sorted on the GROUP BY columns.
To improve the performance of a query that has a large number of distinct groups and is currently
using the GROUPBY HASH algorithm, you can enable the use of the GROUPBY PIPELINED
algorithm, as this section describes.
To determine which algorithm your query is using, run the EXPLAIN statement on the query.
The three conditions described in this section refer to the following schema.
CREATE TABLE sortopt (
a INT NOT NULL,
b INT NOT NULL,
c INT,
d INT
);
CREATE PROJECTION sortopt_p (
a_proj,
b_proj,
c_proj,
d_proj )
AS SELECT * FROM sortopt
ORDER BY a,b,c
UNSEGMENTED ALL NODES;
INSERT INTO sortopt VALUES(5,2,13,84);
INSERT INTO sortopt VALUES(14,22,8,115);
INSERT INTO sortopt VALUES(79,9,401,33);
Condition #1
All columns in the query's GROUP BY clause must be included in the projection's sort columns. If
even one column in the GROUP BY clause is excluded from the projection's ORDER BY clause, HP
Vertica uses GROUPBY HASH instead of GROUPBY PIPELINED:
Given a projection sort order ORDER BY a, b, c:

GROUP BY a
GROUP BY a,b
GROUP BY b,a
GROUP BY a,b,c
GROUP BY c,a,b
    The query optimizer uses GROUPBY PIPELINED because columns a, b, and c are included
    in the projection sort columns.

GROUP BY a,b,c,d
    The query optimizer uses GROUPBY HASH because column d is not part of the projection
    sort columns.
Condition #2
If the number of columns in the query's GROUP BY clause is less than the number of columns in the
projection's ORDER BY clause, columns in the query's GROUP BY clause must occur first in the
projection's ORDER BY clause.
Given a projection sort order ORDER BY a, b, c:

GROUP BY a
GROUP BY a,b
GROUP BY b,a
GROUP BY a,b,c
GROUP BY c,a,b
    The query optimizer uses GROUPBY PIPELINED because columns a, b, c are included in
    the projection sort columns.

GROUP BY a,c
    The query optimizer uses GROUPBY HASH because columns a and c do not occur first in
    the projection sort columns.
Condition #3
If the columns in a query's GROUP BY clause do not appear first in the projection's ORDER BY
clause, then any early-appearing projection sort columns that are missing in the query's GROUP BY
clause must be present as single-column constant equality predicates in the query's WHERE
clause.
Given a projection sort order ORDER BY a, b, c:
SELECT a FROM tab WHERE a = 10 GROUP BY b
    The query optimizer uses GROUPBY PIPELINED because all columns preceding b in the
    projection sort order appear as constant equality predicates.

SELECT a FROM tab WHERE a = 10 GROUP BY a, b
    The query optimizer uses GROUPBY PIPELINED even if the redundant grouping column a
    is present.

SELECT a FROM tab WHERE a = 10 GROUP BY b, c
    The query optimizer uses GROUPBY PIPELINED because all columns preceding b and c in
    the projection sort order appear as constant equality predicates.

SELECT a FROM tab WHERE a = 10 GROUP BY c, b
    The query optimizer uses GROUPBY PIPELINED because all columns preceding b and c in
    the projection sort order appear as constant equality predicates.

SELECT a FROM tab WHERE a = 10 GROUP BY c
    The query optimizer uses GROUPBY HASH because column b, which precedes c in the
    projection sort order, does not appear as a constant equality predicate.
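To verify which algorithm a query uses, run EXPLAIN against the sortopt schema above and look
for GROUPBY PIPELINED or GROUPBY HASH in the plan output:
=> EXPLAIN SELECT a, b, COUNT(*) FROM sortopt GROUP BY a, b;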
Avoiding Resegmentation During GROUP BY Optimization with
Projection Design
To compute the correct result of a query that contains a GROUP BY clause, HP Vertica must ensure
that all rows with the same value in the GROUP BY expressions end up at the same node for final
computation. If the projection design already guarantees the data is segmented by the GROUP BY
columns, no resegmentation is required at run time.
To avoid resegmentation, the GROUP BY clause must contain all the segmentation columns of the
projection, but it can also contain other columns.
When your query includes a GROUP BY clause and joins, the joins are performed first. The result of
the join operation is the input to the GROUP BY clause. The segmentation of those intermediate
results may not be consistent with the GROUP BY clause in your query, resulting in resegmentation
at run time.
If your query does not include joins, the GROUP BY clauses are processed using the existing
database projections.
Examples
Assume the following projection:
CREATE PROJECTION … SEGMENTED BY HASH(a,b) ALL NODES;
The following table explains whether or not resegmentation occurs at run time and why.
GROUP BY a
    Requires resegmentation at run time. The query does not contain all the projection
    segmentation columns.

GROUP BY a, b
    Does not require resegmentation at run time. The GROUP BY clause contains all the
    projection segmentation columns.

GROUP BY a, b, c
    Does not require resegmentation at run time. The GROUP BY clause contains all the
    projection segmentation columns.

GROUP BY a+1, b
    Requires resegmentation at run time because of the expression on column a.
To determine if resegmentation will occur during your GROUP BY query, look at the EXPLAIN
plan.
For example, the following plan uses GROUPBY PIPELINED sort optimization and requires
resegmentation to perform the GROUP BY calculation:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 194, Rows: 10K (NO STATISTICS)]
(PATH ID: 1)
The following plan uses GROUPBY PIPELINED sort optimization, but does not require
resegmentation:
+-GROUPBY PIPELINED [Cost: 459, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
Optimizing DISTINCT in a SELECT Query List
This section describes how to optimize queries that have the DISTINCT keyword in their SELECT
list. The techniques for optimizing DISTINCT queries are similar to the techniques for optimizing
GROUP BY queries because when processing queries that use DISTINCT, the HP Vertica
optimizer rewrites the query as a GROUP BY query.
The examples in this section use the following table:
CREATE TABLE table1 (
a INT,
b INT,
c INT
);
If the query has no aggregates in the SELECT list
If your query has no aggregates in the SELECT list, Vertica internally treats the query as if it used
GROUP BY instead.
For example, you can rewrite the following query:
SELECT DISTINCT a, b, c FROM table1;
As:
SELECT a, b, c FROM table1 GROUP BY a, b, c;
For fastest execution, apply the optimization techniques for GROUP BY queries described in
Optimizing GROUP BY Queries (page 394).
Optimizing COUNT DISTINCT and Other DISTINCT Aggregates
Computing a DISTINCT aggregate generally requires much more work than other aggregates, so
if your query can be expressed without DISTINCT aggregates, the query executes faster.
Similarly, a query that uses a single DISTINCT aggregate requires less time and resources to
compute than a query with multiple DISTINCT aggregates. Internally, Vertica handles queries
with a single DISTINCT aggregate differently from queries with multiple DISTINCT aggregates.
The examples in this section use the following table:
CREATE TABLE table1 (
a INT,
b INT,
c INT
);
If the query has a single DISTINCT aggregate
Vertica computes a DISTINCT aggregate by first removing all duplicate values of the aggregate's
argument to find the distinct values. Then it computes the aggregate.
For example, you can rewrite the following query:
SELECT a, b, COUNT(DISTINCT c) AS dcnt FROM table1 GROUP BY a, b;
As:
SELECT a, b, COUNT(dcnt) FROM
(SELECT a, b, c AS dcnt FROM table1 GROUP BY a, b, c)
GROUP BY a, b;
For fastest execution, apply the optimization techniques for GROUP BY queries.
If the query has multiple DISTINCT aggregates
If your query has multiple DISTINCT aggregates, there is no straightforward SQL rewrite that can
compute them. The following query cannot easily be rewritten for improved performance:
SELECT a, COUNT(DISTINCT b), COUNT(DISTINCT c) AS dcnt FROM table1 GROUP BY a;
For a query with multiple DISTINCT aggregates, there is no projection design that can avoid using
GROUPBY HASH and resegmenting the data. To improve performance of this query, make sure that
it has large amounts of memory available. For more information about memory allocation for
queries, see The Resource Manager.
Optimizing JOIN Queries
When you run a query that references more than one table, HP Vertica may need to do one or both
of the following operations to join the tables together:
 Sort the data
 Resegment the data
The following sections provide recommendations for designing your projections and your queries
to reduce query run time and improve the performance of queries that perform joins.
Hash Joins vs. Merge Joins
When processing a join, the HP Vertica optimizer has two algorithms to choose from:
 Merge join—If both inputs are pre-sorted on the join column, the optimizer chooses the faster
merge join, which also uses less memory. Vertica can only perform merge joins on queries that
have INSERT and SELECT operations.
 Hash join—Using the hash join algorithm, HP Vertica uses the smaller (inner) joined table to
build an in-memory hash table on the join column. HP Vertica then scans the outer (larger)
table and probes the hash table for matches. A hash join has no sort requirement, but it
consumes more memory because a hash table is built with the values in the inner table. The
cost of performing a hash join is low if the entire hash table can fit in memory, but the cost rises
when the hash table is written to disk. The optimizer chooses a hash join when projections are
not sorted on the join columns.
The optimizer automatically chooses the most appropriate algorithm given the query and
projections in a system. You can facilitate a merge join by adding a projection that is sorted on the
join key.
Optimizing for Merge Joins
To have HP Vertica perform a merge join, which is usually faster than a hash join, design
projections where the join key is the first sorted column. HP Vertica also performs a merge join if
the join key is second in the sort order, following the column used in the equality predicate.
Otherwise, use a subquery to sort the table on the join key before performing the join.
The following projections are ordered by join_key, so the SELECT statement executes as a
merge join:
CREATE TABLE first (
data INT,
join_key INT
);
CREATE TABLE second (
data INT,
join_key INT
);
CREATE PROJECTION first_p (data, join_key) AS
SELECT data, join_key FROM first
ORDER BY join_key;
CREATE PROJECTION second_p (data, join_key) AS
SELECT data, join_key FROM second
ORDER BY join_key;
SELECT first.data, second.data FROM first, second
WHERE first.join_key = second.join_key;
You also get a merge join if your query has an equality predicate and the join key immediately
follows the column used in the equality predicate, as in the following example. After applying the
predicate, the data is sorted by the join_key column:
CREATE TABLE first (
data INT,
join_key INT
);
CREATE TABLE second (
data INT,
join_key INT
);
CREATE PROJECTION first_p (data, join_key)
AS SELECT data, join_key FROM first
ORDER BY data, join_key;
CREATE PROJECTION second_p (data, join_key)
AS SELECT data, join_key FROM second
ORDER BY join_key;
SELECT first.data, second.data FROM first, second
WHERE first.join_key = second.join_key AND first.data = 5;
Using Equality Predicates to Optimize Joins
Joins run faster if all the columns on the left side of the equality predicate come from one table and
all the columns on the right side of the equality predicate come from another table. For example:
=> SELECT * FROM T JOIN X WHERE T.a + T.b = X.x1 - X.x2;
The following query requires much more work to compute:
=> SELECT * FROM T JOIN X WHERE T.a = X.x1 + T.b;
Specifying INNER and OUTER Tables to Optimize Joins
To improve the performance of queries that perform joins, make sure that HP Vertica chooses the
larger table as the outer (left hand) input by ensuring that any applicable constraints are defined.
Avoiding Resegmentation During Joins
To improve query performance when you join multiple tables, create projections that are
identically segmented on the join keys. These are called identically-segmented projections (ISPs).
Identically-segmented projections allow the joins to occur locally on each node without any data
movement across the network during query processing.
To determine if the projections are identically-segmented on the query join keys, create a query
plan with EXPLAIN. If the query plan contains RESEGMENT or BROADCAST, the projections are not
identically segmented.
The HP Vertica optimizer chooses a projection to supply rows for each table in a query. If the two
projections chosen for the join are segmented, the optimizer uses their segmentation
expressions and the join expressions in the query to determine if the rows are correctly placed to
perform the join without any data movement.
Note: Executing queries that join identically-segmented projections is relevant for multi-node
databases.
Join Conditions for Identically Segmented Projections (ISPs)
A projection p is segmented on join columns if all column references in p’s segmentation
expression are a subset of the columns in the join expression.
The following conditions must be true for two segmented projections p1 of table t1 and p2 of table
t2 to participate in a join of t1 to t2:
 The join condition must have the following form:
   t1.j1 = t2.j1 AND t1.j2 = t2.j2 AND ... t1.jN = t2.jN
 The join columns must share the same base data type. For example:
    If t1.j1 is an INTEGER, t2.j1 can be an INTEGER but it cannot be a FLOAT.
    If t1.j1 is a CHAR(10), t2.j1 can be any CHAR or VARCHAR (for example, CHAR(10),
      VARCHAR(10), VARCHAR(20)), but t2.j1 cannot be an INTEGER.
 If p1 is segmented by an expression on columns {t1.s1, t1.s2, ... t1.sN}, each
segmentation column t1.sX must be in the join column set {t1.jX}.
 If p2 is segmented by an expression on columns {t2.s1, t2.s2, ... t2.sN}, each
segmentation column t2.sX must be in the join column set {t2.jX}.
 The segmentation expressions of p1 and p2 must be structurally equivalent. For example:
    If p1 is SEGMENTED BY hash(t1.x) and p2 is SEGMENTED BY hash(t2.x), p1 and p2
      are identically segmented.
    If p1 is SEGMENTED BY hash(t1.x) and p2 is SEGMENTED BY hash(t2.x + 1), p1
      and p2 are not identically segmented.
 p1 and p2 must have the same segment count.
 The assignment of segments to nodes must match. For example, if p1 and p2 use an OFFSET
clause, their offsets must match.
 If p1 and p2 are range segmented, the ranges must be identical.
If HP Vertica finds projections for t1 and t2 that are not identically segmented, the data is
redistributed across the network during query run time, as necessary.
Tip: If you are creating custom designs, try to use segmented projections for joins whenever
possible. See the following section "Designing Identically Segmented Projections for
K-Safety".
The following statements create two tables and specify ISP conditions:
CREATE TABLE t1 (id INT, x1 INT, y1 INT) SEGMENTED BY HASH(id) ALL NODES;
CREATE TABLE t2 (id INT, x2 INT, y2 INT) SEGMENTED BY HASH(id) ALL NODES;
Corresponding to this design, the following syntax shows ISP-supported join conditions:
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id;                    -- ISP
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id AND t1.x1 = t2.x2;  -- ISP
SELECT * FROM t1 JOIN t2 ON t1.x1 = t2.x2;                    -- NOT ISP
SELECT * FROM t1 JOIN t2 ON t1.id = t2.x2;                    -- NOT ISP
Designing Identically Segmented Projections for K-Safety
For K-safety, if A and B are two identically segmented projections, their buddy projections, Abuddy
and Bbuddy, should also be identically segmented to each other.
The following syntax illustrates suboptimal buddy projection design: the projections are not
identically segmented to each other because their OFFSET values differ:
CREATE PROJECTION t1_b1 (id, x1, y1)
AS SELECT * FROM t1
SEGMENTED BY HASH(id)
ALL NODES OFFSET 1;
CREATE PROJECTION t2_b1 (id, x2, y2)
AS SELECT * FROM t2
SEGMENTED BY HASH(id)
ALL NODES OFFSET 2;
The following syntax is another example of suboptimal buddy projection design. The projections
are not identically segmented to each other because their segmentation expressions differ, so the
projections do not qualify as buddies:
CREATE PROJECTION t1_b2 (id, x1, y1)
AS SELECT * FROM t1
SEGMENTED BY HASH(id, x1)
ALL NODES OFFSET 1;
CREATE PROJECTION t2_b2 (id, x2, y2)
AS SELECT * FROM t2
SEGMENTED BY HASH(id)
ALL NODES OFFSET 2;
Buddy projections can use different sort orders. For details, see Hash Segmentation in the SQL
Reference Manual.
Notes
 HP Vertica recommends that you use Database Designer to create projections, which uses
HASH and ALL NODES syntax.
 HP Vertica recommends that all tables use hash segmentation or be replicated.
See Also
Partitioning and Segmenting Data
CREATE PROJECTION in the SQL Reference Manual
Optimizing ORDER BY Queries
You can improve the performance of queries that contain only ORDER BY clauses if the columns in
a projection's ORDER BY clause are the same as the columns in the query.
Pre-sorting Projections to Optimize ORDER BY Clauses
If you define the projection sort order in the CREATE PROJECTION statement, the HP Vertica
query optimizer does not have to sort projection data before performing certain ORDER BY queries.
The following table, sortopt, contains the columns a, b, c, and d. Projection sortopt_p
specifies to order on columns a, b, and c.
CREATE TABLE sortopt (
a INT NOT NULL,
b INT NOT NULL,
c INT,
d INT
);
CREATE PROJECTION sortopt_p (
a_proj,
b_proj,
c_proj,
d_proj )
AS SELECT * FROM sortopt
ORDER BY a,b,c
UNSEGMENTED ALL NODES;
INSERT INTO sortopt VALUES(5,2,13,84);
INSERT INTO sortopt VALUES(14,22,8,115);
INSERT INTO sortopt VALUES(79,9,401,33);
Based on this sort order, if a SELECT * FROM sortopt query contains one of the following ORDER
BY clauses, the query does not have to resort the projection:
 ORDER BY a
 ORDER BY a, b
 ORDER BY a, b, c
For example, HP Vertica does not have to resort the projection in the following query because the
sort order includes columns specified in the CREATE PROJECTION..ORDER BY a, b, c clause,
which mirrors the query's ORDER BY a, b, c clause:
=> SELECT * FROM sortopt ORDER BY a, b, c;
 a  | b  |  c  |  d
----+----+-----+-----
  5 |  2 |  13 |  84
 14 | 22 |   8 | 115
 79 |  9 | 401 |  33
(3 rows)
If you include column d in the query, HP Vertica must re-sort the projection data because column
d was not defined in the CREATE PROJECTION..ORDER BY clause. Therefore, the ORDER BY d
query won't benefit from any sort optimization.
You cannot specify an ASC or DESC clause in the CREATE PROJECTION statement's ORDER
BY clause. HP Vertica always uses an ascending sort order in physical storage, so if your query
specifies descending order for any of its columns, the query still causes HP Vertica to re-sort the
projection data. For example, the following query requires HP Vertica to sort the results:
=> SELECT * FROM sortopt ORDER BY a DESC, b, c;
 a  | b  |  c  |  d
----+----+-----+-----
 79 |  9 | 401 |  33
 14 | 22 |   8 | 115
  5 |  2 |  13 |  84
(3 rows)
See Also
CREATE PROJECTION in the SQL Reference Manual
Optimizing SQL-99 Analytic Functions
The following sections describe how to optimize the SQL-99 analytic functions that HP Vertica
supports.
Avoiding Single-Node Execution by Avoiding Empty OVER() Clauses
The OVER() clause does not require a windowing clause. If your query uses an analytic function
like SUM(x) and you specify an empty OVER() clause, the analytic function is used as a reporting
function, where the entire input is treated as a single partition; the aggregate returns the same
aggregated value for each row of the result set. The query executes on a single node, potentially
resulting in poor performance.
If you add a PARTITION BY clause to the OVER() clause, the query executes on multiple nodes,
improving its performance.
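For example, assuming a table t1 with columns x and y (names illustrative), the first query below
runs as a reporting function on a single node, while the second distributes the work by partition:
=> SELECT x, SUM(x) OVER () FROM t1;                -- single partition, one node
=> SELECT x, SUM(x) OVER (PARTITION BY y) FROM t1;  -- executes on multiple nodes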
NULL Placement by Analytic Functions
By default, projection column values are stored in ascending order, but the placement of NULLs
depends on a column's data type.
The analytic OVER(window_order_clause) and the SQL ORDER BY clause have slightly
different semantics:
OVER(ORDER BY ...)
    The analytic window_order_clause sorts the data that the analytic function processes, as
    either ascending (ASC) or descending (DESC), and specifies where NULL values appear in
    the sorted result as either NULLS FIRST or NULLS LAST.
    The following is the analytics default sort order and NULL placement:
     If you order ASC, the null placement is NULLS LAST. NULL values appear at the end of
       the sorted result.
     If you order DESC, the null placement is NULLS FIRST. NULL values appear at the
       beginning of the sorted result.

(SQL) ORDER BY
    The SQL ORDER BY clause specifies only ascending or descending order. In HP Vertica,
    however, default NULL placement depends on the column's data type:
     NUMERIC, INTEGER, DATE, TIME, TIMESTAMP, and INTERVAL columns: NULLS
       FIRST (NULL values appear at the beginning of a sorted projection.)
     FLOAT, STRING, and BOOLEAN columns: NULLS LAST (NULL values appear at the
       end of a sorted projection.)
If you do not care about NULL placement in queries that involve analytic computations, or if you
know that columns contain no NULL values, specify NULLS AUTO—irrespective of data type. HP
Vertica chooses the placement that gives the fastest performance. Otherwise, specify NULLS
FIRST or NULLS LAST.
You can carefully formulate queries so HP Vertica can avoid sorting the data and process the
query more quickly, as illustrated by the following example. HP Vertica sorts inputs from table t on
column x, as specified in the OVER(ORDER BY) clause, and then evaluates RANK():
=> CREATE TABLE t (
x FLOAT,
y FLOAT );
=> CREATE PROJECTION t_p (x, y) AS SELECT * FROM t
ORDER BY x, y UNSEGMENTED ALL NODES;
=> SELECT x, RANK() OVER (ORDER BY x) FROM t;
In the preceding SELECT statement, HP Vertica eliminates the sort and runs the query quickly
because column x is a FLOAT data type, so the projection sort order matches the analytic default
ordering (ASC + NULLS LAST). HP Vertica can also avoid having to sort the data when the
underlying projection is already sorted.
However, if column x is defined as INTEGER, HP Vertica must sort the data because the
projection sort order for INTEGER data types (ASC + NULLS FIRST) does not match the default
analytic ordering (ASC + NULLS LAST). To help HP Vertica eliminate the sort, specify the
placement of NULLs to match the default ordering:
=> SELECT x, RANK() OVER (ORDER BY x NULLS FIRST) FROM t;
If column x is a STRING, the following query eliminates the sort:
=> SELECT x, RANK() OVER (ORDER BY x NULLS LAST) FROM t;
If you omit NULLS LAST in the preceding query, HP Vertica still eliminates the sort because ASC +
NULLS LAST is the default sort specification for both the analytic ORDER BY clause and for
string-related columns in HP Vertica.
If you do not care about NULL placement in queries that involve analytic computations, or if you
know that columns contain no NULL values, specify NULLS AUTO. In the following query, HP
Vertica chooses the placement that gives the fastest performance:
=> SELECT x, RANK() OVER (ORDER BY x NULLS AUTO) FROM t;
See Also
 Designing Tables to Minimize Run-Time Sorting of NULL Values in Analytic Functions
(page 407)
 Using SQL Analytics (page 329)
Designing Tables to Minimize Run-Time Sorting of NULL Values in
Analytic Functions
By carefully writing queries or creating your design (or both), you can help the HP Vertica query
optimizer skip sorting all columns in a table when performing an analytic function, which can
improve query performance.
To minimize HP Vertica's need to sort projections during query execution, redefine the employee
table and specify that NULL values are not allowed in the sort fields:
DROP TABLE employee CASCADE;
CREATE TABLE employee
(empno INT,
deptno INT NOT NULL,
sal INT NOT NULL);
CREATE PROJECTION employee_p AS
SELECT * FROM employee
ORDER BY deptno, sal;
INSERT INTO employee VALUES(101,10,50000);
INSERT INTO employee VALUES(103,10,43000);
INSERT INTO employee VALUES(104,10,45000);
INSERT INTO employee VALUES(105,20,97000);
INSERT INTO employee VALUES(108,20,33000);
INSERT INTO employee VALUES(109,20,51000);
=> SELECT * FROM employee;
 empno | deptno |  sal
-------+--------+-------
   101 |     10 | 50000
   103 |     10 | 43000
   104 |     10 | 45000
   105 |     20 | 97000
   108 |     20 | 33000
   109 |     20 | 51000
(6 rows)
=> SELECT deptno, sal, empno, RANK() OVER
(PARTITION BY deptno ORDER BY sal)
FROM employee;
 deptno |  sal  | empno | ?column?
--------+-------+-------+----------
     10 | 43000 |   103 |        1
     10 | 45000 |   104 |        2
     10 | 50000 |   101 |        3
     20 | 33000 |   108 |        1
     20 | 51000 |   109 |        2
     20 | 97000 |   105 |        3
(6 rows)
Tip: If you do not care about NULL placement in queries that involve analytic computations, or
if you know that columns contain no NULL values, specify NULLS AUTO in your queries. HP
Vertica attempts to choose the placement that gives the fastest performance. Otherwise,
specify NULLS FIRST or NULLS LAST.
Optimizing LIMIT Queries with ROW_NUMBER Predicates
Queries that use the LIMIT Clause with ORDER BY or the SQL-99 analytic function
ROW_NUMBER() return a specific subset of rows in the query result. HP Vertica processes these
queries efficiently using Top-K Optimization, which is a database query ranking process. Top-K
optimization avoids sorting (and potentially writing to disk) an entire data set to find a small number
of rows. This can significantly improve query performance.
For example, in the following query, HP Vertica extracts only the three smallest rows from column
x:
=> SELECT * FROM t1 ORDER BY x LIMIT 3;
If table t1 contains millions of rows, it is time consuming to sort all the x values. Instead, HP
Vertica keeps track of the smallest three values in x.
Note: If you omit the ORDER BY clause when using the LIMIT clause, the results can be
nondeterministic.
Sort operations that precede a SQL analytics (page 329) computation benefit from Top-K
optimization if the query contains an OVER(ORDER BY) clause and a predicate on the
ROW_NUMBER function, as in the following example:
=> SELECT x FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY x) AS row
FROM t1) t2 WHERE row <= 3;
The preceding query has the same behavior as the following query, which uses a LIMIT clause:
=> SELECT ROW_NUMBER() OVER (ORDER BY x) AS RANK FROM t1 LIMIT 3;
You can use ROW_NUMBER() with the analytic window_partition_clause, something you cannot do
if you use LIMIT:
=> SELECT x, y FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y)
AS row FROM t1) t2 WHERE row <= 3;
Note: When the OVER() clause includes the window_partition_clause, Top-K optimization
occurs only when the analytic sort matches the input's sort, for example, if the projection is
sorted on columns x and y in table t1.
If you still want to improve the performance of your query, consider using the optimization
techniques described in Optimizing ORDER BY Queries (page 404).
Optimizing INSERT-SELECT Operations
There are several ways to optimize an INSERT-SELECT query that has the following format:
INSERT /*+direct*/ INTO destination SELECT * FROM source;
Optimizing INSERT-SELECT Queries for Tables with Pre-join
Projections
If you have an INSERT-SELECT query where the SELECT clause includes a join, HP Vertica
determines the order for the SELECT part using the rules defined in Hash Joins vs. Merge Joins
(page 399). When inserting into a pre-join projection, a join must be performed during the
INSERT-SELECT query. If the incoming data is not already sorted correctly for a merge join, add
an ORDER BY clause that matches the sort order of the dimension table's projection to the SELECT
clause to facilitate the merge join.
To determine whether your query is using a hash join or a merge join, run the EXPLAIN statement
on the query.
The following example generates a hash join instead of a merge join for a FK-PK validation when
inserting into a pre-join projection:
-- Would like to use a MERGE JOIN for FK-PK validation, but getting a HASH JOIN
DROP TABLE f1 CASCADE;
DROP TABLE d1 CASCADE;
DROP TABLE f1_staging CASCADE;
CREATE TABLE f1(a varchar(10) NOT NULL, b varchar(10) NOT NULL);
CREATE TABLE d1(a varchar(10) NOT NULL, b varchar(10) NOT NULL);
CREATE TABLE f1_staging(a varchar(10) NOT NULL, b varchar(10) NOT NULL);
ALTER TABLE d1 ADD CONSTRAINT d1_pk PRIMARY KEY (a, b);
ALTER TABLE f1 ADD CONSTRAINT f1_fk FOREIGN KEY (a, b) references d1 (a, b);
CREATE PROJECTION f1_super(a, b) AS SELECT * FROM f1 ORDER BY a, b;
CREATE PROJECTION d1_super(a, b) AS SELECT * FROM d1 ORDER BY a, b;
CREATE PROJECTION f1_staging_super(a, b) AS SELECT * FROM f1_staging ORDER BY a,
b;
CREATE PROJECTION prejoin(f1_a, f1_b, d1_a, d1_b)
AS SELECT f1.a, f1.b, d1.a, d1.b
FROM f1 join d1 on f1.a=d1.a and f1.b=d1.b
ORDER BY d1.a, d1.b;
COPY d1 FROM stdin delimiter ' ' direct;
one one
two two
\.
COPY f1 FROM stdin delimiter ' ' direct;
one one
two two
\.
INSERT INTO f1_staging values('one', 'one');
-- Performing HASH JOIN instead of MERGE JOIN
INSERT INTO f1
SELECT f1s.a, f1s.b
FROM f1_staging f1s join d1
on f1s.a=d1.a and f1s.b=d1.b;
-- Adding an ORDER BY clause to force a MERGE JOIN
INSERT INTO f1
SELECT f1s.a, f1s.b
FROM f1_staging f1s join d1
on f1s.a=d1.a and f1s.b=d1.b ORDER BY f1s.a, f1s.b;
Optimizing INSERT-SELECT Queries by Matching Sort Orders
When performing INSERT-SELECT operations, to avoid the sort phase of the INSERT, make sure
that the sort order for the SELECT query matches the projection sort order of the target table.
For example, on a single-node database:
=> CREATE TABLE source (col1 INT, col2 INT, col3 INT);
=> CREATE PROJECTION source_p (col1, col2, col3)
AS SELECT col1, col2, col3 FROM source
ORDER BY col1, col2, col3
SEGMENTED BY HASH(col3)
ALL NODES;
=> CREATE TABLE destination (col1 INT, col2 INT, col3 INT);
=> CREATE PROJECTION destination_p (col1, col2, col3)
AS SELECT col1, col2, col3 FROM destination
ORDER BY col1, col2, col3
SEGMENTED BY HASH(col3)
ALL NODES;
The following INSERT does not require a sort because the query result has the column order of
the projection:
=> INSERT /*+direct*/ INTO destination SELECT * FROM source;
The following INSERT requires a sort because the order of the columns in the SELECT statement
does not match the projection order:
=> INSERT /*+direct*/ INTO destination SELECT col1, col3, col2 FROM source;
The following INSERT does not require a sort. The order of the columns doesn't match, but the
explicit ORDER BY causes the output to be sorted by col1, col3, col2 in HP Vertica:
=> INSERT /*+direct*/ INTO destination SELECT col1, col3, col2 FROM source
GROUP BY col1, col3, col2
ORDER BY col1, col2, col3 ;
Avoiding Resegmentation of INSERT-SELECT Queries
When performing an INSERT-SELECT operation from a segmented source table to a segmented
destination table, segment both projections on the same column to avoid resegmenting the data,
as in the following example:
CREATE TABLE source (col1 INT, col2 INT, col3 INT);
CREATE PROJECTION source_p (col1, col2, col3) AS
SELECT col1, col2, col3 FROM source
SEGMENTED BY HASH(col3) ALL NODES;
CREATE TABLE destination (col1 INT, col2 INT, col3 INT);
CREATE PROJECTION destination_p (col1, col2, col3) AS
SELECT col1, col2, col3 FROM destination
SEGMENTED BY HASH(col3) ALL NODES;
INSERT /*+direct*/ INTO destination SELECT * FROM source;
Optimizing DELETE and UPDATE Queries
HP Vertica is optimized for query-intensive workloads, so DELETE and UPDATE queries might not
achieve the same level of performance as other queries. A DELETE or UPDATE operation has to
update all projections, so the operation is as slow as the slowest projection. For additional
information, see Using INSERT, UPDATE, and DELETE.
The topics that follow discuss best practices for optimizing DELETE and UPDATE queries in HP
Vertica.
Performance Considerations for DELETE and UPDATE Queries
To improve the performance of your DELETE and UPDATE queries, consider the following issues:
 Query performance after large deletes—A large number of (unpurged) deleted rows can
negatively affect query performance. To eliminate rows that have been deleted from the result,
a query must do extra processing. If 10% or more of the total rows in a table have been deleted,
the performance of a query on the table degrades. However, your experience may vary
depending on the size of the table, the table definition, and the query. If a table has a large
number of deleted rows, consider purging those rows to improve performance. For more
information on purging, see Purging Deleted Data.
 Recovery performance—Recovery is the action required for a cluster to restore K-safety
after a crash. Large numbers of deleted records can degrade the performance of a recovery.
To improve recovery performance, purge the deleted rows. For more information on purging,
see Purging Deleted Data.
 Concurrency—DELETE and UPDATE take exclusive locks on the table. Only one DELETE or
UPDATE transaction on a table can be in progress at a time, and only when no loads (or
INSERTs) are in progress. DELETEs and UPDATEs on different tables can be run concurrently.
 Pre-join projections—Avoid pre-joining dimension tables that are frequently updated.
DELETE and UPDATE operations on pre-join projections cascade to the fact table, causing
large DELETE or UPDATE operations.
For detailed tips about improving DELETE and UPDATE performance, see Optimizing Deletes
and Updates for Performance (page 414).
Caution: HP Vertica does not remove deleted data immediately but keeps it as history for the
purposes of historical query. A large amount of history can result in slower query performance. For
information about how to configure the appropriate amount of history to retain, see Purging
Deleted Data.
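For example, a call like the following (the table name is illustrative) purges deleted rows from all
projections of one table; see PURGE_TABLE in the SQL Reference Manual:
=> SELECT PURGE_TABLE('public.trades');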
Optimizing DELETEs and UPDATEs for Performance
The process of optimizing DELETE and UPDATE queries is the same for both operations. Some
simple steps can increase the query performance by tens to hundreds of times. The following
sections describe several ways to improve projection design and improve DELETE and UPDATE
queries to significantly increase DELETE and UPDATE performance.
Note: For large bulk deletion, HP Vertica recommends using Partitioned Tables where possible
because they provide the best DELETE performance and improve query performance.
Projection Column Requirements for Optimized Deletes
When all columns required by the DELETE or UPDATE predicate are present in a projection, the
projection is optimized for DELETEs and UPDATEs. DELETE and UPDATE operations on such
projections are significantly faster than on non-optimized projections. Both simple and pre-join
projections can be optimized.
For example, consider the following table and projections:
CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
CREATE PROJECTION p1 (a, b, c) AS SELECT * FROM t ORDER BY a;
CREATE PROJECTION p2 (a, c) AS SELECT a, c FROM t ORDER BY c, a;
In the following query, both p1 and p2 are eligible for DELETE and UPDATE optimization because
column a is available:
DELETE from t WHERE a = 1;
In the following example, only projection p1 is eligible for DELETE and UPDATE optimization
because the b column is not available in p2:
DELETE from t WHERE b = 1;
Optimized Deletes in Subqueries
To be eligible for DELETE optimization, all target table columns referenced in a DELETE or UPDATE
statement's WHERE clause must be in the projection definition.
For example, the following simple schema has two tables and three projections:
CREATE TABLE tb1 (a INT, b INT, c INT, d INT);
CREATE TABLE tb2 (g INT, h INT, i INT, j INT);
The first projection references all columns in tb1 and sorts on column a:
CREATE PROJECTION tb1_p AS SELECT a, b, c, d FROM tb1 ORDER BY a;
The buddy projection references and sorts on column a in tb1:
CREATE PROJECTION tb1_p_2 AS SELECT a FROM tb1 ORDER BY a;
This projection references all columns in tb2 and sorts on column i:
CREATE PROJECTION tb2_p AS SELECT g, h, i, j FROM tb2 ORDER BY i;
Consider the following DML statement, which references tb1.a in its WHERE clause. Since both
projections on tb1 contain column a, both are eligible for the optimized DELETE:
DELETE FROM tb1 WHERE tb1.a IN (SELECT tb2.i FROM tb2);
Restrictions
Optimized DELETEs are not supported under the following conditions:
 With pre-join projections on nodes that are down
 With replicated and pre-join projections if subqueries reference the target table. For example,
the following syntax is not supported:
   DELETE FROM tb1 WHERE tb1.a IN (SELECT e FROM tb2, tb2 WHERE tb2.e = tb1.e);
 With subqueries that do not return multiple rows. For example, the following syntax is not
supported:
   DELETE FROM tb1 WHERE tb1.a = (SELECT k from tb2);
Projection Sort Order for Optimizing Deletes
Design your projections so that frequently-used DELETE or UPDATE predicate columns appear in
the sort order of all projections for large DELETEs and UPDATEs.
For example, suppose most of the DELETE queries you perform on a projection look like the
following:
DELETE FROM t WHERE time_key < '1-1-2007';
To optimize the DELETEs, make time_key appear in the ORDER BY clause of all your projections.
This schema design results in better performance of the DELETE operation.
In addition, add sort columns to the sort order such that each combination of the sort key values
uniquely identifies a row or a small set of rows. For more information, see Choosing Sort Orders
for Low-Cardinality Predicates. To analyze projections for sort order issues, use the
EVALUATE_DELETE_PERFORMANCE function.
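For example, the following call checks the projection p1 defined earlier in this section;
EVALUATE_DELETE_PERFORMANCE reports any sort order concerns it finds:
=> SELECT EVALUATE_DELETE_PERFORMANCE('p1');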
Using External Procedures
An external procedure is a procedure external to HP Vertica that you create, maintain, and store
on the server. External procedures are simply executable files such as shell scripts, compiled
code, code interpreters, and so on.
Implementing External Procedures
To implement an external procedure:
1 Create an external procedure executable file.
See Requirements for External Procedures (page 418).
2 Enable the set UID attribute for the file and allow read and execute permission for the group (if
the owner is not the database administrator). For example:
chmod 4777 helloplanet.sh
3 Install the external procedure executable file (page 419).
4 Create the external procedure in HP Vertica (page 420).
Once a procedure is created in HP Vertica, you can execute (page 421) or drop (page 422) it, but
you cannot alter it.
Requirements for External Procedures
External procedures have requirements regarding their attributes, where you store them, and how
you handle their output. You should also be cognizant of their resource usage.
Procedure File Attributes
A procedure file must be owned by the database administrator (OS account) or by a user in the
same group as the administrator. The procedure file owner cannot be root and must have the set
UID attribute enabled and allow read and execute permission for the group if the owner is not the
database administrator.
Note: The file should end with exit 0, and exit 0 must reside on its own line. This convention
instructs HP Vertica to return 0 when the script succeeds.
Handling Procedure Output
HP Vertica does not provide a facility for handling procedure output. Therefore, you must make
your own arrangements for handling procedure output, which should include writing error, logging,
and program information directly to files that you manage.
Handling Resource Usage
The HP Vertica resource manager is unaware of resources used by external procedures.
Additionally, HP Vertica is intended to be the only major process running on your system. If your
external procedure is resource intensive, it could affect the performance and stability of HP
Vertica. Consider the types of external procedures you create and when you run them. For
example, you might run a resource-intensive procedure during off hours.
Sample Procedure File
#!/bin/bash
echo "hello planet argument: $1" >> /tmp/myprocedure.log
exit 0
Installing External Procedure Executable Files
To install an external procedure, use the Administration Tools from either the graphical user
interface or the command line.
Graphical User Interface
1 Run the Administration Tools.
$ /opt/vertica/bin/adminTools
2 On the AdminTools Main Menu, click Configuration Menu, and then click OK.
3 On the Configuration Menu, click Install External Procedure and then click OK.
4 Select the database on which you want to install the external procedure.
5 Either select the file to install or manually type the complete file path, and then click OK.
6 If you are not the superuser, you are prompted to enter your password and click OK.
The Administration Tools automatically create the
<database_catalog_path>/procedures directory on each node in the database and
installs the external procedure in these directories for you.
7 Click OK in the dialog that indicates that the installation was successful.
Command Line
If you use the command line, be sure to specify the full path to the procedure file and the password
of the Linux user who owns the procedure file. For example:
$ admintools -t install_procedure -d vmartdb -f /scratch/helloworld.sh -p
ownerpassword
Installing external procedure...
External procedure installed
Once you have installed an external procedure, you need to make HP Vertica aware of it. To do
so, use the CREATE PROCEDURE statement, but review Creating External Procedures (page
420) first.
Creating External Procedures
Once you have installed an external procedure, you need to make HP Vertica aware of it. To do
so, use the CREATE PROCEDURE statement.
By default, only a superuser can create and execute a procedure. However, a superuser can grant
the right to execute a stored procedure to a user on the operating system. (See GRANT
(Procedure).)
Once created, a procedure is listed in the V_CATALOG.USER_PROCEDURES system table. Users
can see only those procedures that they have been granted the privilege to execute.
Example
This example creates a procedure named helloplanet for the helloplanet.sh external
procedure file. This file accepts one VARCHAR argument. The sample code is provided in
Requirements for External Procedures (page 418).
=> CREATE PROCEDURE helloplanet(arg1 VARCHAR) AS 'helloplanet.sh' LANGUAGE
'external'
USER 'release';
This example creates a procedure named proctest for the copy_vertica_database.sh
script. This script copies a database from one cluster to another, and it is included in the server
RPM located in the /opt/vertica/scripts directory.
=> CREATE PROCEDURE proctest(shosts VARCHAR, thosts VARCHAR, dbdir VARCHAR)
AS 'copy_vertica_database.sh' LANGUAGE 'external' USER 'release';
See Also
CREATE PROCEDURE and GRANT (Procedure) in the SQL Reference Manual
Executing External Procedures
Once you define a procedure through the CREATE PROCEDURE statement, you can use it as a
meta command through a simple SELECT statement. HP Vertica does not support using
procedures in more complex statements or in expressions.
The following example runs a procedure named helloplanet:
=> SELECT helloplanet('earthlings');
 helloplanet
-------------
           0
(1 row)
The following example runs a procedure named proctest. This procedure references the
copy_vertica_database.sh script that copies a database from one cluster to another. It is
installed by the server RPM in the /opt/vertica/scripts directory.
=> SELECT proctest(
'-s qa01',
'-t rbench1',
'-D /scratch_b/qa/PROC_TEST' );
Note: External procedures have no direct access to database data. If available, use ODBC or
JDBC for this purpose.
Procedures are executed on the initiating node. HP Vertica runs the procedure by forking and
executing the program. Each procedure argument is passed to the executable file as a string. The
parent fork process waits until the child process ends.
To stop execution, cancel the process by sending a cancel command (for example, CTRL+C)
through the client. If the procedure program exits with an error, an error message with the exit
status is returned.
Permissions
To execute an external procedure, the user needs:
 EXECUTE privilege on the procedure
 USAGE privilege on the schema that contains the procedure
See Also
CREATE PROCEDURE in the SQL Reference Manual
Procedure Privileges in the Administrator's Guide
Dropping External Procedures
Only a superuser can drop an external procedure. To drop the definition for an external
procedure from HP Vertica, use the DROP PROCEDURE statement. Only the reference to the
procedure is removed. The external file remains in the
<database_catalog_path>/procedures directory on each node in the database.
Note: The definition HP Vertica uses for a procedure cannot be altered; it can only be dropped.
Example
=> DROP PROCEDURE helloplanet(arg1 varchar);
See Also
DROP PROCEDURE in the SQL Reference Manual
Using User-Defined SQL Functions
User-Defined SQL Functions let you define and store commonly-used SQL expressions as a
function. User-Defined SQL Functions are useful for executing complex queries and combining
HP Vertica built-in functions. You simply call the function name you assigned in your query.
A User-Defined SQL Function can be used anywhere in a query where an ordinary SQL
expression can be used, except in the table partition clause or the projection segmentation clause.
For syntax and parameters for the commands and system table discussed in this section, see the
following topics in the SQL Reference Manual:
 CREATE FUNCTION
 ALTER FUNCTION
 DROP FUNCTION
 GRANT (User Defined Extension)
 REVOKE (User Defined Extension)
 V_CATALOG.USER_FUNCTIONS
Creating User-Defined SQL Functions
A user-defined SQL function can be used anywhere in a query where an ordinary SQL expression
can be used, except in the table partition clause or the projection segmentation clause.
To create a SQL function, the user must have CREATE privileges on the schema. To use a SQL
function, the user must have USAGE privileges on the schema and EXECUTE privileges on the
defined function.
The following statement creates a SQL function called myzeroifnull that accepts an INTEGER
argument and returns an INTEGER result:
=> CREATE FUNCTION myzeroifnull(x INT) RETURN INT
AS BEGIN
RETURN (CASE WHEN (x IS NOT NULL) THEN x ELSE 0 END);
END;
You can use the new SQL function (myzeroifnull) anywhere you use an ordinary SQL
expression. For example, create a simple table:
=> CREATE TABLE tabwnulls(col1 INT);
=> INSERT INTO tabwnulls VALUES(1);
=> INSERT INTO tabwnulls VALUES(NULL);
=> INSERT INTO tabwnulls VALUES(0);
=> SELECT * FROM tabwnulls;
 col1
------
    1

    0
(3 rows)
Use the myzeroifnull function in a SELECT statement, where the function calls col1 from
table tabwnulls:
=> SELECT myzeroifnull(col1) FROM tabwnulls;
 myzeroifnull
--------------
            1
            0
            0
(3 rows)
Use the myzeroifnull function in the GROUP BY clause:
=> SELECT COUNT(*) FROM tabwnulls GROUP BY myzeroifnull(col1);
 count
-------
     2
     1
(2 rows)
If you want to change a user-defined SQL function's body, use the CREATE OR REPLACE syntax.
The following command modifies the CASE expression:
=> CREATE OR REPLACE FUNCTION myzeroifnull(x INT) RETURN INT
AS BEGIN
RETURN (CASE WHEN (x IS NULL) THEN 0 ELSE x END);
END;
To see how this information is stored in the HP Vertica catalog, see Viewing Information About
SQL Functions (page 426) in this guide.
See Also
CREATE FUNCTION and USER_FUNCTIONS in the SQL Reference Manual
Altering and Dropping User-Defined SQL Functions
HP Vertica allows multiple functions to share the same name with different argument types.
Therefore, if you try to alter or drop a SQL function without specifying the argument data type, the
system returns an error message to prevent you from dropping the wrong function:
=> DROP FUNCTION myzeroifnull();
ROLLBACK: Function with specified name and parameters does not exist:
myzeroifnull
Note: Only a superuser or owner can alter or drop a SQL Function.
Altering a user-defined SQL Function
The ALTER FUNCTION command lets you assign a new name to a user-defined function, as well
as move it to a different schema.
In the previous topic, you created a SQL function called myzeroifnull. The following command
renames the myzeroifnull function to zerowhennull:
=> ALTER FUNCTION myzeroifnull(x INT) RENAME TO zerowhennull;
ALTER FUNCTION
This next command moves the renamed function into a new schema called macros:
=> ALTER FUNCTION zerowhennull(x INT) SET SCHEMA macros;
ALTER FUNCTION
Dropping a SQL Function
The DROP FUNCTION command drops a SQL function from the HP Vertica catalog.
As with ALTER FUNCTION, you must specify the argument data type, or the system returns the
following error message:
=> DROP FUNCTION zerowhennull();
ROLLBACK: Function with specified name and parameters does not exist:
zerowhennull
Specify the argument type:
=> DROP FUNCTION macros.zerowhennull(x INT);
DROP FUNCTION
HP Vertica does not check for dependencies, so if you drop a SQL function where other objects
reference it (such as views or other SQL Functions), HP Vertica returns an error when those
objects are used, not when the function is dropped.
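For example, in the following sketch (the view name is hypothetical), the DROP succeeds, and
the error surfaces only when the dependent view is queried:
=> CREATE VIEW v_nonulls AS SELECT myzeroifnull(col1) FROM tabwnulls;
=> DROP FUNCTION myzeroifnull(x INT);
=> SELECT * FROM v_nonulls;   -- fails here, not at DROP time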
Tip: To view all user-defined SQL functions on which you have EXECUTE privileges, together
with their argument types, query the V_CATALOG.USER_FUNCTIONS system table.
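For example, a query such as the following (a minimal sketch) lists each function along with its
argument types:
=> SELECT schema_name, function_name, function_argument_type
-> FROM V_CATALOG.USER_FUNCTIONS;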
See Also
ALTER FUNCTION and DROP FUNCTION in the SQL Reference Manual
Managing Access to SQL Functions
Before a user can execute a user-defined SQL function, he or she must have USAGE privileges
on the schema and EXECUTE privileges on the defined function. Only the superuser or owner can
grant or revoke EXECUTE privileges on a function.
To grant EXECUTE privileges to user Fred on the myzeroifnull function:
=> GRANT EXECUTE ON FUNCTION myzeroifnull (x INT) TO Fred;
To revoke EXECUTE privileges from user Fred on the myzeroifnull function:
=> REVOKE EXECUTE ON FUNCTION myzeroifnull (x INT) FROM Fred;
See Also
GRANT (User Defined Extension) and REVOKE (User Defined Extension) in the SQL
Reference Manual
Viewing Information About User-Defined SQL Functions
You can access information about any User-Defined SQL Functions on which you have
EXECUTE privileges. This information is available in the system table
V_CATALOG.USER_FUNCTIONS and from the vsql meta-command \df.
To view all of the User-Defined SQL Functions on which you have EXECUTE privileges, query the
USER_FUNCTIONS table:
=> SELECT * FROM USER_FUNCTIONS;
-[ RECORD 1 ]----------+----------------------------------------------------
schema_name            | public
function_name          | myzeroifnull
function_return_type   | Integer
function_argument_type | x Integer
function_definition    | RETURN CASE WHEN (x IS NOT NULL) THEN x ELSE 0 END
volatility             | immutable
is_strict              | f
If you want to change a User-Defined SQL Function's body, use the CREATE OR REPLACE
syntax. The following command modifies the CASE expression:
=> CREATE OR REPLACE FUNCTION myzeroifnull(x INT) RETURN INT
AS BEGIN
RETURN (CASE WHEN (x IS NULL) THEN 0 ELSE x END);
END;
Now when you query the USER_FUNCTIONS table, you can see the changes in the
function_definition column:
=> SELECT * FROM USER_FUNCTIONS;
-[ RECORD 1 ]----------+----------------------------------------------------
schema_name            | public
function_name          | myzeroifnull
function_return_type   | Integer
function_argument_type | x Integer
function_definition    | RETURN CASE WHEN (x IS NULL) THEN 0 ELSE x END
volatility             | immutable
is_strict              | f
If you use CREATE OR REPLACE syntax to change only the argument name or argument type (or
both), the system maintains both versions of the function. For example, the following command
creates a second version of myzeroifnull that accepts and returns a NUMERIC data type
instead of an INTEGER:
=> CREATE OR REPLACE FUNCTION myzeroifnull(z NUMERIC) RETURN NUMERIC
AS BEGIN
RETURN (CASE WHEN (z IS NULL) THEN 0 ELSE z END);
END;
Now query the USER_FUNCTIONS table, and you can see the second instance of
myzeroifnull in Record 2, as well as the changes in the function_return_type,
function_argument_type, and function_definition columns.
Note: Record 1 still holds the original definition for the myzeroifnull function:
=> SELECT * FROM USER_FUNCTIONS;
-[ RECORD 1 ]----------+------------------------------------------------------
schema_name            | public
function_name          | myzeroifnull
function_return_type   | Integer
function_argument_type | x Integer
function_definition    | RETURN CASE WHEN (x IS NULL) THEN 0 ELSE x END
volatility             | immutable
is_strict              | f
-[ RECORD 2 ]----------+------------------------------------------------------
schema_name            | public
function_name          | myzeroifnull
function_return_type   | Numeric
function_argument_type | z Numeric
function_definition    | RETURN (CASE WHEN (z IS NULL) THEN (0) ELSE z END)::numeric
volatility             | immutable
is_strict              | f
Because HP Vertica allows functions to share the same name with different argument types, you
must specify the argument type when you alter or drop a function. If you do not, the system
returns an error message:
=> DROP FUNCTION myzeroifnull();
ROLLBACK: Function with specified name and parameters does not exist:
myzeroifnull
See Also
USER_FUNCTIONS in the SQL Reference Manual
Migrating Built-in SQL Functions
If you have built-in SQL functions from another RDBMS that do not map to an HP
Vertica-supported function, you can migrate them into your HP Vertica database by using a
user-defined SQL function.
The example scripts below show how to create user-defined functions for the following DB2
built-in functions:
 UCASE()
 LCASE()
 LOCATE()
 POSSTR()
UCASE()
This script creates a user-defined SQL function for the UCASE() function:
=> CREATE OR REPLACE FUNCTION UCASE (x VARCHAR)
RETURN VARCHAR
AS BEGIN
RETURN UPPER(x);
END;
LCASE()
This script creates a user-defined SQL function for the LCASE() function:
=> CREATE OR REPLACE FUNCTION LCASE (x VARCHAR)
RETURN VARCHAR
AS BEGIN
RETURN LOWER(x);
END;
LOCATE()
This script creates a user-defined SQL function for the LOCATE() function:
=> CREATE OR REPLACE FUNCTION LOCATE(a VARCHAR, b VARCHAR)
RETURN INT
AS BEGIN
RETURN POSITION(a IN b);
END;
POSSTR()
This script creates a user-defined SQL function for the POSSTR() function:
=> CREATE OR REPLACE FUNCTION POSSTR(a VARCHAR, b VARCHAR)
RETURN INT
AS BEGIN
RETURN POSITION(b IN a);
END;
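As a quick check of the two position functions, both of the following calls return 4, the position of
the first 'be' in the string (a hedged example, following directly from the definitions above):
=> SELECT LOCATE('be', 'to be or not to be');
=> SELECT POSSTR('to be or not to be', 'be');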
Developing and Using User Defined Functions
User-Defined Functions (UDFs) are functions contained in external shared libraries that you
develop in C++, R, or other supported languages, and load into HP Vertica using the CREATE
LIBRARY statement. They are best suited for analytic operations that are difficult to perform in
SQL, and need to be performed frequently enough that their speed is a major concern.
UDFs' primary strengths are:
 They can run almost as fast as HP Vertica's own functions when loaded as a library directly by
the HP Vertica server process.
 They can be used much more flexibly than external procedures within SQL statements.
Generally, they can be used anywhere an internal function can be used.
 They take full advantage of HP Vertica's distributed computing features. Functions are
executed in parallel on each node in the cluster.
 HP Vertica handles the distribution of the UDF library to the individual nodes. You only need to
copy the library to the initiator node.
 All of the complicated aspects of developing a distributed piece of analytic code are handled
for you by HP Vertica. Your main programming task is to read in data, process it, and then
write it out using the HP Vertica SDK APIs.
The only major drawback to UDFs is that when run in "unfenced mode" (directly within the HP
Vertica process), poorly coded UDFs can cause the database to become unstable. Other
methods of extending HP Vertica do not carry this risk. External procedures run in a separate
process, and are unlikely to cause database stability issues. SQL macros use HP Vertica's own
statements and functions and are likewise unlikely to cause stability issues. To avoid causing
instability and data loss, you should test your UDF code extensively before deploying it to a
production environment.
UDFs also support a limited number of programming languages (C++ and R). External procedures
support any programming language that is available on your server hosts.
Since they run on the HP Vertica cluster, you should not use UDFs for highly memory- or
CPU-intensive tasks. UDFs that consume lots of computing resources can result in poor
database performance. They also should not perform potentially blocking operations (such as
network communications), since they would cause the HP Vertica process to block as well.
This chapter explains how to create and use user-defined functions (UDFs).
How UDFs Work
User Defined Functions are contained in libraries. Multiple functions can be defined in a library,
and multiple libraries can be loaded by HP Vertica. You load a library by:
1 Copying the library .so file to a location on the initiator node.
2 Connecting to the initiator node using vsql.
3 Using the CREATE LIBRARY statement, passing it the path where you saved the library file.
The initiator node takes care of distributing the library file to the rest of the nodes in the cluster.
Once the library is loaded, you define individual UDFs using the CREATE FUNCTION SQL
statement. This statement assigns SQL function names to the UDF classes in the library. From
then on, you are able to use your function within your SQL statements. Whenever you call a UDF,
HP Vertica creates an instance of the UDF class on each node in the cluster and passes it data to
process.
The CREATE FUNCTION statement adds the UDF to the database catalog, so it remains
available after a database restart. The database superuser can grant users access privileges to
the UDFs. See GRANT (User Defined Extension) in the SQL Reference Manual for details.
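The overall flow looks like the following sketch (the library path, library name, factory name,
function name, and user name are all hypothetical):
=> CREATE LIBRARY MyFunctions AS '/home/dbadmin/MyFunctions.so';
=> CREATE FUNCTION myadd AS LANGUAGE 'C++' NAME 'MyAddFactory' LIBRARY MyFunctions;
=> GRANT EXECUTE ON FUNCTION myadd(int, int) TO analyst;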
User Defined Extensions and User Defined Functions
User Defined Extension (UDx) refers to all extensions to HP Vertica developed using the APIs in
the HP Vertica SDK. UDxs encompass functions such as User Defined Scalar Functions
(UDSFs), and utilities such as the User Defined Load (UDL) feature that let you create custom
data load routines.
Thanks to their tight integration with HP Vertica, UDxs usually have better performance than
User-defined SQL functions or External Procedures.
User Defined Functions (UDFs) are a specific type of UDx. You use them in SQL statements to
process data similarly to HP Vertica's own built-in functions. They give you the power to create
your own functions that run just slightly slower than HP Vertica's own functions.
The HP Vertica SDK uses the term UDx extensively, even for APIs that deal exclusively with
developing UDFs.
Types of UDFs
There are five different types of user defined functions:
 User Defined Scalar Functions (UDSFs) take in a single row of data and return a single value.
These functions can be used anywhere a native HP Vertica function can be used, except
CREATE TABLE BY PARTITION and SEGMENTED BY expressions.
 User Defined Transform Functions (UDTFs) operate on table segments and return zero or
more rows of data. The data they return can be an entirely new table, unrelated to the schema
of the input table, including having its own ordering and segmentation expressions. They can
only be used in the SELECT list of a query. For details see Using User Defined Transforms
(page 456).
 User Defined Aggregate Functions (UDAFs) allow you to create custom aggregate functions
specific to your needs. They read one column of data, and return one output column.
 User Defined Analytic Functions (UDAnFs) are similar to UDSFs, in that they read a row of data
and return a single row. However, the function can read input rows independently of outputting
rows, so that the output values can be calculated over several input rows.
 The User Defined Load (UDL) feature allows you to create custom routines to load your data
into HP Vertica. You create custom libraries using the HP Vertica SDK to handle various steps
in the loading process.
There are many similarities in developing the different types of functions. They can even coexist in
the same library. The main difference is the base class you use for your UDF (see Developing a
UDF (page 434) for details).
Fenced Mode
User Defined Functions, Transforms, Analytics, Aggregates, and so on (collectively referred to
as UDxs) can cause issues with HP Vertica if they are not correctly developed. Poorly designed
UDx code can result in memory leaks and can ultimately bring an HP Vertica node down.
Fenced mode runs UDx code outside of the main HP Vertica process in a separate zygote
process. UDx code that crashes while running in fenced mode does not impact the core HP
Vertica process. There is a small performance impact when running UDx code in fenced mode. On
average, using fenced mode adds about 10% more time to execution compared to unfenced
mode.
Fenced mode is currently available for all UDxs except User Defined Aggregates and User
Defined Load.
Using fenced mode does not affect the development of your UDx. Fenced mode is enabled by
default for UDxs that support it. The CREATE FUNCTION command can optionally be issued
with the NOT FENCED modifier to disable fenced mode for the function. You can also enable or
disable fenced mode on any fenced mode-supported C++ UDx by using the ALTER FUNCTION
command, as the sketch below shows. Note: UDxs created in the R language always run in
fenced mode.
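For example (a hedged sketch; the function and library names are hypothetical, and the exact
FENCED syntax is documented under CREATE FUNCTION and ALTER FUNCTION in the SQL
Reference Manual):
=> CREATE FUNCTION myadd AS LANGUAGE 'C++'
-> NAME 'MyAddFactory' LIBRARY MyFunctions NOT FENCED;
=> ALTER FUNCTION myadd(int, int) SET FENCED true;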
About the Zygote process:
The HP Vertica zygote process starts when HP Vertica starts. Each node has a single zygote
process. Side processes are created on demand: when a user calls a UDx, the zygote spawns a
UDx side session that runs the UDx in fenced mode.
About Fenced Mode Logging:
UDx code that runs in fenced mode is logged in the UDxZygote.log file, which is stored in the
UDxLogs directory in the catalog directory of HP Vertica. Log entries for the side process are
denoted by the UDx language (for example, C++), node, zygote process ID, and UDxSideProcess
ID.
For example, for the following processes...
dbadmin=> SELECT * FROM UDX_FENCED_PROCESSES;
    node_name     |   process_type   |            session_id            |  pid  | port  | status
------------------+------------------+----------------------------------+-------+-------+--------
 v_vmart_node0001 | UDxZygoteProcess |                                  | 27468 | 51900 | UP
 v_vmart_node0001 | UDxSideProcess   | localhost.localdoma-27465:0x800b |  5677 | 44123 | UP
... the corresponding log file displays:
2012-05-16 11:24:43.990 [C++-localhost.localdoma-27465:0x800b-5677]
0x2b3ff17e7fd0 UDx side process started
11:24:43.996 [C++-localhost.localdoma-27465:0x800b-5677] 0x2b3ff17e7fd0
Finished setting up signal handlers.
11:24:43.996 [C++-localhost.localdoma-27465:0x800b-5677] 0x2b3ff17e7fd0 My
port: 44123
11:24:43.996 [C++-localhost.localdoma-27465:0x800b-5677] 0x2b3ff17e7fd0 My
address: 0.0.0.0
11:24:43.996 [C++-localhost.localdoma-27465:0x800b-5677] 0x2b3ff17e7fd0
Vertica port: 51900
11:24:43.996 [C++-localhost.localdoma-27465:0x800b-5677] 0x2b3ff17e7fd0
Vertica address: 127.0.0.1
11:25:19.749 [C++-localhost.localdoma-27465:0x800b-5677] 0x41837940 Setting
memory resource limit to -1
11:30:11.523 [C++-localhost.localdoma-27465:0x800b-5677] 0x41837940 Exiting
UDx side process
The last line indicates that the side process was killed. In this case it was killed when the user
session (vsql) closed.
About Fenced Mode Configuration Parameters:
Fenced mode supports two configuration parameters:
 FencedUDxMemoryLimitMB - The maximum memory size, in MB, that fenced mode
processes can use. The default is -1 (no limit). The side process is killed if this limit is exceeded.
 ForceUDxFencedMode - When set to 1, forces all UDxs that support fenced mode to run in
fenced mode, even if their definition specifies NOT FENCED. The default is 0 (disabled).
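One way to set these parameters is with the SET_CONFIG_PARAMETER function, as in this
sketch (the 1024 MB limit is only an illustration):
=> SELECT SET_CONFIG_PARAMETER('FencedUDxMemoryLimitMB', 1024);
=> SELECT SET_CONFIG_PARAMETER('ForceUDxFencedMode', 1);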
See also:
 CREATE LIBRARY
 CREATE FUNCTION
 CREATE TRANSFORM FUNCTION
 CREATE ANALYTIC FUNCTION
 ALTER FUNCTION
 UDX_FENCED_PROCESSES
Setting up a UDF Development Environment
You should develop your UDF code on the same Linux platform that you use on your HP Vertica
database cluster. This will ensure that your UDF library is compatible with the HP Vertica version
deployed on your cluster.
At a minimum, you need to install the following on your development machine:
 g++ (http://gcc.gnu.org/) and its associated tool chain, such as ld. (Note: some Linux
distributions package g++ separately from gcc.)
 A copy of the HP Vertica SDK. See The HP Vertica SDK (page 433) for details.
Note: The HP Vertica binaries are compiled using the default version of g++ installed on the
supported Linux platforms. While other versions of g++ (or even entirely different compilers)
may produce compatible libraries, only the platform’s default g++ version is supported for
compiling UDFs.
While not required, the following additional software packages are highly recommended:
 make, or some other build-management tool.
 gdb, or some other debugger.
 Valgrind, or similar tools that detect memory leaks.
You should also have access to a non-production HP Vertica database for testing and debugging.
You may want to install a single-node HP Vertica database on your development machine for
easier development.
If you want to use any third-party libraries (for example, statistical analysis libraries), you need to
install them on your development machine. (If you do not statically link these libraries into your
UDF library, you also have to install them on every node in the cluster. See Compiling Your UDF
(page 493) for details.)
The HP Vertica SDK
The HP Vertica Software Development Kit (SDK) is distributed as part of the server installation. It
contains the source and header files you need to create your UDF library, as well as several
sample source files that you can use as a basis for your own UDFs.
The SDK files are located in the sdk subdirectory off of the root HP Vertica server directory
(usually, /opt/vertica/sdk). This directory contains:
 include, which contains the headers and source files needed to compile UDF libraries.
 examples, which contains the source code and sample data for UDF examples.
 doc, which contains the API documentation for the HP Vertica SDK.
Running the Examples
See the README file in the examples directory for instructions on compiling and running the
examples. Running the examples not only helps you understand how a UDF works, it also helps
you ensure your development environment is properly set up to compile UDF libraries.
Note: You can copy /opt/vertica/sdk/examples to your home directory and run "make"
to build the example libraries. You must have a g++ development environment installed. To
install a g++ development environment on Red Hat systems, run yum install gcc gcc-c++
make.
Include File Overview
There are two files in the include directory you need when compiling your UDF:
 Vertica.h is the main header file for the SDK. Your UDF code needs to include this file in
order to find the SDK's definitions.
 Vertica.cpp contains support code that needs to be compiled into the UDF library.
Much of the HP Vertica SDK API is defined in the VerticaUDx.h header file (which is included by
the Vertica.h file). If you're curious, you may want to review the contents of this file in addition to
reading the API documentation.
The HP Vertica SDK API Documentation
This documentation only provides a brief overview of the classes and class functions defined by
the User Defined Function API. To learn more, see the HP Vertica SDK API. You can find this
documentation in two locations:
 In the same directory as the other HP Vertica SDK files: /opt/vertica/sdk/doc.
 Included with the full documentation set, available either online or for download. See Installing
HP Vertica Documentation in the Installation Guide.
Note: If you are viewing the HTML version of the documentation, you can follow this link to view
the SDK API documentation.
Developing a User Defined Function in C++
To create a User Defined Function (UDF) in C++, you need to create two classes:
 A function class that performs the actual processing you want the UDF to perform.
 A factory class that tells HP Vertica the name of the UDF and its parameters and return values.
The class you use depends on whether you are creating a scalar or transform UDF (see UDF
Types (page 430) for details).
The following sections explain how you develop and compile the code for your UDF.
HP Vertica C++ SDK Data Types
The HP Vertica SDK has typedefs and classes for representing HP Vertica data types within your
UDF code. Using these typedefs ensures data type compatibility between the data your UDF
processes and generates and the HP Vertica database. The following table describes some of the
typedefs available. Consult the HP Vertica SDK API Documentation (page 434) for a complete
list, as well as lists of helper functions to convert and manipulate these data types.
Type Definition   Description
Interval          An HP Vertica interval.
IntervalYM        An HP Vertica year-to-month interval.
Timestamp         An HP Vertica timestamp.
vint              A standard HP Vertica 64-bit integer.
vint_null         A null value for integer values.
vbool             A Boolean value in HP Vertica.
vbool_null        A null value for Boolean data types.
vfloat            An HP Vertica floating-point value.
VString           String data types (such as VARCHAR and CHAR).
                  Note: Do not use a VString object to hold an
                  intermediate result. Use a std::string or
                  char[] instead.
VNumeric          Fixed-point data types from HP Vertica.
Notes
 Take care to make HP Vertica SDK API calls (such as VerticaType::getNumericLength) only on
objects that have the correct data type. To minimize overhead and improve performance, most
of the APIs do not check the data types of the objects on which they are called. Calling a
function on an incorrect data type can result in an error.
 A NULL HP Vertica string value is converted into an empty C++ string.
Developing a User Defined Scalar Function
A UDSF returns a single value for each row of data it reads. It can be used anywhere a
built-in HP Vertica function can be used. You usually develop a UDF to perform data
manipulations that are too complex or too slow to perform using SQL statements and functions.
UDFs also let you use analytic functions provided by third-party libraries within HP Vertica while
still maintaining high performance.
The topics in this section guide you through developing a UDSF.
UDSF Requirements
There are several requirements for your UDSF:
 Your UDSF must return a value for every input row (unless it generates an error; see Handling
Errors (page 514) for details). Failing to return a value for a row will result in incorrect results
(and can potentially destabilize the HP Vertica server if the UDSF is not run in fenced mode).
 A UDSF cannot have more than 32 arguments.
If you intend your UDSF to run in unfenced mode, it is vital that you pay attention to the following
additional precautions (although fenced-mode UDSFs should follow these guidelines as well):
 Your UDSF must not allow an exception to be passed back to HP Vertica. Doing so could
result in a memory leak, as any memory allocated by the exception will not be reclaimed. It is a
good practice to use a top-level try-catch block to catch any stray exceptions that may be
thrown by your code or any functions or libraries your code calls.
 If your UDSF allocates its own memory, you must make absolutely sure it properly frees it.
Failing to free even a single byte of allocated memory can have huge consequences if your
UDF is called to operate on a multi-million row table. Instead of having your code allocate its
own memory, you should use the vt_alloc macro, which uses HP Vertica's own memory
manager to allocate and track memory. This memory is guaranteed to be properly disposed of
when your UDSF completes execution. See Allocating Resources (page 508) for more
information.
 In general, remember that unfenced UDSFs run within the HP Vertica process. Any problems
they cause may result in database instability or even data loss.
UDSF Class Overview
You create your UDSF by subclassing two classes defined by the HP Vertica SDK:
Vertica::ScalarFunction and Vertica::ScalarFunctionFactory.
The ScalarFunctionFactory performs two roles:
 It lists the parameters accepted by the UDSF and the data type of the UDSF's return value. HP
Vertica uses this data when you call the CREATE FUNCTION SQL statement to add the
function to the database catalog.
 It returns an instance of the UDSF's ScalarFunction subclass that HP Vertica can
call to process data.
The ScalarFunction class is where you define the processBlock function that performs the
actual data processing. When a user calls your UDSF in a SQL statement, HP Vertica
bundles together the data from the function parameters and sends it to the processBlock
function.
The input and output of the processBlock function are supplied by objects of the
Vertica::BlockReader and Vertica::BlockWriter classes. They define functions that you
use to read the input data and write the output data for your UDSF.
In addition to processBlock, the ScalarFunction class defines two optional class functions
that you can implement to allocate and free resources: setup and destroy. You should use
these class functions to allocate and deallocate resources that you do not allocate through the
UDF API (see Allocating Resources (page 508) for details).
The ServerInterface Class
All of the class functions that you will define in your UDSF receive an instance of the
ServerInterface class as a parameter. This object is used by the underlying HP Vertica SDK
code to make calls back into the HP Vertica process. For example, the macro you use to
instantiate a member of your ScalarFunction subclass (vt_createFuncObj) needs a pointer
to this object to be able to ask HP Vertica to allocate the memory for the new object. You
generally will not interact with this object directly, but instead pass it along to HP Vertica SDK
function and macro calls.
Subclassing ScalarFunction
The ScalarFunction class is the heart of a UDSF. Your own subclass must contain a single
class function named processBlock that carries out all of the processing that you want your
UDSF to perform.
Note: While the name you choose for your ScalarFunction subclass does not have to
match the name of the SQL function you will later assign to it, HP considers making the names
the same a best practice.
The following example shows a very basic subclass of ScalarFunction called Add2Ints. As
the name implies, it adds two integers together, returning a single integer result. It also
demonstrates including the main HP Vertica SDK header file (Vertica.h) and using the
Vertica namespace. While not required, using the namespace saves you from having to prefix
every HP Vertica SDK class reference with Vertica::.
// Include the top-level Vertica SDK file
#include "Vertica.h"
// Using the Vertica namespace means we don't have to prefix all
// class references with Vertica::
using namespace Vertica;
/*
* ScalarFunction implementation for a UDSF that adds
* two numbers together.
*/
class Add2Ints : public ScalarFunction
{
public:
/*
* This function does all of the actual processing for the UDF.
* In this case, it simply reads two integer values and returns
* their sum.
*
* The inputs are retrieved via arg_reader
* The outputs are returned via arg_writer
*/
virtual void processBlock(ServerInterface &srvInterface,
BlockReader &arg_reader,
BlockWriter &res_writer)
{
// While we have input to process
do
{
// Read the two integer input parameters by calling the
// BlockReader.getIntRef class function
const vint a = arg_reader.getIntRef(0);
const vint b = arg_reader.getIntRef(1);
// Call BlockWriter.setInt to store the output value, which is the
// two input values added together
res_writer.setInt(a+b);
// Finish writing the row, and advance to the next output row
res_writer.next();
// Continue looping until there are no more input rows
}
while (arg_reader.next());
}
};
The majority of the work in developing a UDSF is creating your processBlock class function.
This is where all of the processing in your function occurs. Your own UDSF should follow the same
basic pattern as this example:
 Read in a set of parameters from the BlockReader object using data-type-specific class
functions.
 Process the data in some manner.
 Output the resulting value using one of the BlockWriter class's data-type-specific class
functions.
 Advance to the next row of output and input by calling BlockWriter.next() and
BlockReader.next().
This process continues until there are no more rows of data to be read (BlockReader.next()
returns false).
Notes
 You must make sure that processBlock reads all of the rows in its input and outputs a single
value for each row. Failure to do so can corrupt the data structures that HP Vertica reads to get
the output of your UDSF. The only exception to this rule is if your processBlock function
uses the vt_report_error macro to report an error back to HP Vertica (see Handling
Errors (page 514) for more). In that case, HP Vertica does not attempt to read the incomplete
result set generated by the UDSF.
 Writing too many output rows can cause HP Vertica to return an out-of-bounds error.
Subclassing ScalarFunctionFactory
The ScalarFunctionFactory class provides HP Vertica with metadata about your User Defined
Scalar Function (UDSF): the number of parameters and their data types, as well as the data type
of its return value. It also instantiates a member of the UDSF's ScalarFunction subclass for HP
Vertica.
After defining your factory class, you need to call the RegisterFactory macro. This macro
instantiates a member of your factory class, so HP Vertica can interact with it and extract the
metadata it contains about your UDSF.
The following example shows the ScalarFunctionFactory subclass for the example
ScalarFunction subclass shown in Subclassing ScalarFunction (page 437).
/*
* This class provides metadata about the ScalarFunction class, and
* also instantiates a member of that class when needed.
*/
class Add2IntsFactory : public ScalarFunctionFactory
{
// return an instance of Add2Ints to perform the actual addition.
virtual ScalarFunction *createScalarFunction(ServerInterface &interface)
{
// Calls the vt_createFuncObj to create the new Add2Ints class instance.
return vt_createFuncObj(interface.allocator, Add2Ints);
}
// This function returns the description of the input and outputs of the
// Add2Ints class's processBlock function. It stores this information in
// two ColumnTypes objects, one for the input parameters, and one for
// the return value.
virtual void getPrototype(ServerInterface &interface,
ColumnTypes &argTypes,
ColumnTypes &returnType)
{
// Takes two ints as inputs, so add ints to the argTypes object
argTypes.addInt();
argTypes.addInt();
// returns a single int, so add a single int to the returnType object.
// Note that ScalarFunctions *always* return a single value.
returnType.addInt();
}
};
There are two required class functions you must implement in your ScalarFunctionFactory
subclass:
 createScalarFunction instantiates a member of the UDSF's ScalarFunction class.
The implementation of this function is simple: you just supply the name of the
ScalarFunction subclass in a call to the vt_createFuncObj macro. This macro takes
care of allocating and instantiating the class for you.
 getPrototype tells HP Vertica about the parameters and return type for your UDSF. In
addition to a ServerInterface object, this function gets two ColumnTypes objects. All you
need to do in this function is call class functions on these two objects to build the list of
parameters and the single return value type.
After you define your ScalarFunctionFactory subclass, you need to use the
RegisterFactory macro to make the factory available to HP Vertica. You just pass this macro
the name of your factory class.
The getReturnType Function
If your function returns a sized column (a return data type whose length can vary, such as a
varchar) or a value that requires precision, you need to implement a class function named
getReturnType. This function is called by HP Vertica to find the length or precision of the data
being returned in each row of the results. The return value of this function depends on the data
type your processBlock function returns:
 CHAR or VARCHAR return the maximum length of the string.
 NUMERIC types specify the precision and scale.
 TIME and TIMESTAMP values (with or without timezone) specify precision.
 INTERVAL YEAR TO MONTH specifies range.
 INTERVAL DAY TO SECOND specifies precision and range.
If your UDSF does not return one of these data types, it does not need a getReturnType
function.
The input to the getReturnType function is a SizedColumnTypes object that contains the input
argument types along with their lengths, which will be passed to an instance of your processBlock
function. Your implementation of getReturnType has to extract the data types and lengths from
this input and determine the length or precision of the output rows. It then saves this information in
another instance of the SizedColumnTypes class.
The following demonstration comes from one of the UDSF examples that is included with the HP
Vertica SDK. This function determines the length of the VARCHAR data being returned by a
UDSF that removes all spaces from the input string. It extracts the return value as a
VerticaType object, then uses the getVarcharLength class function to get the length of the
string.
// Determine the length of the varchar string being returned.
virtual void getReturnType(ServerInterface &srvInterface,
const SizedColumnTypes &argTypes,
SizedColumnTypes &returnType)
{
const VerticaType &t = argTypes.getColumnType(0);
returnType.addVarchar(t.getVarcharLength());
}
The RegisterFactory Macro
Once you have completed your ScalarFunctionFactory subclass, you need to register it
using the RegisterFactory macro. This macro instantiates your factory class and makes the
metadata it contains available for HP Vertica to access. To call this macro, you just pass it the
name of your factory class.
// Register the factory with HP Vertica
RegisterFactory(Add2IntsFactory);
Setting Null Input and Volatility Behavior
Normally, HP Vertica calls your UDSF for every row of data in the query. There are two cases
where HP Vertica could avoid calling your UDSF code:
 If your function returns NULL when any parameter is NULL, HP Vertica can just return NULL
without having to call the function. This optimization is also helpful because you do not need to
handle NULL input parameters in your UDSF code.
 If your function produces the same output value given the same input parameters, HP Vertica
can cache the function's return value. If the UDSF is called with the same set of input
parameters again, HP Vertica can return the cached value rather than calling your UDSF.
Letting HP Vertica know about these behaviors of your function allows it to optimize queries
containing your UDSF.
You indicate the volatility and null handling of your function by setting the vol and strict fields
in your ScalarFunctionFactory class's constructor.
Volatility Settings
To indicate your function's volatility, set the vol field to one of the following values:
Value                Description
VOLATILE             Repeated calls to the function with the same input
                     parameters always result in different values. HP Vertica
                     always calls volatile functions for each invocation.
IMMUTABLE            Calls to the function with the same input parameters
                     always result in the same output.
STABLE               Repeated calls to the function with the same input within
                     the same statement return the same output. For example, a
                     function that returns the current user name is stable
                     because the user cannot change within a statement, but
                     could change between statements.
DEFAULT_VOLATILITY   The default volatility. This is the same as VOLATILE.
The following example code shows a version of the Add2ints example factory class that makes the
function immutable.
class Add2intsImmutableFactory : public Vertica::ScalarFunctionFactory
{
virtual Vertica::ScalarFunction
*createScalarFunction(Vertica::ServerInterface &srvInterface)
{ return vt_createFuncObj(srvInterface.allocator, Add2ints); }
virtual void getPrototype(Vertica::ServerInterface &srvInterface,
Vertica::ColumnTypes &argTypes,
Vertica::ColumnTypes &returnType)
{
argTypes.addInt();
argTypes.addInt();
returnType.addInt();
}
public:
Add2intsImmutableFactory() {vol = IMMUTABLE;}
};
RegisterFactory(Add2intsImmutableFactory);
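To expose this immutable version as its own SQL function, you would register its factory with
CREATE FUNCTION, along the lines of this sketch (assuming the factory has been compiled into
the ScalarFunctions example library):
=> CREATE FUNCTION Add2IntsImmutable AS LANGUAGE 'C++'
-> NAME 'Add2intsImmutableFactory' LIBRARY ScalarFunctions;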
Null Input Behavior
To indicate how your function reacts to NULL input, set the strict field to one of the
following values:
Value                       Description
CALLED_ON_NULL_INPUT        The function must be called, even if one or more
                            input values are NULL.
RETURN_NULL_ON_NULL_INPUT   The function always returns a NULL value if any of
                            its inputs are NULL.
STRICT                      A synonym for RETURN_NULL_ON_NULL_INPUT.
DEFAULT_STRICTNESS          The default strictness setting. This is the same as
                            CALLED_ON_NULL_INPUT.
The following example demonstrates setting the null behavior of Add2ints so HP Vertica does not
call the function with NULL values.
class Add2intsNullOnNullInputFactory : public Vertica::ScalarFunctionFactory
{
virtual Vertica::ScalarFunction
*createScalarFunction(Vertica::ServerInterface &srvInterface)
{ return vt_createFuncObj(srvInterface.allocator, Add2ints); }
virtual void getPrototype(Vertica::ServerInterface &srvInterface,
Vertica::ColumnTypes &argTypes,
Vertica::ColumnTypes &returnType)
{
argTypes.addInt();
argTypes.addInt();
returnType.addInt();
}
public:
Add2intsNullOnNullInputFactory() {strict = RETURN_NULL_ON_NULL_INPUT;}
};
RegisterFactory(Add2intsNullOnNullInputFactory);
Deploying and Using UDSFs
To deploy a UDSF on your HP Vertica database:
1 Copy the UDF shared library file (.so) that contains your function to a node on your HP Vertica
cluster.
2 Connect to the node where you copied the library (for example, using vsql).
3 Use the CREATE LIBRARY statement to load the UDF library into HP Vertica. You pass this
statement the location of the UDF library file you copied to the node earlier. HP Vertica
distributes the library to each node in the cluster, and each HP Vertica process loads a copy of
the library.
4 Use the CREATE FUNCTION statement to add the functions to HP Vertica's catalog. This
maps a SQL function name to the name of the UDSF's factory class. If you are not sure of the
name of the UDSF's factory class, you can list all of the UDFs in the library (see Listing the
UDFs Contained in a Library (page 542) for details).
The following example demonstrates loading the Add2ints UDSF that is included in the SDK
examples directory. It assumes that the ScalarFunctions.so library that contains the function
has been copied to the dbadmin user's home directory on the initiator node.
=> CREATE LIBRARY ScalarFunctions AS
-> '/home/dbadmin/ScalarFunctions.so';
CREATE LIBRARY
=> CREATE FUNCTION Add2Ints AS LANGUAGE 'C++'
-> NAME 'Add2IntsFactory' LIBRARY ScalarFunctions;
CREATE FUNCTION
After you create the Add2ints UDSF, you can use it almost everywhere a built-in function can be
used:
=> SELECT Add2Ints(27,15);
 Add2ints
----------
       42
(1 row)
=> SELECT * FROM MyTable;
  a  | b
-----+----
   7 |  0
  12 |  2
  12 |  6
  18 |  9
   1 |  1
  58 |  4
 450 | 15
(7 rows)
=> SELECT * FROM MyTable WHERE Add2ints(a, b) > 20;
  a  | b
-----+----
  18 |  9
  58 |  4
 450 | 15
(3 rows)
See Also
CREATE LIBRARY and CREATE FUNCTION in the SQL Reference Manual.
Developing a User Defined Transform Function in C++
A User Defined Transform Function (UDTF) reads one or more arguments (treated as a row of
data), and returns zero or more rows of data consisting of one or more columns. The schema of
the output table does not need to correspond to the schema of the input table—they can be totally
different. The UDTF can return any number of output rows for each row of input.
UDTFs can only be used in a SELECT list that contains just the UDTF call and a required OVER
clause.
Unlike other types of User Defined Functions, UDTFs do not have a limit on the number of
arguments they can accept. Most other types of UDFs have a maximum of 32 arguments.
The topics in this section guide you through developing a UDTF.
UDTF Requirements
There are several requirements for UDTFs:
 UDTFs run after GROUP BY, but before the final ORDER BY, when used in conjunction with
GROUP BY and ORDER BY in a statement.
 The UDTF can produce as few or as many rows as it wants as output. However, each row it
outputs must be complete. Advancing to the next row without having added a value for each
column results in incorrect results.
 Your UDTF must not allow an exception to be passed back to HP Vertica. You should use a
top-level try-catch block to catch any stray exceptions that may be thrown by your code or any
functions or libraries your code calls.
 If your UDTF allocates its own memory, you must make absolutely sure it properly frees it.
Failing to free even a single byte of allocated memory can have huge consequences if your
UDF is called to operate on a multi-million row table. Instead of having your code allocate its
own memory, you should use the vt_alloc macro, which uses HP Vertica's own memory
manager to allocate and track memory. This memory is guaranteed to be properly disposed of
when your UDTF finishes executing. See Allocating Resources (page 508) for more
information.
 If you intend your UDTF to run in unfenced mode, you need to ensure it is error-free. Any
errors in an unfenced UDF can result in database instability or even data loss.
UDTF Class Overview
You create your UDTF by subclassing two classes defined by the HP Vertica SDK:
Vertica::TransformFunction and Vertica::TransformFunctionFactory.
The TransformFunctionFactory performs two roles:
 It provides the number and data types of the parameters the UDTF accepts, and the number
and data types of the columns the UDTF outputs. HP Vertica uses this data when you call the
CREATE FUNCTION SQL statement to add the function to the database catalog.
 It returns an instance of the UDTF's TransformFunction subclass that HP Vertica
can call to process data.
The TransformFunction class is where you define the processPartition function, which
performs the data processing that you want your UDTF to perform. When a user calls your UDTF
in a SQL SELECT statement, HP Vertica sends a partition of data to the processPartition
function.
The input and output of the processPartition function are supplied by objects of the
Vertica::PartitionReader and Vertica::PartitionWriter classes. They define
functions that you use to read the input data and write the output data for your UDTF.
In addition to processPartition, the TransformFunction class defines two optional class
functions that you can implement to allocate and free resources: setup and destroy. You
should use these class functions to allocate and deallocate resources that you do not allocate
through the UDF API (see Allocating Resources (page 508) for details).
The ServerInterface Class
All of the class functions that you will define in your UDF receive an instance of the
ServerInterface class as a parameter. This object is used by the underlying HP Vertica SDK
code to make calls back into the HP Vertica process. For example, the macro you use to
instantiate a member of your TransformFunction subclass (vt_createFuncObj) needs a
pointer to this object to be able to ask HP Vertica to allocate the memory for the new object. You
generally will not interact with this object directly, but instead pass it along to HP Vertica SDK
function and macro calls.
Subclassing TransformFunction
Your subclass of Vertica::TransformFunction is where you define the processing you want
your UDTF to perform. The only required function in this class is processPartition, which
reads the parameters sent to your UDTF via a Vertica::PartitionReader object, and writes
output values to a Vertica::PartitionWriter object.
The following example shows a subclass of TransformFunction named StringTokenizer
that breaks input strings into individual words, returning each on its own row. For example:
=> SELECT * FROM t;
          text
------------------------
 row row row your boat
 gently down the stream
(2 rows)
=> SELECT tokenize(text) OVER (partition by text) FROM t;
 words
--------
 gently
 down
 the
 stream
 row
 row
 row
 your
 boat
(9 rows)
Notice that the number of rows in the result table (and the name of the result column) differ from
those of the input table. This is one of the strengths of a UDTF.
The following code defines the StringTokenizer class.
#include "Vertica.h"
#include <sstream>
// Use the Vertica namespace to make referring
// to SDK classes easier.
using namespace Vertica;
using namespace std;
// The primary class for the StringTokenizer UDTF.
class StringTokenizer : public TransformFunction {
// Called for each partition in the table. Receives the data from
// the source table and writes the tokenized output rows.
virtual void processPartition(ServerInterface &srvInterface,
PartitionReader &inputReader,
PartitionWriter &outputWriter) {
try {
// Loop through the input rows
do {
// Get a single varchar as input.
const VString &sentence = inputReader.getStringRef(0);
// If input string is NULL, then output is NULL as well
if (sentence.isNull())
{
VString &word = outputWriter.getStringRef(0);
word.setNull();
outputWriter.next();
}
else
{
// Otherwise, let's tokenize the string and output the words
std::string tmp = sentence.str();
istringstream ss(tmp);
do
{
std::string buffer;
ss >> buffer;
// Copy to output
if (!buffer.empty()) {
VString &word = outputWriter.getStringRef(0);
word.copy(buffer);
outputWriter.next();
}
} while (ss);
}
} while (inputReader.next()); // Loop until no more input rows
} catch (exception& e) {
// Standard exception. Quit.
vt_report_error(0, "Exception while processing partition: [%s]",
e.what());
}
}
};
The processPartition function in this example follows a pattern that you will follow in your
own UDTF: it loops over all rows in the table partition that HP Vertica sends it, processing each
row. For UDTFs, you do not have to actually process every row. You can exit your function without
having read all of the input without any issues. You may choose to do this if your UDTF is
performing some sort of search or some other operation where it can determine that the rest of the
input is unneeded.
Extracting Parameters
The first task your UDTF function needs to perform in its main loop is to extract its parameters.
You call a data-type specific function in the PartitionReader object to extract each input
parameter. All of these functions take a single parameter: the column number in the input row that
you want to read. In this example, processPartition extracts the single VString
input parameter from the PartitionReader object. The VString class represents an HP
Vertica string value (VARCHAR or CHAR).
In more complex UDTFs, you may need to extract multiple values. This is done the same way as
shown in the example, calling the data-type specific function to extract the value of each column in
the input row.
Note: In some cases, you may want to determine the number and types of parameters using
PartitionReader's getNumCols and getTypeMetaData functions, instead of just
hard-coding the data types of the columns in the input row. This is useful if you want your
TransformFunction to be able to process input tables with different schemas. You can then
use different TransformFunctionFactory classes to define multiple function signatures
that call the same TransformFunction class. See Subclassing
TransformFunctionFactory (page 449) for more information.
Handling Null Values
When developing UDTFs, you often need to handle NULL input values in a special manner. In this
example, a NULL input value results in a NULL output value, which is handled as a special case.
After writing a NULL to the output, processPartition moves on to the next input row.
Processing Input Values
After handling any NULL values, the processPartition shown in the example moves on to
performing the actual processing. It breaks the string into individual words and adds each word
to its own row in the output.
Writing Output
After your UDTF has performed its processing, it may need to write output. Unlike a UDSF,
outputting data is optional for a UDTF. However, if it does write output, it must supply values for all
of the output columns you defined for your UDTF (see Subclassing TransformFunctionFactory
(page 449) for details on how you specify the output columns of your UDTF). There are no default
values for your output. If you want to output a NULL value in one of the columns, you must
explicitly set it.
Similar to reading input columns, there are functions on the PartitionWriter object for writing
each type of data to the output row. In this case, the example calls the PartitionWriter
object's getStringRef function to allocate a new VString object to hold the word it needs to
output. Once it has copied the buffer containing the word, the example calls
PartitionWriter.next() to complete the output row.
Advancing to the Next Input Row
In most UDTFs, processing continues until all of the rows of input have been read. You advance
to the next row by calling PartitionReader.next(). This function returns true if there is another
row of input data to process and false if all the data in the partition has been read. Once the input
rows are exhausted, your UDTF usually exits so its results are returned back to HP Vertica.
Subclassing TransformFunctionFactory
Your subclass of the TransformFunctionFactory provides metadata about your UDTF to HP
Vertica: the function's name, the number and data types of its parameters, and the number and
data types of its output columns.
There are three required functions you need to implement in your
TransformFunctionFactory:
 getPrototype returns two ColumnTypes objects that describe the columns your UDTF
takes as input and returns as output.
 createTransformFunction instantiates a member of your TransformFunction
subclass that HP Vertica can call to process data.
 getReturnType tells HP Vertica details about the output values: the width of variable-sized
data types (such as VARCHAR) and the precision of data types that have settable precision
(such as TIMESTAMP). You can also set the names of the output columns in this function.
Note: The getReturnType function is optional for User Defined Scalar Functions, since they
do not return a table and therefore do not require column names. It is required for UDTFs.
The following example shows the factory class that corresponds to the TransformFunction
subclass shown in Subclassing TransformFunction (page 446).
class TokenFactory : public TransformFunctionFactory {
// Tell Vertica that StringTokenizer reads in a row with 1 string,
// and returns a row with 1 string
virtual void getPrototype(ServerInterface &srvInterface, ColumnTypes
&argTypes, ColumnTypes &returnType) {
argTypes.addVarchar();
returnType.addVarchar();
}
// Tell Vertica what the maximum return string length will be, given the
// input string length. Also names the output column. This function is only
// necessary for columns that have a variable size (i.e. strings) or
// have to report their precision.
virtual void getReturnType(ServerInterface &srvInterface,
const SizedColumnTypes &input_types,
SizedColumnTypes &output_types) {
int input_len = input_types.getColumnType(0).getStringLength();
// Output size will never be more than the input size
// Also sets the name of the output column.
output_types.addVarchar(input_len, "words");
}
virtual TransformFunction *createTransformFunction(ServerInterface
&srvInterface) {
return vt_createFuncObj(srvInterface.allocator, StringTokenizer);
}
};
The getPrototype function is straightforward. You call functions on the ColumnTypes objects
to set the data types of the input and output columns for your function. In this example, the UDTF
takes a single VARCHAR column as input and returns a single VARCHAR column as output, so it
calls the addVarchar() function on both of the ColumnTypes objects. See the ColumnTypes
entry in the HP Vertica API documentation for a full list of the data type functions you can call to set
input and output column types.
The getReturnType function is similar to getPrototype, but instead of returning just the data
types of the output columns, this function returns the precision of data types that require it
(INTERVAL, INTERVAL YEAR TO MONTH, TIMESTAMP, TIMESTAMP WITH TIMEZONE, or
VNumeric) or the maximum length of variable-width columns (VARCHAR). This example just
returns the length of the input string, since the output will never be longer than the input string. It
also sets the name of the output column to "words."
Note: You do not have to supply a name for an output column in this function, since the column
name has a default value of "". However, if you do not supply a column name here, the SQL
statements that call your UDTF must provide aliases for the unnamed columns or they will fail
with an error message (see the sketch below). From a usability standpoint, it's easier for you to
supply the column names here once, rather than to force all of the users of your function to
supply their own column names for each call to the UDTF.
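For example, if getReturnType had not named the column, a caller would have to supply an
alias, along these lines (a sketch reusing the earlier tokenize example):
=> SELECT tokenize(text) OVER (PARTITION BY text) AS words FROM t;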
createTransformFunction is essentially boilerplate code. It just calls the
vt_createFuncObj macro with the name of the TransformFunction class associated with
this factory class. This macro takes care of instantiating a copy of the TransformFunction class
that HP Vertica can use to process data.
Registering the UDTF Factory Subclass
The final step in creating your UDTF is to call the RegisterFactory macro. This macro ensures
that your factory class is instantiated when HP Vertica loads the shared library containing your
UDTF. Having your factory class instantiated is the only way that HP Vertica can find your UDTF
and determine what its inputs and outputs are.
The RegisterFactory macro just takes the name of your factory class:
RegisterFactory(TokenFactory);
Creating Multi-phase UDTFs
Multi-phase UDTFs let you break your data processing into multiple steps. Using this feature, your
UDTFs can perform processing in a way similar to Hadoop or other MapReduce frameworks. You
can use the first phase to break down and gather data, and then use subsequent phases to
process the data. For example, the first phase of your UDTF could extract specific types of user
interactions from a web server log stored in the column of a table, and subsequent phases could
perform analysis on those interactions.
Multi-phase UDTFs also let you decide where processing should occur: locally on each
node, or throughout the cluster. If your multi-phase UDTF is like a MapReduce process, you want
the first phase of your multi-phase UDTF to process data that is stored locally on the node where
the instance of the UDTF is running. This prevents large segments of data from being copied
around the HP Vertica cluster. Depending on the type of processing being performed in later
phases, you may choose to have the data segmented and distributed across the HP Vertica
cluster.
Each phase of the UDTF is the same as a traditional (single-phase) UDTF: it receives a table as
input, and generates a table as output. The schema for each phase's output does not have to
match its input, and each phase can output as many or as few rows as it wants. You create a
subclass of TransformFunction to define the processing performed by each stage. If you
already have a TransformFunction from a single-phase UDTF that performs the processing
you want a phase of your multi-phase UDTF to perform, you can easily adapt it to work within the
multi-phase UDTF.
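For illustration, here is a minimal sketch of a TransformFunction that could serve as one phase. The class name and the single-integer input/output schema are hypothetical; see the SDK examples for complete, working phases.

class PassThroughPhaseFn : public TransformFunction
{
    virtual void processPartition(ServerInterface &srvInterface,
                                  PartitionReader &inputReader,
                                  PartitionWriter &outputWriter)
    {
        try {
            do {
                // Read one value from the input row and pass it through.
                outputWriter.setInt(0, inputReader.getIntRef(0));
                outputWriter.next();
            } while (inputReader.next());
        } catch (std::exception &e) {
            vt_report_error(0, "Exception while processing partition: %s",
                            e.what());
        }
    }
};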
What makes a multi-phase UDTF different from a traditional UDTF is the factory class you use.
You define a multi-phase UDTF using a subclass of
MultiPhaseTransformFunctionFactory, rather than the TransformFunctionFactory.
This factory class acts as a container for all of the phases in your multi-step UDTF. It provides HP
Vertica with the input and output requirements of the entire multi-phase UDTF (through the
getPrototype function), and a list of all the phases in the UDTF.
Within your subclass of the MultiPhaseTransformFunctionFactory class, you define one
or more subclasses of TransformFunctionPhase. These classes fill the same role as the
TransformFunctionFactory class for each phase in your multi-phase UDTF. They define the
input and output of each phase and create instances of their associated TransformFunction
classes to perform the processing for each phase of the UDTF. In addition to these subclasses,
your MultiPhaseTransformFunctionFactory includes fields that provide a handle to an
instance of each of the TransformFunctionPhase subclasses.
The following code fragment is from the InvertedIndex UDTF example distributed with the HP
Vertica SDK. It demonstrates subclassing the MultiPhaseTransformFunctionFactory,
including two TransformFunctionPhase subclasses that define the two phases in this UDTF.
class InvertedIndexFactory : public MultiPhaseTransformFunctionFactory
{
public:
    /**
     * Extracts terms from documents.
     */
    class ForwardIndexPhase : public TransformFunctionPhase
    {
        virtual void getReturnType(ServerInterface &srvInterface,
                                   const SizedColumnTypes &inputTypes,
                                   SizedColumnTypes &outputTypes)
        {
            // Sanity checks on input we've been given.
            // Expected input: (doc_id INTEGER, text VARCHAR)
            vector<size_t> argCols;
            inputTypes.getArgumentColumns(argCols);
            if (argCols.size() < 2 ||
                !inputTypes.getColumnType(argCols.at(0)).isInt() ||
                !inputTypes.getColumnType(argCols.at(1)).isVarchar())
                vt_report_error(0, "Function only accepts two arguments "
                                "(INTEGER, VARCHAR)");
            // Output of this phase is:
            //   (term_freq INTEGER) OVER(PBY term VARCHAR OBY doc_id INTEGER)
            // Number of times term appears within a document.
            outputTypes.addInt("term_freq");
            // Add analytic clause columns: (PARTITION BY term ORDER BY doc_id).
            // The length of any term is at most the size of the entire document.
            outputTypes.addVarcharPartitionColumn(
                inputTypes.getColumnType(argCols.at(1)).getStringLength(),
                "term");
            // Add order column on the basis of the document id's data type.
            outputTypes.addOrderColumn(inputTypes.getColumnType(argCols.at(0)),
                                       "doc_id");
        }
        virtual TransformFunction *createTransformFunction(ServerInterface &srvInterface)
        { return vt_createFuncObj(srvInterface.allocator, ForwardIndexBuilder); }
    };
    /**
     * Constructs terms' posting lists.
     */
    class InvertedIndexPhase : public TransformFunctionPhase
    {
        virtual void getReturnType(ServerInterface &srvInterface,
                                   const SizedColumnTypes &inputTypes,
                                   SizedColumnTypes &outputTypes)
        {
            // Sanity checks on input we've been given.
            // Expected input:
            //   (term_freq INTEGER) OVER(PBY term VARCHAR OBY doc_id INTEGER)
            vector<size_t> argCols;
            inputTypes.getArgumentColumns(argCols);
            vector<size_t> pByCols;
            inputTypes.getPartitionByColumns(pByCols);
            vector<size_t> oByCols;
            inputTypes.getOrderByColumns(oByCols);
            if (argCols.size() != 1 || pByCols.size() != 1 || oByCols.size() != 1 ||
                !inputTypes.getColumnType(argCols.at(0)).isInt() ||
                !inputTypes.getColumnType(pByCols.at(0)).isVarchar() ||
                !inputTypes.getColumnType(oByCols.at(0)).isInt())
                vt_report_error(0, "Function expects an argument (INTEGER) with "
                                "analytic clause OVER(PBY VARCHAR OBY INTEGER)");
            // Output of this phase is:
            //   (term VARCHAR, doc_id INTEGER, term_freq INTEGER, corp_freq INTEGER).
            outputTypes.addVarchar(inputTypes.getColumnType(
                                       pByCols.at(0)).getStringLength(), "term");
            outputTypes.addInt("doc_id");
            // Number of times term appears within the document.
            outputTypes.addInt("term_freq");
            // Number of documents in which the term appears.
            outputTypes.addInt("corp_freq");
        }
        virtual TransformFunction *createTransformFunction(ServerInterface &srvInterface)
        { return vt_createFuncObj(srvInterface.allocator, InvertedIndexBuilder); }
    };
    ForwardIndexPhase fwardIdxPh;
    InvertedIndexPhase invIdxPh;
    virtual void getPhases(ServerInterface &srvInterface,
                           std::vector<TransformFunctionPhase *> &phases)
    {
        // Process documents wherever they're originally stored.
        fwardIdxPh.setPrepass();
        phases.push_back(&fwardIdxPh);
        phases.push_back(&invIdxPh);
    }
    virtual void getPrototype(ServerInterface &srvInterface,
                              ColumnTypes &argTypes,
                              ColumnTypes &returnType)
    {
        // Expected input: (doc_id INTEGER, text VARCHAR).
        argTypes.addInt();
        argTypes.addVarchar();
        // Output is: (term VARCHAR, doc_id INTEGER, term_freq INTEGER, corp_freq INTEGER)
        returnType.addVarchar();
        returnType.addInt();
        returnType.addInt();
        returnType.addInt();
    }
};
RegisterFactory(InvertedIndexFactory);
Most of the code in this example is similar to the code in a TransformFunctionFactory
class:

 Both TransformFunctionPhase subclasses implement the getReturnType function,
which describes the output of each stage. This is similar to the getPrototype function
from the TransformFunctionFactory class. However, this function also lets you control
how the data is partitioned and ordered between each phase of your multi-phase UDTF.
The first phase calls SizedColumnTypes::addVarcharPartitionColumn (rather than
just addVarcharColumn) to set the phase's output table to be partitioned by the column
containing the extracted words. It also calls SizedColumnTypes::addOrderColumn to
order the output table by the document ID column. It calls this function instead of one of the
data-type-specific functions (such as addIntOrderColumn) so it can pass the data type of
the original column through to the output column.
Note: Any order by column or partition by column set by the final phase of the UDTF in its
getReturnType function is ignored. Its output is returned to the initiator node rather than
being partitioned and reordered and sent on to another phase.
 The MultiPhaseTransformFunctionFactory class implements the getPrototype
function, which defines the schemas for the input and output of the multi-phase UDTF. This
function is the same as the TransformFunctionFactory::getPrototype function.
The unique function implemented by the MultiPhaseTransformFunctionFactory class is
getPhases. This function defines the order in which the phases are executed: the fields that
represent the phases are pushed into the phases vector in the order they should execute.
The MultiPhaseTransformFunctionFactory::getPhases function is also where you flag the
first phase of the UDTF as operating on data stored locally on the node (called a "pre-pass" phase)
rather than on data partitioned across all nodes. Using this option increases the efficiency of your
multi-phase UDTF by avoiding having to move significant amounts of data around the HP Vertica
cluster.
Note: Only the first phase of your UDTF can be a pre-pass phase. You cannot have multiple
pre-pass phases, and no later phase can be a pre-pass phase.
To mark the first phase as pre-pass, you call the setPrepass function of the first phase's
TransformFunctionPhase instance from within the getPhases function, as shown in the
sketch below.
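A minimal sketch of the pattern, assuming two TransformFunctionPhase members with the hypothetical names phase1 and phase2:

virtual void getPhases(ServerInterface &srvInterface,
                       std::vector<TransformFunctionPhase *> &phases)
{
    phase1.setPrepass();       // First phase processes locally stored data.
    phases.push_back(&phase1); // Phases execute in the order they are pushed.
    phases.push_back(&phase2);
}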
Notes
 You need to ensure that the output schema of each phase matches the input schema
expected by the next phase. In the example code, each
TransformFunctionPhase::getReturnType implementation performs a sanity check
on its input and output schemas. Your TransformFunction subclasses can also perform
these checks in their processPartition function.
 There is no built-in limit on the number of phases that your multi-phase UDTF can have.
However, more phases use more resources. When running in fenced mode, HP Vertica may
terminate UDTFs that use too much memory. See UDF Resource Use (page 507).
Deploying and Using User Defined Transforms
To deploy a UDTF on your HP Vertica database:
1 Copy the UDF shared library file (.so) that contains your function to a node on your HP Vertica
cluster.
2 Connect to the node where you copied the library (for example, using vsql).
3 Use the CREATE LIBRARY statement to load the UDF library into HP Vertica. You pass this
statement the location of the UDF library file you copied to the node earlier. HP Vertica
distributes the library to each node in the cluster.
4 Use the CREATE TRANSFORM FUNCTION statement to add the function to the HP Vertica
catalog. This maps a SQL function name to the name of the UDF's factory class. If you are not
sure of the name of the UDF's factory class, you can list all of the UDFs in the library (see
Listing the UDFs Contained in a Library (page 542) for details).
The following example demonstrates loading the Tokenize UDTF that is included in the SDK
examples directory. It assumes that the TransformFunctions.so library that contains the
function has been copied to the dbadmin user's home directory on the initiator node.
=> CREATE LIBRARY TransformFunctions AS
-> '/home/dbadmin/TransformFunctions.so';
CREATE LIBRARY
=> CREATE TRANSFORM FUNCTION tokenize
-> AS LANGUAGE 'C++' NAME 'TokenFactory' LIBRARY TransformFunctions;
CREATE TRANSFORM FUNCTION
=> CREATE TABLE T (url varchar(30), description varchar(2000));
CREATE TABLE
=> INSERT INTO T VALUES ('www.amazon.com','Online retail merchant and provider of
cloud services');
 OUTPUT
--------
      1
(1 row)
=> INSERT INTO T VALUES ('www.hp.com','Leading provider of computer hardware and
imaging solutions');
 OUTPUT
--------
      1
(1 row)
=> INSERT INTO T VALUES ('www.vertica.com','World''s fastest analytic database');
 OUTPUT
--------
      1
(1 row)
=> COMMIT;
COMMIT
=> -- Invoke the UDT
=> SELECT url, tokenize(description) OVER (partition by url) FROM T;
       url       |  words
-----------------+-----------
 www.amazon.com  | Online
 www.amazon.com  | retail
 www.amazon.com  | merchant
 www.amazon.com  | and
 www.amazon.com  | provider
 www.amazon.com  | of
 www.amazon.com  | c
 www.amazon.com  | loud
 www.amazon.com  | services
 www.hp.com      | Leading
 www.hp.com      | provider
 www.hp.com      | of
 www.hp.com      | computer
 www.hp.com      | hardware
 www.hp.com      | and
 www.hp.com      | im
 www.hp.com      | aging
 www.hp.com      | solutions
 www.vertica.com | World's
 www.vertica.com | fastest
 www.vertica.com | analytic
 www.vertica.com | database
(22 rows)
UDTF Query Restrictions
A query that includes a UDTF cannot contain:

 any statements other than the SELECT statement containing the call to the UDTF and a
PARTITION BY expression.
 any other analytic function.
 a call to another UDTF.
 a TIMESERIES clause.
 a pattern matching clause.
 a gap filling and interpolation (page 369) clause.
Using PARTITION AUTO to Process Local Data
UDTFs usually need to process data partitioned in a specific way. For example, a UDTF that
processes a web server log file to count the number of hits referred by each partner web site
needs to have its input partitioned by a referrer column. Each instance of the UDTF sees the hits
referred by a particular partner site so it can count them. When you execute this UDTF, you supply
a PARTITION BY clause to partition the data in this way: each node in the HP Vertica database
partitions the data it stores, sends some of these partitions off to other nodes, then consolidates
the partitions it receives from other nodes and runs an instance of the UDTF to process them.
If your UDTF does not need its input data partitioned in a particular way, you can make its
processing much more efficient by eliminating the overhead of partitioning the data. Instead, you
have each instance of the UDTF process just the data that is stored locally by the node on which it
is running. As long as your UDTF does not need to see data partitioned in any particular manner
(for example, a UDTF that parses data out of an Apache log file), you can tremendously speed up
its processing using this option.
You tell your UDTF to only process local data by using the PARTITION AUTO clause, rather than
specifying a column or expression to use to partition the data. You need to supply a source table
that is replicated across all nodes and contains a single row (similar to the DUAL table). For
example, to call a UDTF that parses locally-stored Apache log files, you could use the following
statements:
=> CREATE TABLE rep (dummy INTEGER) UNSEGMENTED ALL NODES;
CREATE TABLE
=> INSERT INTO rep VALUES (1);
 OUTPUT
--------
      1
(1 row)
=> SELECT ParseLogFile('/data/apache/log*') OVER (PARTITION AUTO) FROM rep;
Developing a User Defined Aggregate Function
Aggregate Functions perform an operation on a set of values and return one value. HP Vertica
provides standard built-in Aggregate Functions such as AVG, MAX, and MIN. User Defined
Aggregate Functions (UDAF) allow you to create custom aggregate functions specific to your
needs.
UDAFs perform operations on a set of rows, reading one input argument and returning one
output column.
Example code for User Defined Aggregates is available in
/opt/vertica/sdk/examples/AggregateFunctions. See Setting up a UDF
Development Environment (page 433) for details on setting up your environment and The HP
Vertica SDK (page 433) for details on building the examples.
User Defined Aggregate Function Requirements
User Defined Aggregates work similarly to the built-in HP Vertica aggregate functions.
User Defined Aggregates:

 Support a single input column (or set) of values.
 Provide a single output column.
 Support automatic RLE decompression. RLE input is decompressed before it is sent to a
User Defined Aggregate.
 Can be used with the GROUP BY and HAVING clauses. Only columns appearing in the
GROUP BY clause can be selected.
 Cannot be used in a correlated subquery.
 Cannot be used in a query containing multiple distinct User Defined Aggregate functions.
 Cannot be used in a query containing a single distinct User Defined Aggregate and one or
more non-distinct User Defined Aggregate functions.
 Must not allow an exception to be passed back to HP Vertica. Doing so could result in a
memory leak, because any memory allocated by the exception is not reclaimed. It is a best
practice to use a top-level try-catch block to catch any stray exceptions that may be thrown by
your code or any functions or libraries your code calls, as in the sketch below.
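A minimal sketch of that practice, shown in an aggregate() method (the surrounding class is assumed to be an AggregateFunction subclass such as the Max example later in this section):

void aggregate(ServerInterface &srvInterface,
               BlockReader &arg_reader,
               IntermediateAggs &aggs)
{
    try {
        // ... the actual aggregation logic goes here ...
    } catch (std::exception &e) {
        // Convert the exception into a Vertica error rather than letting
        // it escape into the server process.
        vt_report_error(0, "Exception while aggregating: %s", e.what());
    }
}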
UDAF Class Overview
You create your UDAF by subclassing two classes defined by the Vertica SDK:

 Vertica::AggregateFunctionFactory
 Vertica::AggregateFunction
The AggregateFunctionFactory Class
The AggregateFunctionFactory class specifies metadata, such as the argument and
return types of your aggregate function. It provides these methods for you to customize:

 getPrototype() - Defines the number of parameters and data types accepted by the function.
There is a single parameter for aggregate functions.
 getIntermediateTypes() - Defines the intermediate variable(s) used by the function.
 getParameterType() - Defines the names and types of parameters that this function uses
(optional).
 getReturnType() - Defines the type of the output column.
HP Vertica uses this data when you call the CREATE AGGREGATE FUNCTION SQL statement
to add the function to the database catalog.
The AggregateFunctionFactory returns an AggregateFunction instance that HP Vertica can call to
process data.
The AggregateFunction Class
The AggregateFunction class provides these methods for you to customize:

 initAggregate() - Initializes the class, defines variables, and sets the starting value for the
variables. This function must be idempotent.
 aggregate() - The main aggregation operation.
 combine() - If multiple instances of aggregate() are run, combine() is called to combine all the
sub-aggregations into a final aggregation. Although this method may not be called, it must be
defined.
 terminate() - Terminates the function and returns the result as a column.

Important! The aggregate() function may not operate on the complete input set all at once.
Depending on how the data is stored, multiple instances of aggregate() may run on the data set.
For this reason, initAggregate() must be idempotent.
The AggregateFunction class also provides optional methods that you can implement to allocate
and free resources: setup and destroy. You should use these class functions to allocate and
deallocate resources that you do not allocate through the UDAF API (see Allocating Resources
(page 508) for details).
The ServerInterface Class
All of the class functions that you will define in your UDAF receive an instance of the
ServerInterface class as a parameter. This object is used by the underlying HP Vertica SDK code
to make calls back into the HP Vertica process. For example, the macro you use to instantiate a
member of your AggregateFunction subclass (vt_createFuncObj) needs a pointer to this
object to be able to ask HP Vertica to allocate the memory for the new object.
You generally will not interact with this object directly, but instead pass it along to HP Vertica SDK
function and macro calls.
Subclassing AggregateFunction
Example subclass of AggregateFunction
The following shows a subclass of AggregateFunction named Max, which returns the highest
value from a column of numbers.
The initAggregate function in this example gets a reference to the intermediate value as a
vfloat and sets its initial value to zero. Note that the example, as written for simplicity, does not
take negative input values into account: if all input values are negative, the maximum is reported
as 0.
virtual void initAggregate(ServerInterface &srvInterface,
                           IntermediateAggs &aggs)
{
    vfloat &max = aggs.getFloatRef(0);
    max = 0;
}
The aggregate function compares each input value with the maximum value it has seen so far;
if the new value is higher, it becomes the new maximum.
void aggregate(ServerInterface &srvInterface,
               BlockReader &arg_reader,
               IntermediateAggs &aggs)
{
    vfloat &max = aggs.getFloatRef(0);
    do {
        const vfloat &input = arg_reader.getFloatRef(0);
        // If input is bigger than the current max, make the max = input.
        if (input > max) {
            max = input;
        }
    } while (arg_reader.next());
}
The combine function takes as its input the intermediate results of other invocations of
aggregate method(s) and determines the overall max value from the complete input set.
virtual void combine(ServerInterface &srvInterface,
                     IntermediateAggs &aggs,
                     MultipleIntermediateAggs &aggs_other)
{
    vfloat &myMax = aggs.getFloatRef(0);
    // Combine all the other intermediate aggregates.
    do {
        const vfloat &otherMax = aggs_other.getFloatRef(0);
        // If the other max is bigger than the current max, take it.
        if (otherMax > myMax) {
            myMax = otherMax;
        }
    } while (aggs_other.next());
}
The terminate function is called when all input has been evaluated by the aggregate method. It
returns the max value to the result writer.
virtual void terminate(ServerInterface &srvInterface,
                       BlockWriter &res_writer,
                       IntermediateAggs &aggs)
{
    // Write the final max value to the output.
    const vfloat max = aggs.getFloatRef(0);
    res_writer.setFloat(max);
}
Example Code:
class Max : public AggregateFunction
{
    virtual void initAggregate(ServerInterface &srvInterface,
                               IntermediateAggs &aggs)
    {
        vfloat &max = aggs.getFloatRef(0);
        max = 0;
    }
    void aggregate(ServerInterface &srvInterface,
                   BlockReader &arg_reader,
                   IntermediateAggs &aggs)
    {
        vfloat &max = aggs.getFloatRef(0);
        do {
            const vfloat &input = arg_reader.getFloatRef(0);
            // If input is bigger than the current max, make the max = input.
            if (input > max) {
                max = input;
            }
        } while (arg_reader.next());
    }
    virtual void combine(ServerInterface &srvInterface,
                         IntermediateAggs &aggs,
                         MultipleIntermediateAggs &aggs_other)
    {
        vfloat &myMax = aggs.getFloatRef(0);
        // Combine all the other intermediate aggregates.
        do {
            const vfloat &otherMax = aggs_other.getFloatRef(0);
            // If the other max is bigger than the current max, take it.
            if (otherMax > myMax) {
                myMax = otherMax;
            }
        } while (aggs_other.next());
    }
    virtual void terminate(ServerInterface &srvInterface,
                           BlockWriter &res_writer,
                           IntermediateAggs &aggs)
    {
        // Write the final max value to the output.
        const vfloat max = aggs.getFloatRef(0);
        res_writer.setFloat(max);
    }
    InlineAggregate()
};
The InlineAggregate() macro is invoked at the end of the class definition. You do not
implement this function yourself; invoking the macro automatically generates code that optimizes
the aggregate and speeds up calculations.
Subclassing AggregateFunctionFactory
Example subclass of AggregateFunctionFactory
The following shows a subclass of AggregateFunctionFactory named MaxFactory.
The getPrototype method allows you to define the variables that are sent to your aggregate
function and returned to HP Vertica after your aggregate function runs. The example below
accepts and returns a float from/to HP Vertica:
virtual void getPrototype(ServerInterface &srvInterface,
                          ColumnTypes &argTypes, ColumnTypes &returnType)
{
    argTypes.addFloat();
    returnType.addFloat();
}
The getIntermediateTypes method defines any intermediate variables that you use in your
aggregate function. Intermediate variables are used to pass data between multiple invocations of
an aggregate function ("max" in the example below) to combine results until a final result can be
computed:
virtual void getIntermediateTypes(ServerInterface &srvInterface,
                                  const SizedColumnTypes &input_types,
                                  SizedColumnTypes &intermediateTypeMetaData)
{
    intermediateTypeMetaData.addFloat("max");
}
The getParameterType method defines the name and types of parameters that the aggregate
function uses:
virtual void getParameterType(ServerInterface &srvInterface,
                              SizedColumnTypes &parameter_types)
{
    parameter_types.addFloat();
}
The getReturnType method defines the name and type of the output column that is sent back to
HP Vertica when the aggregate function completes:
virtual void getReturnType(ServerInterface &srvInterface,
                           const SizedColumnTypes &input_types,
                           SizedColumnTypes &output_types)
{
    output_types.addFloat("max");
}
Example Code:
class MaxFactory : public AggregateFunctionFactory
{
    virtual void getPrototype(ServerInterface &srvInterface,
                              ColumnTypes &argTypes, ColumnTypes &returnType)
    {
        argTypes.addFloat();
        returnType.addFloat();
    }
    virtual void getIntermediateTypes(ServerInterface &srvInterface,
                                      const SizedColumnTypes &input_types,
                                      SizedColumnTypes &intermediateTypeMetaData)
    {
        intermediateTypeMetaData.addFloat("max");
    }
    virtual void getReturnType(ServerInterface &srvInterface,
                               const SizedColumnTypes &input_types,
                               SizedColumnTypes &output_types)
    {
        output_types.addFloat("max");
    }
    virtual AggregateFunction *createAggregateFunction(ServerInterface &srvInterface)
    { return vt_createFuncObj(srvInterface.allocator, Max); }
};
User Defined Aggregate - Complete Example
This is a complete working example of a User Defined Aggregate:
#include "Vertica.h"
#include <sstream>
#include <iostream>
using namespace Vertica;
using namespace std;

/****
 * Example implementation of Aggregate "Max" Function
 ***/
class Max : public AggregateFunction
{
    virtual void initAggregate(ServerInterface &srvInterface,
                               IntermediateAggs &aggs)
    {
        vfloat &max = aggs.getFloatRef(0);
        max = 0;
    }
    void aggregate(ServerInterface &srvInterface,
                   BlockReader &arg_reader,
                   IntermediateAggs &aggs)
    {
        vfloat &max = aggs.getFloatRef(0);
        do {
            const vfloat &input = arg_reader.getFloatRef(0);
            // If input is bigger than the current max, make the max = input.
            if (input > max) {
                max = input;
            }
        } while (arg_reader.next());
    }
    virtual void combine(ServerInterface &srvInterface,
                         IntermediateAggs &aggs,
                         MultipleIntermediateAggs &aggs_other)
    {
        vfloat &myMax = aggs.getFloatRef(0);
        // Combine all the other intermediate aggregates.
        do {
            const vfloat &otherMax = aggs_other.getFloatRef(0);
            // If the other max is bigger than the current max, take it.
            if (otherMax > myMax) {
                myMax = otherMax;
            }
        } while (aggs_other.next());
    }
    virtual void terminate(ServerInterface &srvInterface,
                           BlockWriter &res_writer,
                           IntermediateAggs &aggs)
    {
        // Write the final max value to the output.
        const vfloat max = aggs.getFloatRef(0);
        res_writer.setFloat(max);
    }
    InlineAggregate()
};

class MaxFactory : public AggregateFunctionFactory
{
    virtual void getIntermediateTypes(ServerInterface &srvInterface,
                                      const SizedColumnTypes &input_types,
                                      SizedColumnTypes &intermediateTypeMetaData)
    {
        intermediateTypeMetaData.addFloat("max");
    }
    virtual void getPrototype(ServerInterface &srvInterface,
                              ColumnTypes &argTypes, ColumnTypes &returnType)
    {
        argTypes.addFloat();
        returnType.addFloat();
    }
    virtual void getReturnType(ServerInterface &srvInterface,
                               const SizedColumnTypes &input_types,
                               SizedColumnTypes &output_types)
    {
        output_types.addFloat("max");
    }
    virtual AggregateFunction *createAggregateFunction(ServerInterface &srvInterface)
    { return vt_createFuncObj(srvInterface.allocator, Max); }
};
RegisterFactory(MaxFactory);
Example usage:
dbadmin=> CREATE LIBRARY AggregateFunctions AS
'/opt/vertica/sdk/examples/build/AggregateFunctions.so';
CREATE LIBRARY
dbadmin=> create aggregate function ag_max as LANGUAGE 'C++' name 'MaxFactory'
library AggregateFunctions;
CREATE AGGREGATE FUNCTION
dbadmin=> select * from example;
 i
----
  1
  1
  8
  2
  3
  0
  5
 13
 21
(9 rows)
dbadmin=> select ag_max(i) from example;
 ag_max
--------
     21
(1 row)
Developing a User Defined Analytic Function
User Defined Analytic Functions (UDAnFs) are User Defined Functions that are used for analytics.
See Using SQL Analytics (page 329) for an overview of HP Vertica's built-in analytics. Like User
Defined Scalar Functions (UDSFs), UDAnFs must output a single value for each row of data read.
Unlike UDSFs, a UDAnF's input reader and output writer can be advanced independently. This
feature lets you create UDAnFs where the output value is calculated over multiple rows of data.
By advancing the reader and writer independently, you can create functions similar to the built-in
analytic functions such as LAG, which uses data from prior rows to output a value for the current
row.
Note: Analytic functions are only supported in C++. They cannot be developed in R.
User Defined Analytic Function Requirements
User Defined Analytic Functions (UDAnFs) must meet several requirements:

 They must produce a single output value for each row in the partition of data they are given to
process.
 They can accept no more than 32 arguments.
 Like all UDFs, they must not allow an exception to be passed back to HP Vertica. Doing so
could lead to issues such as memory leaks (caused by the memory allocated by the exception
never being freed). Your UDAnF should always contain a top-level try-catch block to catch any
stray exceptions caused by your code or libraries your code calls.
 If they allocate resources on their own, they must properly free them. Even a single byte of
allocated memory that is not freed can become an issue in a UDAnF that is called over millions
of rows. This is especially true if your UDAnF runs in unfenced mode (directly within the HP
Vertica process), since a leak could destabilize the database. Instead of directly allocating
memory, your function should use the memory allocation macros in the HP Vertica SDK. See
Allocating Resources (page 508) for details.
UDAnF Class Overview
To create a UDAnF, you need to subclass two classes defined by the HP Vertica SDK:
 Vertica::AnalyticFunction, which carries out the processing you want your function to
perform.
 Vertica::AnalyticFunctionFactory, which defines the input and output data types of
your function. It instantiates your AnalyticFunction subclass when needed.
The AnalyticFunction class is where you define the processPartition function that
performs the analytic processing you want your function to perform. When a user calls your
UDAnF function in a SQL SELECT statement, HP Vertica breaks the data into partitions as
specified in the OVER clause of the query and calls the processPartition function to process
them. If the query's OVER clause does not include a PARTITION BY expression, HP Vertica
treats the entire input as a single partition.
The input and output of the processPartition function are supplied by objects of the
Vertica::AnalyticPartitionReader and Vertica::AnalyticPartitionWriter
classes. They define the functions that you use to read the input data and write the output data for
your UDAnF.
In addition to processPartition, the AnalyticFunction class defines two optional class
functions that you can implement to allocate and free resources: setup and destroy. You use
these class functions to allocate and deallocate resources that you do not allocate through the
UDF API (see Allocating Resources (page 508) for details).
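As a sketch, optional setup and destroy overrides might look like the following. The signatures here follow the pattern used by other UDx setup/destroy methods in the SDK; treat them as assumptions and verify them against VerticaUDx.h.

virtual void setup(ServerInterface &srvInterface,
                   const SizedColumnTypes &argTypes)
{
    // Acquire resources here, for example open a handle or a buffer.
}
virtual void destroy(ServerInterface &srvInterface,
                     const SizedColumnTypes &argTypes)
{
    // Release everything acquired in setup().
}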
The ServerInterface Class
All of the class functions that you define in your UDAnF receive a ServerInterface object as a
parameter. It is used by the underlying HP Vertica SDK code to make calls back into the HP
Vertica process. For example, the macro you use to instantiate a member of your
AnalyticFunction subclass (vt_createFuncObj) needs a pointer to this object to be able
to ask HP Vertica to allocate the memory for the new object. You generally will not interact with this
object directly, but instead pass it along to HP Vertica SDK function and macro calls.
Subclassing AnalyticFunction
Your subclass of Vertica::AnalyticFunction is where your User Defined Analytic Function
(UDAnF) performs its processing. The one function in the class that you must implement,
processPartition, reads a partition of data, performs some sort of processing, and outputs a
single value for each input row.
For example, a UDAnF that ranks rows based on how they are ordered has a
processPartition function that reads each row, determines how to rank the row, and outputs
the rank value. An example of running a rank function named an_rank is:
=> SELECT * FROM hits;
      site       |    date    | num_hits
-----------------+------------+----------
 www.example.com | 2012-01-02 |       97
 www.vertica.com | 2012-01-01 |   343435
 www.example.com | 2012-01-01 |      123
 www.example.com | 2012-01-04 |      112
 www.vertica.com | 2012-01-02 |   503695
 www.vertica.com | 2012-01-03 |   490387
 www.example.com | 2012-01-03 |      123
(7 rows)
=> SELECT site,date,num_hits,an_rank() OVER (PARTITION BY site ORDER BY num_hits
-> DESC) AS an_rank FROM hits;
      site       |    date    | num_hits | an_rank
-----------------+------------+----------+---------
 www.example.com | 2012-01-03 |      123 |       1
 www.example.com | 2012-01-01 |      123 |       1
 www.example.com | 2012-01-04 |      112 |       3
 www.example.com | 2012-01-02 |       97 |       4
 www.vertica.com | 2012-01-02 |   503695 |       1
 www.vertica.com | 2012-01-03 |   490387 |       2
 www.vertica.com | 2012-01-01 |   343435 |       3
(7 rows)
As with the built-in RANK analytic function, rows that have the same value for the ORDER BY
column (num_hits in this example) have the same rank, but the rank continues to increase, so
that the next row that has a different ORDER BY key gets a rank value based on the number of
rows that preceded it.
HP Vertica calls the processPartition() function once for each partition of data. It supplies
the partition using an AnalyticPartitionReader object from which your function reads its
input data. In addition, there is a unique function on this object named isNewOrderByKey, which
returns a Boolean value indicating whether your function has seen a row with the same ORDER
BY key (or keys). This function is very useful for analytic functions (such as the example RANK
function) that need to handle rows with identical ORDER BY keys differently than rows with
different ORDER BY keys.
Note: You can specify multiple ORDER BY columns in the SQL query you use to call your
UDAnF. The isNewOrderByKey function returns true if any of the ORDER BY keys are
different than the previous row.
Once your function has finished processing the row of data, you advance it to the next row of input
by calling the AnalyticPartitionReader::next function.
Your function writes its output value using an AnalyticPartitionWriter object that HP Vertica
supplies as a parameter to the processPartition function. This object has data-type-specific
functions to write the output value (such as setInt). After setting the output value, call the
AnalyticPartitionWriter::next function to advance to the next row in the output.
Note: You must be sure that your function produces a row of output for each row of input in the
partition. You must also not output more rows than are in the partition; otherwise the zygote
side process (if running in fenced mode) or HP Vertica itself could generate an out-of-bounds
error.
The following example code defines an AnalyticFunction subclass named Rank, which
implements the ranking function demonstrated earlier. It is based on example code distributed in
the examples directory of the SDK.
/**
 * User defined analytic function: Rank - works mostly the same as SQL-99 rank
 * with the ability to define as many order by columns as desired
 *
 */
class Rank : public AnalyticFunction
{
    virtual void processPartition(ServerInterface &srvInterface,
                                  AnalyticPartitionReader &inputReader,
                                  AnalyticPartitionWriter &outputWriter)
    {
        // Always use a top-level try-catch block to prevent exceptions from
        // leaking back to Vertica or the fenced-mode side process.
        try {
            rank = 1;     // The rank to assign a row
            rowCount = 0; // Number of rows processed so far
            do {
                rowCount++;
                // Do we have a new order by row?
                if (inputReader.isNewOrderByKey()) {
                    // Yes, so set rank to the total number of rows that have
                    // been processed. Otherwise, the rank remains the same
                    // value as the previous iteration.
                    rank = rowCount;
                }
                // Write the rank
                outputWriter.setInt(0, rank);
                // Move to the next row of the output
                outputWriter.next();
            } while (inputReader.next()); // Loop until no more input
        } catch (exception &e) {
            // Standard exception. Quit.
            vt_report_error(0, "Exception while processing partition: %s",
                            e.what());
        }
    }
private:
    vint rank, rowCount;
};
In this example, the processPartition function does not actually read any of the data from the
input row; it just advances through the rows. It does not need to read data, since it just needs to
count the number of rows that have been read and determine whether those rows have the same
ORDER BY key as the previous row. If the current row is a new ORDER BY key, then the rank is
set to the total number of rows that have been processed. If the current row has the same ORDER
BY value as the previous row, then the rank remains the same.
Note that the function has a top-level try-catch block. All of your UDF functions should always have
one to prevent stray exceptions from being passed back to HP Vertica (if you run the function
unfenced) or the side process.
Subclassing AnalyticFunctionFactory
Your subclass of the AnalyticFunctionFactory class provides the following metadata about
your UDAnF to HP Vertica:
 The number and data types of your function's input arguments.
 Your function's output data type (and its width or precision, if it returns a variable-width data
type such as VARCHAR or a data type that has settable precision such as TIMESTAMP).
 The AnalyticFunction subclass that implements your function. HP Vertica calls your
factory class to instantiate members of this class when it needs to execute your UDAnF.

There are three required functions that your AnalyticFunctionFactory subclass must
implement:

 getPrototype describes the input parameters and output value of your function. You set
these values by calling functions on two ColumnTypes objects that are provided to the
function via parameters.
 createAnalyticFunction supplies an instance of your AnalyticFunction subclass
that HP Vertica can call to process a UDAnF function call.
 getReturnType provides details about your function's output. This function is where you set
the width of the output value if your function returns a variable-width value (such as
VARCHAR) or the precision of the output value if it has a settable precision (such as
TIMESTAMP).
The following example code defines the AnalyticFunctionFactory that corresponds with the
example Rank class shown in Subclassing AnalyticFunction (page 469).
class RankFactory : public AnalyticFunctionFactory
{
    virtual void getPrototype(ServerInterface &srvInterface,
                              ColumnTypes &argTypes, ColumnTypes &returnType)
    {
        returnType.addInt();
    }
    virtual void getReturnType(ServerInterface &srvInterface,
                               const SizedColumnTypes &inputTypes,
                               SizedColumnTypes &outputTypes)
    {
        outputTypes.addInt();
    }
    virtual AnalyticFunction *createAnalyticFunction(ServerInterface &srvInterface)
    { return vt_createFuncObj(srvInterface.allocator, Rank); }
};
The first function defined by the RankFactory subclass, getPrototype, sets the data type of
the return value. Since the Rank UDAnF does not read input, it does not define any arguments by
calling functions on the ColumnTypes object passed in the argTypes parameter.
The next function is getReturnType. If your function returns a data type that needs to define a
width or precision, your implementation of the getReturnType function calls a function on the
SizedColumnTypes object passed in as a parameter to tell HP Vertica the width or precision. See
the SDK entry for SizedColumnTypes for a list of these functions. Rank returns a fixed-width
data type (an INTEGER) so it does not need to set the precision or width of its output; it just calls
addInt to report its output data type.
Finally, RankFactory defines the createAnalyticFunction function that returns an instance
of the AnalyticFunction class that HP Vertica can call. This code is mostly boilerplate. All you
need to do is add the name of your analytic function class in the call to vt_createFuncObj()
which takes care of allocating the object for you.
Deploying and Using User Defined Analytic Functions
To deploy a UDAnF on your HP Vertica database:
1 Copy the UDF shared library file (.so) that contains your function to a node on your HP Vertica
cluster.
2 Connect to the node where you copied the library (for example, using vsql).
3 Use the CREATE LIBRARY statement to load the UDF library into HP Vertica. You pass this
statement the location of the UDF library file you copied to the node earlier. HP Vertica
distributes the library to each node in the cluster.
4 Use the CREATE ANALYTIC FUNCTION statement to add the function to the HP Vertica
catalog. This maps a SQL function name to the name of the UDF's factory class. If you are not
sure of the name of the UDF's factory class, you can list all of the UDFs in the library (see
Listing the UDFs Contained in a Library (page 542) for details).
The following example demonstrates loading and using the Rank UDAnF that is included in the
SDK examples directory. It assumes that the AnalyticFunctions.so library that contains the
function has been copied to the dbadmin user's home directory on the initiator node.
=> CREATE LIBRARY AnalyticFunctions AS '/home/dbadmin/AnalyticFunctions.so';
CREATE LIBRARY
=> CREATE ANALYTIC FUNCTION an_rank AS LANGUAGE 'C++'
-> NAME 'RankFactory' LIBRARY AnalyticFunctions;
CREATE ANALYTIC FUNCTION
=> SELECT * FROM hits;
      site       |    date    | num_hits
-----------------+------------+----------
 www.example.com | 2012-01-02 |       97
 www.vertica.com | 2012-01-01 |   343435
 www.example.com | 2012-01-01 |      123
 www.example.com | 2012-01-04 |      112
 www.vertica.com | 2012-01-02 |   503695
 www.vertica.com | 2012-01-03 |   490387
 www.example.com | 2012-01-03 |      123
(7 rows)
=> SELECT site,date,num_hits,an_rank() over (partition by site order by num_hits
-> desc) from hits;
      site       |    date    | num_hits | ?column?
-----------------+------------+----------+----------
 www.example.com | 2012-01-03 |      123 |        1
 www.example.com | 2012-01-01 |      123 |        1
 www.example.com | 2012-01-04 |      112 |        3
 www.example.com | 2012-01-02 |       97 |        4
 www.vertica.com | 2012-01-02 |   503695 |        1
 www.vertica.com | 2012-01-03 |   490387 |        2
 www.vertica.com | 2012-01-01 |   343435 |        3
(7 rows)
Notes
 UDAnFs do not support framing windows using ROWS (page 338).
 As with HP Vertica's built-in analytic functions, UDAnFs cannot be used with pattern matching
functions.
See Also
CREATE LIBRARY and CREATE ANALYTIC FUNCTION in the SQL Reference Manual.
Developing User Defined Load (UDL) Functions
The User Defined Load (UDL) feature allows you to create custom routines to load your data into
HP Vertica. You create custom libraries using the HP Vertica SDK to handle various steps in the
loading process. Loading is performed by the COPY statement and comprises three stages, all
of which are optional:

 User Defined Source: Obtaining the source of the data. For example, over HTTP or through
cURL.
 User Defined Filter(s): Filtering the data through one or more filters. For example, unzipping a
file, converting UTF-16 to UTF-8, or doing both in sequence.
 User Defined Parser: Parsing the data. For example, reading the filtered data and parsing it
into tuples in HP Vertica.
Deploying User Defined Load functions:
You use the CREATE LIBRARY statement to load your compiled UDL library into HP Vertica. For
each function, you must use the appropriate CREATE statement to load the function into HP
Vertica. There is a CREATE statement for each type of UDL:

 CREATE SOURCE
 CREATE FILTER
 CREATE PARSER
IMPORTANT! Installing an untrusted UDL function can compromise the security of the server.
UDxs can contain arbitrary code. In particular, UD Source functions can read data from any
arbitrary location. It is up to the developer of the function to enforce proper security limitations.
Superusers must not grant access to UDxs to untrusted users.
Conversely, you remove UDL libraries with DROP LIBRARY and remove UDL functions with the
following:

 DROP SOURCE
 DROP FILTER
 DROP PARSER
Note: You cannot ALTER UDL functions.
UDL Requirements
User Defined Load Functions:

 Are used by the COPY statement.
 Are written in C++ using the HP Vertica SDK and compiled into a shared library.
 Can have up to one source process. This single process can obtain data from multiple
sources.
 Can have zero or more filters.
 Can have up to one parser process.
 Cannot run in fenced mode.
 Like all UDFs, must not allow an exception to be passed back to HP Vertica. Doing so
could lead to issues such as memory leaks (caused by the memory allocated by the exception
never being freed). Your UDL should always contain a top-level try-catch block to catch any
stray exceptions caused by your code or libraries your code calls.
 Must properly free any resources they allocate on their own. Even a single byte of allocated
memory that is not freed can become an issue in a UDL that is called over millions of rows.
Instead of directly allocating memory, your function should use the memory allocation macros
in the HP Vertica SDK, as in the sketch after this list. See Allocating Resources (page 508) for
details.

The header files that define the majority of classes and methods are VerticaUDx.h and
VerticaUDl.h. These header files, along with the main Vertica.h header file, are
available in /opt/vertica/sdk/include.
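For example, here is a minimal sketch of allocating scratch memory through the SDK's pooled allocator instead of new or malloc; the 1 KB size and the variable name are arbitrary illustrations. Memory obtained this way is reclaimed by HP Vertica when the query completes, so you do not free it yourself.

// Inside any UDx method that receives a ServerInterface:
char *scratch =
    static_cast<char *>(srvInterface.allocator->alloc(1024));
// ... use scratch; HP Vertica reclaims it when the pool is destroyed.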
The SDK documentation is available in the SDK itself at /opt/vertica/sdk/doc and online: HP
Vertica SDK documentation.
UDL Source
Developing Source Functions for User Defined Load
UDL Source functions allow you to process a source of data using a method that is not built into
HP Vertica; for example, accessing the data from an HTTP source using cURL. Only a single
User Defined Source can be defined in a COPY statement, but that source function can pull data
from multiple sources.
A Source can optionally be used with UDFilters and a UDParser. The source can be obtained
using a UDSource function, passed through one or more UDFilters, and finally parsed by a
UDParser before being loaded.
You must implement a UDSource class and a SourceFactory class for sources.
The HP Vertica SDK provides example source functions in
/opt/vertica/sdk/examples/SourceFunctions.
Subclassing SourceFactory
About the Source Factory Class
The SourceFactory class performs initial validation and planning of the query and instantiates
objects to perform further initialization on each node once the query has been distributed.
SourceFactory Methods:
You implement the following methods in your SourceFactory class:

 plan() - Checks parameters, populates the plan data, and assigns the work to one or more
nodes. When developing your plan() method, you should check the parameters that have
been passed in the function call and provide a helpful error message if the arguments do not
conform, although doing so is not required. You can also optionally populate the
NodeSpecifyingPlanContext object with any information that must be passed from the initiator
node to the other nodes doing the work. Finally, you must specify which nodes the source(s)
are obtained from. You can split up the work so that one or more specific nodes load data, or
specify that any node load the data.
 prepareUDSources() - Directly instantiates all provided sources and returns a vector of the
sources.
 getParameterType() - Defines the names and types of parameters that the function uses. HP
Vertica uses this information to warn function callers that certain parameters that they provide
have no effect, or that certain parameters are not being set and are reverting to default values.
As a best practice, you should define the types and parameters for your function, but using
this method is optional.
The name of the factory class is the value used for the NAME modifier in the CREATE
SOURCE statement.
After creating your SourceFactory, you must register it with RegisterFactory().
Example SourceFactory
The following example is provided as part of
/opt/vertica/sdk/examples/SourceFunctions/cURL.cpp. It defines the factory
class for the curl source function.
class CurlSourceFactory : public SourceFactory {
public:
    virtual void plan(ServerInterface &srvInterface,
                      NodeSpecifyingPlanContext &planCtxt) {
        std::vector<std::string> args =
            srvInterface.getParamReader().getParamNames();
        /* Check parameters */
        if (args.size() != 1 || find(args.begin(), args.end(), "url") ==
            args.end()) {
            vt_report_error(0, "You must provide a single URL.");
        }
        /* Populate planData */
        planCtxt.getWriter().getStringRef("url").copy(
            srvInterface.getParamReader().getStringRef("url"));
        /* Assign Nodes */
        std::vector<std::string> executionNodes = planCtxt.getClusterNodes();
        // Only run on the first node in the list.
        while (executionNodes.size() > 1) executionNodes.pop_back();
        planCtxt.setTargetNodes(executionNodes);
    }
    virtual std::vector<UDSource*> prepareUDSources(ServerInterface &srvInterface,
                                                    NodeSpecifyingPlanContext &planCtxt) {
        std::vector<UDSource*> retVal;
        retVal.push_back(vt_createFuncObj(srvInterface.allocator, CurlSource,
                                          planCtxt.getReader().getStringRef("url").str()));
        return retVal;
    }
    virtual void getParameterType(ServerInterface &srvInterface,
                                  SizedColumnTypes &parameterTypes) {
        parameterTypes.addVarchar(65000, "url");
    }
};
RegisterFactory(CurlSourceFactory);
Subclassing UDSource
About the UDSource Class
The UDSource class is responsible for acquiring the data from an external source and producing
that data in a streaming manner. A wrapper for UDSource, called ContinuousUDSource, is also
provided. ContinuousUDSource provides an abstraction that allows you to treat the input data as
a continuous stream of data. This allows you to write the data from the source "at will" instead of
having to create an iterator to use the base UDSource method.
ContinuousUDSource is available in
/opt/vertica/sdk/examples/HelperLibraries/ContinuousUDSource.h.
UDSource Methods:

 setup() - Invoked before the first time that process() is called. Use this method to do things
such as open file handles.
 destroy() - Invoked after the last time that process() is called. Use this method to do things
such as close file handles.
 process() - Invoked repeatedly until it returns DONE or the query is canceled by the function
caller. On each invocation, process() acquires more data and writes the data to the DataBuffer
specified by 'output'. It returns OUTPUT_NEEDED if the source has more data to produce, or
DONE if it has no more data to produce.
 getSize() - Returns the estimated number of bytes that process() will return. This value is an
estimate only and is used to indicate the file size in the LOAD_STREAMS table. getSize() can
be called before setup() is called. See the SDK documentation for additional important details
about the getSize() method.
ContinuousUDSource Functions:
The ContinuousUDSource wrapper allows you to write and process the data "at will" instead of
having to iterate through the data. An example of using ContinuousUDSource is provided in
/opt/vertica/sdk/examples/SourceFunctions/MultiFileCurlSource.cpp.

 initialize() - Invoked before run(). You can optionally override this function to perform setup
and initialization.
 run() - Processes the data. Use write() on the ContinuousWriter to write the data from the
source.
 deinitialize() - Invoked after run() has returned. You can optionally override this function to
perform tear-down and destruction.

Functions that are already implemented for you to use in your code:

 yield() - Use to yield control back to the server during idle or busy loops, so the server can
check for status changes or query cancellations.
 cw - A ContinuousWriter, defined in
/opt/vertica/sdk/examples/HelperLibraries/CoroutineHelpers.h, used to
write the data to the output data buffer. A minimal sketch of run() using these helpers follows
this list.
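The following is an illustrative sketch only. The class name is hypothetical, and the exact ContinuousWriter::write() signature is an assumption; check ContinuousUDSource.h and CoroutineHelpers.h in the SDK examples for the authoritative interface.

// A hypothetical ContinuousUDSource subclass that emits a fixed string.
class HelloSource : public ContinuousUDSource {
    virtual void run() {
        const char msg[] = "hello from a ContinuousUDSource\n";
        cw.write(msg, sizeof(msg) - 1); // Write bytes to the output buffer.
        yield();                        // Let the server check for cancellation.
    }
};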
Example UDSource
The following example reads its source with the url_fread function from the helper library
available in /opt/vertica/sdk/examples/HelperLibraries/. It allows you to use cURL
to open and read a file over HTTP. The data is loaded in chunks. If end of file is reached, the
process() method returns DONE; otherwise it returns OUTPUT_NEEDED and processes
another chunk of data. The functions included in the helper library (url_fread(), url_fopen(),
and so on) are based on examples that come with the libcurl library (for example, see
http://curl.haxx.se/libcurl/c/fopen.html).
In setup(), the handle to the file is opened using a function from the helper library.
In destroy(), the handle to the file is closed using a function from the helper library.
class CurlSource : public UDSource {
private:
    URL_FILE *handle;
    std::string url;
    virtual StreamState process(ServerInterface &srvInterface,
                                DataBuffer &output) {
        output.offset = url_fread(output.buf, 1, output.size, handle);
        return url_feof(handle) ? DONE : OUTPUT_NEEDED;
    }
public:
    CurlSource(std::string url) : url(url) {}
    void setup(ServerInterface &srvInterface) {
        handle = url_fopen(url.c_str(), "r");
    }
    void destroy(ServerInterface &srvInterface) {
        url_fclose(handle);
    }
};
UDL Filter
Developing Filter Functions for User Defined Load
UDL Filter functions allow you to manipulate data obtained from a source in various ways. For
example, you could process a compressed file in a compression format not natively supported by
HP Vertica, transcode UTF-16 encoded data to UTF-8 encoding, or perform search and replace
operations on data before it is loaded into HP Vertica.
You can also pass data through multiple filters before it is loaded into HP Vertica. For instance,
you could unzip a file compressed with 7Zip, convert the content from UTF-16 to UTF-8, and finally
search and replace various text strings before loading the data.
Filters can optionally be used with UDSources and UDParsers. The source can be obtained using
a UDSource function, passed through one or more UDFilters, and finally parsed by a UDParser
before being loaded.
You must implement a UDFilter class and a FilterFactory class for your filter.
The HP Vertica SDK provides example filter functions in
/opt/vertica/sdk/examples/FilterFunctions.
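Because the example below covers only the factory, here is a minimal, hypothetical sketch of the UDFilter side: a pass-through filter that copies input bytes to the output buffer. The process() signature follows the UDFilter interface described in the SDK documentation, but treat the details as assumptions and verify them against VerticaUDl.h.

class PassThroughFilter : public UDFilter {
    virtual StreamState process(ServerInterface &srvInterface,
                                DataBuffer &input, InputState input_state,
                                DataBuffer &output) {
        // Copy the smaller of the remaining input and the remaining output.
        size_t n = std::min(input.size - input.offset,
                            output.size - output.offset);
        memcpy(output.buf + output.offset, input.buf + input.offset, n);
        input.offset += n;
        output.offset += n;
        if (input_state == END_OF_FILE && input.offset == input.size)
            return DONE;          // All input consumed and forwarded.
        if (output.offset == output.size)
            return OUTPUT_NEEDED; // Output buffer is full.
        return INPUT_NEEDED;      // Need more input to make progress.
    }
};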
Subclassing FilterFactory
About the FilterFactory Class:
The FilterFactory class performs initial validation and planning of the query and instantiates
objects to perform further initialization on each node once the query has been distributed.
FilterFactory Methods:
You implement the following methods in your FilterFactory class:

 plan() - Like the UDSource and UDParser plan() methods, the UDFilter plan() method is used
to check parameters and populate the plan data. However, you cannot specify the nodes on
which the work is done; HP Vertica automatically selects the best nodes to complete the work
based on available resources. When developing your plan() method, you should check the
parameters that have been passed in the function call and provide a helpful error message if
the arguments do not conform, although doing so is not required. You can also optionally
populate the PlanContext object with any information that must be passed from the initiator
node to the other nodes doing the work.
 prepare() - This method is called on each node prior to Load operator execution. It creates
the function object using the vt_createFuncObj method.
 getParameterType() - Defines the names and types of parameters that the function uses. HP
Vertica uses this information to warn function callers that certain parameters that they provide
have no effect, or that certain parameters are not being set and are reverting to default values.
As a best practice, you should define the types and parameters for your function, but using
this method is optional.
The name of the factory class is the value used for the NAME modifier in the in the CREATE
FILTER statement.
After creating your FilterFactory, you must register it with RegisterFactory();
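For example, the factory defined below could be exposed to SQL and used in a COPY
statement along these lines. This is a sketch; the library name and path are assumptions:

=> CREATE LIBRARY iconvLib AS '/home/dbadmin/IConverter.so';
=> CREATE FILTER Iconverter AS LANGUAGE 'C++' NAME 'IconverterFactory' LIBRARY iconvLib;
=> COPY t FROM '/data/file.txt' WITH FILTER Iconverter(from_encoding='UTF-16');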
Example FilterFactory
The following example is provided as part of
/opt/vertica/sdk/examples/FilterFunctions/IConverter.cpp. It defines the
factory class for the Iconverter filter function.
class IconverterFactory : public FilterFactory
{
public:
    virtual void plan(ServerInterface &srvInterface,
                      PlanContext &planCtxt) {
        std::vector<std::string> args =
            srvInterface.getParamReader().getParamNames();
        /* Check parameters */
        if (!(args.size() == 0 ||
              (args.size() == 1 && find(args.begin(), args.end(),
                                        "from_encoding") != args.end()) ||
              (args.size() == 2
               && find(args.begin(), args.end(), "from_encoding") != args.end()
               && find(args.begin(), args.end(), "to_encoding") != args.end()))) {
            vt_report_error(0, "Invalid arguments. Must specify either no arguments, "
                            "or 'from_encoding' alone, or 'from_encoding' and 'to_encoding'.");
        }
        /* Populate planData */
        // By default, we do UTF16->UTF8, and x->UTF8
        VString from_encoding =
            planCtxt.getWriter().getStringRef("from_encoding");
        VString to_encoding = planCtxt.getWriter().getStringRef("to_encoding");
        from_encoding.copy("UTF-16");
        to_encoding.copy("UTF-8");
        if (args.size() == 2)
        {
            from_encoding.copy(srvInterface.getParamReader().getStringRef("from_encoding"));
            to_encoding.copy(srvInterface.getParamReader().getStringRef("to_encoding"));
        }
        else if (args.size() == 1)
        {
            from_encoding.copy(srvInterface.getParamReader().getStringRef("from_encoding"));
        }
        if (!from_encoding.length()) {
            vt_report_error(0, "The empty string is not a valid from_encoding value");
        }
        if (!to_encoding.length()) {
            vt_report_error(0, "The empty string is not a valid to_encoding value");
        }
    }

    virtual UDFilter* prepare(ServerInterface &srvInterface,
                              PlanContext &planCtxt) {
        return vt_createFuncObj(srvInterface.allocator, Iconverter,
                                planCtxt.getReader().getStringRef("from_encoding").str(),
                                planCtxt.getReader().getStringRef("to_encoding").str());
    }

    virtual void getParameterType(ServerInterface &srvInterface,
                                  SizedColumnTypes &parameterTypes) {
        parameterTypes.addVarchar(32, "from_encoding");
        parameterTypes.addVarchar(32, "to_encoding");
    }
};
RegisterFactory(IconverterFactory);
Subclassing UDFilter
About the UDFilter Class
The UDFilter class is responsible for reading raw input data from a source and preparing it to be
loaded into HP Vertica or processed by a parser. This preparation may involve decompression,
re-encoding, or any other sort of binary manipulation. A wrapper is also provided for UDFilter
called ContinuousUDFilter. ContinuousUDFilter provides an abstraction that allows you to treat
the input data as a continuous stream of data. This allows you to write the filtered data and
process it "at will" instead of having to create an iterator to use the base UDFilter method.
ContinuousUDFilter is available in
/opt/vertica/sdk/examples/HelperLibraries/ContinuousUDFilter.h.
UDFilter Methods:
 setup() - Invoked before the first time that process() is called.
Note: UDFilters must be restartable. If loading large numbers of files, a given UDFilter may be
re-used for multiple files. HP Vertica follows the worker-pool design pattern: At the start of
COPY execution, several Parsers and several Filters are instantiated per node by calling the
corresponding prepare() method multiple times. Each Filter/Parser pair is then internally
assigned to an initial Source (UDSource or internal). At that point, setup() is called; then
process() is called until it is finished; then destroy() is called. If there are still sources in the pool
waiting to be processed, then the UDFilter/UDSource pair will be given a second Source;
setup() will be called a second time, then process() until it is finished, then destroy(). This
repeats until all sources have been read.
 destroy() - Invoked after the last time that process() is called.
 process() - Invoked repeatedly until it returns DONE or the query is canceled by the function
caller. On each invocation, process() acquires more data and writes the data to the DataBuffer
specified by 'output'.
Returns:
 OUTPUT_NEEDED if this filter has more data to produce.
 INPUT_NEEDED if it requires more data to continue working.
 DONE if it has no more data to produce.
 KEEP_GOING if it cannot proceed for an extended period of time. It will be called again. Do
not block indefinitely. If you do, you prevent the user from canceling the query.
process() must set 'input.offset' to the number of bytes that were successfully read from the
'input' buffer and that will not need to be re-consumed by a subsequent invocation of
process(). If 'input_state' == END_OF_FILE, then the last byte in 'input' is the last byte in the
input stream, and returning INPUT_NEEDED does not result in any new input appearing.
process() should return DONE in this case as soon as this operator has finished producing all
the output that it is going to produce.
process() must also set 'output.offset' to the number of bytes that were written to the 'output'
buffer. This value may not be larger than 'output.size'. If it is set to 0, this indicates that process()
requires a larger output buffer. A minimal pass-through sketch illustrating this contract follows.
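The sketch below is illustrative only: a filter that copies its input unchanged, showing the
offset bookkeeping and return values described above. The class name is hypothetical:

class PassThroughFilter : public UDFilter {
protected:
    virtual StreamState process(ServerInterface &srvInterface, DataBuffer &input,
                                InputState input_state, DataBuffer &output) {
        // Copy the smaller of: bytes available in input, space left in output.
        size_t have = input.size - input.offset;
        size_t room = output.size - output.offset;
        size_t n = (have < room) ? have : room;
        memcpy((char *)output.buf + output.offset,
               (char *)input.buf + input.offset, n);
        input.offset += n;   // bytes consumed; they will not be re-sent
        output.offset += n;  // bytes produced by this call
        if (input_state == END_OF_FILE && input.offset == input.size)
            return DONE;             // everything consumed and emitted
        if (input.offset == input.size)
            return INPUT_NEEDED;     // need more input to continue
        return OUTPUT_NEEDED;        // the output buffer filled up first
    }
};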
ContinuousUDFilter Functions:
The ContinuousUDFilter wrapper allows you to write and process the data "at will" instead of
having to iterate through the data. An example of using ContinuousUDFilter is provided in
/opt/vertica/sdk/examples/FilterFunctions/SearchAndReplaceFilter.cpp.
 initialize() - Invoked before run(). You can optionally override this function to perform setup and
initialization.
 run() - Processes the data. Use reserve() and seek(), or read(), on the ContinuousReader to
read the data, and reserve() and memcpy on the ContinuousWriter to write the data to the
output buffer.
 deinitialize() - Invoked after run() has returned. You can optionally override this function to
perform tear-down and destruction.
Functions that are already implemented that you use in your code:
 yield() - Use to yield control back to the server during idle or busy loops so the server can check
for status changes or query cancellations.
 cr - A ContinuousReader, which is defined in
/opt/vertica/sdk/examples/HelperLibraries/CoroutineHelpers.h. Used to
read from the data stream.
 cw - A ContinuousWriter, which is defined in
/opt/vertica/sdk/examples/HelperLibraries/CoroutineHelpers.h. Used to
write the filtered data to the output data buffer.
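A minimal ContinuousUDFilter run() sketch follows. It assumes read(buffer, length) and
write(buffer, length) conveniences on the cr and cw helpers described above; the class name
and the byte-level uppercasing are purely illustrative:

class UppercaseFilter : public ContinuousUDFilter {
public:
    virtual void run() {
        char buf[4096];
        size_t len;
        // read() cooperatively waits until data is available or EOF.
        while ((len = cr.read(buf, sizeof(buf))) > 0) {
            for (size_t i = 0; i < len; i++) {
                buf[i] = toupper(buf[i]);  // transform the chunk in place
            }
            cw.write(buf, len);            // emit the filtered chunk
            yield();                       // let the server check for cancellation
        }
    }
};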
Example UDFilter
The following example shows how to convert encoding for a file from one type to another. The
example converts UTF-16 encoded data to UTF-8 encoded data. This example is available in the
SDK at /opt/vertica/sdk/examples/FilterFunctions/IConverter.cpp.
class Iconverter : public UDFilter
{
private:
    std::string fromEncoding, toEncoding;
    iconv_t cd;     // the conversion descriptor opened
    uint converted; // how many characters have been converted

protected:
    virtual StreamState process(ServerInterface &srvInterface, DataBuffer &input,
                                InputState input_state, DataBuffer &output)
    {
        char *input_buf = (char *)input.buf + input.offset;
        char *output_buf = (char *)output.buf + output.offset;
        size_t inBytesLeft = input.size - input.offset,
               outBytesLeft = output.size - output.offset;
        // end of input
        if (input_state == END_OF_FILE && inBytesLeft == 0)
        {
            // The GNU libc iconv documentation says it is good practice to
            // finalize the output buffer for stateful encodings (by calling
            // iconv with a null input buffer).
            // http://www.gnu.org/software/libc/manual/html_node/Generic-Conversion-Interface.html
            iconv(cd, NULL, NULL, &output_buf, &outBytesLeft);
            // output buffer can be updated by this operation
            output.offset = output.size - outBytesLeft;
            return DONE;
        }
        size_t ret = iconv(cd, &input_buf, &inBytesLeft, &output_buf, &outBytesLeft);
        // if conversion is successful, we ask for more input, as input has not
        // reached EOF.
        StreamState retStatus = INPUT_NEEDED;
        if (ret == (size_t)(-1))
        {
            // seen an error
            switch (errno)
            {
            case E2BIG:
                // input size too big, not a problem, ask for more output.
                retStatus = OUTPUT_NEEDED;
                break;
            case EINVAL:
                // input stops in the middle of a byte sequence, not a problem,
                // ask for more input
                retStatus = input_state == END_OF_FILE ? DONE : INPUT_NEEDED;
                break;
            case EILSEQ:
                // invalid sequence seen, throw
                // TODO: reporting the wrong byte position
                vt_report_error(1, "Invalid byte sequence when doing %u-th conversion",
                                converted);
            case EBADF:
                // something wrong with descriptor, throw
                vt_report_error(0, "Invalid descriptor");
            default:
                vt_report_error(0, "Uncommon Error");
                break;
            }
        }
        else converted += ret;
        // move position pointer
        input.offset = input.size - inBytesLeft;
        output.offset = output.size - outBytesLeft;
        return retStatus;
    }

public:
    Iconverter(const std::string &from, const std::string &to)
        : fromEncoding(from), toEncoding(to), converted(0)
    {
        // note "to encoding" is the first argument to iconv...
        cd = iconv_open(to.c_str(), from.c_str());
        if (cd == (iconv_t)(-1))
        {
            // error when creating converters.
            vt_report_error(0, "Error initializing iconv: %m");
        }
    }
    ~Iconverter()
    {
        // free iconv resources;
        iconv_close(cd);
    }
};
UDL Parser
Developing Parser Functions for User Defined Load
Parsers take a stream of bytes and pass a corresponding sequence of tuples to the HP Vertica
load process. UDL Parser functions can be used to parse data in formats not understood by the
HP Vertica built-in parser, or for data that requires more specific control than the built-in parser
allows. For example, you could load a CSV file using a specific CSV library. Two CSV examples
are provided with the HP Vertica SDK.
COPY supports a single UDL Parser that can be used in conjunction with a UDSource and zero or
more UDFilters.
You must implement a UDParser class and a ParserFactory class for your parser.
The HP Vertica SDK provides example parser functions in
/opt/vertica/sdk/examples/ParserFunctions. Included in the examples are two CSV
parsers: a "Traditional CSV" parser and an RFC 4180 (http://tools.ietf.org/html/rfc4180) CSV
parser. The traditional CSV parser uses the boost::tokenizer library to read the CSV output from
common programs such as Microsoft Excel. The RFC 4180 parser parses CSV files written to
the RFC 4180 standard and uses libcsv.
Subclassing ParserFactory
About the ParserFactory Class:
The ParserFactory class performs initial validation and planning of the query and instantiates
objects to perform further initialization on each node once the query has been distributed.
ParserFactory Methods:
You implement the following methods in your ParserFactory class:
 plan() - Like the UDSource and UDFilter plan() methods, the UDParser plan() method is used
to check parameters and populate the plan data. However, you cannot specify the nodes on
which the work is done. HP Vertica automatically selects the best nodes to complete the work
based on available resources.
 prepare() - This method is called on each node prior to Load operator execution. It creates
the function object using the vt_createFuncObj method.
 getParserReturnType() - This method defines what the return types (and length/precision, if
necessary) are for this UDX.
By default, HP Vertica uses the same output column types as the destination table. This
requires that the UDParser validate the expected output column types and emit appropriate
tuples. Users can use COPY expressions to perform typecasting and conversion if necessary.
Define the output types as follows:
 For CHAR/VARCHAR types, specify the maximum length.
 For NUMERIC types, specify the precision and scale.
 For Time/Timestamp types (with or without time zone), specify the precision (-1 means
unspecified).
 For IntervalYM/IntervalDS types, specify the precision and range.
 For all other types, no length/precision specification is required.
 getParameterType() - The getParameterType() method allows you to define the names and
types of parameters that the function uses. HP Vertica uses this information to warn function
callers that certain parameters that they provide have no effect, or that certain parameters are
not being set and are reverting to default values. You should, as a best practice, define the
types and parameters for your function, but using this method is optional.
ParserFactory Class Example:
The following example is provided as part of
/opt/vertica/sdk/examples/ParserFunctions/BasicIntegerParser.cpp. It defines the
factory class for the BasicIntegerParser parser function.
class BasicIntegerParserFactory : public ParserFactory {
public:
    virtual void plan(ServerInterface &srvInterface,
                      PerColumnParamReader &perColumnParamReader,
                      PlanContext &planCtxt) {
        /* Check parameters */
        // TODO: Figure out what parameters I should have; then make sure I have them
        /* Populate planData */
        // Nothing to do here
    }

    virtual UDParser* prepare(ServerInterface &srvInterface,
                              PerColumnParamReader &perColumnParamReader,
                              PlanContext &planCtxt,
                              const SizedColumnTypes &returnType) {
        return vt_createFuncObj(srvInterface.allocator, BasicIntegerParser);
    }

    virtual void getParserReturnType(ServerInterface &srvInterface,
                                     PerColumnParamReader &perColumnParamReader,
                                     PlanContext &planCtxt,
                                     const SizedColumnTypes &argTypes,
                                     SizedColumnTypes &returnType) {
        // We only and always have a single integer column
        returnType.addInt(argTypes.getColumnName(0));
    }
};
RegisterFactory(BasicIntegerParserFactory);
Subclassing UDParser
About the UDParser Class
The UDParser class is responsible for parsing an input stream into tuples/rows for insertion into
an HP Vertica table. A wrapper is also provided for UDParser called ContinuousUDParser.
ContinuousUDParser provides an abstraction that allows you to treat the input data as a
continuous stream of data. This allows you to read from the data and process it "at will" instead of
having to create an iterator to use the base UDParser method. ContinuousUDParser is available
in /opt/vertica/sdk/examples/HelperLibraries/ContinuousUDParser.h.
The name of the factory class is the value used for the NAME modifier in the CREATE
PARSER statement.
After creating your ParserFactory, you must register it with RegisterFactory();
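As a sketch of how the pieces connect (the library name and paths are assumptions), the
BasicIntegerParserFactory registered above would be exposed to SQL like this:

=> CREATE LIBRARY intParserLib AS '/home/dbadmin/BasicIntegerParser.so';
=> CREATE PARSER BasicIntegerParser AS LANGUAGE 'C++'
   NAME 'BasicIntegerParserFactory' LIBRARY intParserLib;
=> COPY t FROM '/data/numbers.txt' WITH PARSER BasicIntegerParser();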
UDParser Methods:
 setup() - Invoked before the first time that process() is called.
UDParsers must be restartable. If loading large numbers of files, a given UDParser may be
re-used for multiple files. HP Vertica follows the worker-pool design pattern: At the start of
COPY execution, several Parsers and several Filters are instantiated per node by calling the
corresponding prepare() method multiple times. Each Filter/Parser pair is then internally
assigned to an initial Source (UDSource or internal). At that point, setup() is called; then
process() is called until it is finished; then destroy() is called. If there are still sources in the pool
waiting to be processed, then the UDFilter/UDSource pair will be given a second Source;
setup() will be called a second time, then process() until it is finished, then destroy(). This
repeats until all sources have been read.
 destroy() - Invoked after the last time that process() is called.
 process() - Invoked repeatedly during query execution until it returns DONE or until the query
is canceled by the user.
On each invocation, process() is given an input buffer. It reads data from that buffer,
converts it to fields and tuples, and writes those tuples via 'writer'. Once it has consumed as
much as it reasonably can (for example, once it has consumed the last complete row in the
input buffer), it should return INPUT_NEEDED to indicate that more data is needed, or DONE
to indicate that it has completed parsing this input stream and will not be reading more bytes
from it.
If 'input_state' == END_OF_FILE, then the last byte in 'input' is the last byte in the input stream.
Returning INPUT_NEEDED will not result in any new input appearing. process() should return
DONE in this case as soon as this operator has finished producing all the output that it is going
to produce.
Note that 'input' may contain null bytes if the source file contains null bytes. Note also that
'input' is NOT automatically null-terminated.
Returns:
 INPUT_NEEDED if this UDParser requires more data to continue parsing.
 DONE if it has no more data to produce.
 REJECT to reject a row (see "Row Rejection" below).
 getRejectedRecord() - Returns information about the rejected data.
 writer - A member variable of the class. It is used to write parsed tuples to and is of the type
StreamWriter. This is the same API as the PartitionWriter used in the User Defined Transforms
framework. A minimal sketch of this contract follows.
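The sketch below is illustrative, not an SDK example: a base-class UDParser for
newline-delimited input with a single VARCHAR output column. The class name, the input
format, and the VString copy-with-length overload are assumptions; a real parser must also
decide what to do with a trailing unterminated row:

class LineParser : public UDParser {
public:
    virtual StreamState process(ServerInterface &srvInterface, DataBuffer &input,
                                InputState input_state) {
        size_t start = input.offset;
        while (input.offset < input.size) {
            if (input.buf[input.offset] == '\n') {
                // Emit the completed line as one single-column tuple.
                VString &col = writer->getStringRef(0);
                col.copy(input.buf + start, input.offset - start);
                writer->next();
                start = ++input.offset;  // consume the newline too
            } else {
                input.offset++;
            }
        }
        // Leave any trailing partial row unconsumed so it is re-sent
        // with the next buffer.
        input.offset = start;
        if (input_state == END_OF_FILE) return DONE; // drops a final unterminated row
        return INPUT_NEEDED;
    }
};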
Row Rejection
When you want to reject some data, you need to do two things:
 Create a getRejectedRecord() method on your Parser class that returns an object of type
Vertica::RejectedRecord, which contains the data that you want to reject, a string describing
the reason for rejection, the size of the data, and the terminator string. See RejectedRecord in
VerticaUDl.h for details, or view the SDK Documentation.
 When you encounter rejected data, have process() return REJECT. After process() returns
REJECT, HP Vertica calls getRejectedRecord() to process the rejected record before the
next call to process().
One simple way of fulfilling this is to include code in your parser class such as:
Vertica::RejectedRecord myRejRec;
Vertica::RejectedRecord getRejectedRecord() {
    return myRejRec;
}
In your process() method, add code such as:
(...)
if (some rejection condition) {
    RejectedRecord rr("Bad Record!", "foo data", 8, "\n");
    myRejRec = rr;
    return Vertica::REJECT;
}
(...)
That is just one simple approach. The only requirement is that a getRejectedRecord() function
exists that can (and always will) be called after process() returns REJECT, and that it returns the
data that HP Vertica needs to process the rejection.
ContinuousUDParser Functions:
The ContinuousUDParser wrapper allows you to read and process the data stream "at will" instead of
having to iterate through the data.
 initialize() - Invoked before run(). You can optionally override this function to perform setup and
initialization.
 run() - Processes the data. Use reserve() and seek(), or read(), to read the data, and the
writer StreamWriter object to write data to HP Vertica.
 deinitialize() - Invoked after run() has returned. You can optionally override this function to
perform tear-down and destruction.
Functions that are already implemented that you use in your code:
 yield() - Use to yield control back to the server during idle or busy loops so the server can check
for status changes or query cancellations.
 cr - A ContinuousReader, which is defined in
/opt/vertica/sdk/examples/HelperLibraries/CoroutineHelpers.h. Used to
read from the data stream.
 crej - A ContinuousReader, which is defined in
/opt/vertica/sdk/examples/HelperLibraries/CoroutineHelpers.h. Used to
manage rejected rows.
UDParser Class Example:
The following example parses a single column of integers using the ContinuousUDParser
wrapper and writes them to tuples using the writer object.
class BasicIntegerParser : public ContinuousUDParser {
private:
    // campaign for the conservation of keystrokes
    char *ptr(size_t pos = 0) { return ((char*)cr.getDataPtr()) + pos; }

    vint strToInt(const string &str) {
        vint retVal;
        stringstream ss;
        ss << str;
        ss >> retVal;
        return retVal;
    }

public:
    virtual void run() {
        // WARNING: This implementation is not trying for efficiency.
        // It is trying to exercise ContinuousUDParser,
        // and to be quick to implement.
        // This parser assumes a single-column input, and
        // a stream of ASCII integers split by non-numeric characters.
        size_t pos = 0;
        size_t reserved = cr.reserve(pos+1);
        while (!cr.isEof() || reserved == pos+1) {
            while (reserved == pos+1 && (*ptr(pos) >= '0' && *ptr(pos) <= '9')) {
                pos++;
                reserved = cr.reserve(pos+1);
            }
            string st(ptr(), pos);
            writer->setInt(0, strToInt(st));
            writer->next();
            while (reserved == pos+1 && !(*ptr(pos) >= '0' && *ptr(pos) <= '9')) {
                pos++;
                reserved = cr.reserve(pos+1);
            }
            cr.seek(pos);
            pos = 0;
            reserved = cr.reserve(pos+1);
        }
    }
};
Compiling Your UDF
g++ is the only supported compiler for compiling User Defined Function libraries (see Setting up a
UDF Development Environment (page 433) for details). You should compile your UDF code on
the same version of Linux that you use on your HP Vertica cluster.
There are several requirements for compiling your library:
 You must pass the -shared and -fPIC flags to the linker. The easiest method is to just pass
these flags to g++ when you compile and link your library.
 You should also use the -Wno-unused-value flag to suppress warnings when macro
arguments are not used. Otherwise, you may get "left-hand operand of comma has no effect"
warnings.
 You must compile sdk/include/Vertica.cpp and link it into your library. The easiest way
to do this is to include it in the g++ command that compiles your library. This file contains support
routines that help your UDF communicate with HP Vertica. Supplying this file as C++ source
rather than a library limits library compatibility issues.
 Add the HP Vertica SDK include directory to the include search path using the g++ -I flag.
The following command line compiles a UDF contained in a single source file named MyUDF.cpp
into a shared library named MyUDF.so:
g++ -D HAVE_LONG_INT_64 -I /opt/vertica/sdk/include -Wall -shared -Wno-unused-value \
-fPIC -o MyUDF.so MyUDF.cpp /opt/vertica/sdk/include/Vertica.cpp
The above command line assumes that the HP Vertica SDK include directory is located at
/opt/vertica/sdk/include (the default location).
Note: HP only supports UDF development on 64-bit architectures. If you must compile your
UDF code on a 32-bit system, add the flag -D__Linux32__ to your compiler command line.
Once you have debugged your UDF and are ready to deploy it, you should recompile using the
-O3 flag to enable compiler optimization.
You can add additional source files to your library by adding them to the command line, as shown
below. You can also compile them separately and then link them together on your own.
Note: The examples subdirectory in the HP Vertica SDK directory contains a make file that you
can use as a starting point for your own UDF project.
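For instance, a library split across two source files might be compiled like this
(MyHelpers.cpp is a hypothetical second source file):
g++ -D HAVE_LONG_INT_64 -I /opt/vertica/sdk/include -Wall -shared -Wno-unused-value \
-fPIC -o MyUDF.so MyUDF.cpp MyHelpers.cpp /opt/vertica/sdk/include/Vertica.cpp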
Handling External Libraries
If your UDF code relies on additional libraries (either ones you have developed or ones provided
by third parties), you have two options for linking them to your UDF library:
 Statically link them into your UDF. This is the best option, since your UDF library will not rely on
any external files. Since HP Vertica takes care of distributing your library to each node in your
cluster, bundling the additional library into your UDF library eliminates any additional work to
deploy your UDF. An example command appears after this list.
 Dynamically link the library to your UDF. You may need to use dynamic linking for some
third-party libraries that do not allow static linking. In this case, you need to manually install
the external library on each of your HP Vertica nodes. This increases the maintenance you
need to perform. It also adds a new step when adding new nodes to the cluster, since you
need to remember to install the library before adding the node. In addition, you need to ensure
that the version of the library is the same on each node.
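As a sketch of the static-linking option, adding a third-party archive to the compile line is
usually all that is required (the path to libthirdparty.a is hypothetical):
g++ -D HAVE_LONG_INT_64 -I /opt/vertica/sdk/include -Wall -shared -Wno-unused-value \
-fPIC -o MyUDF.so MyUDF.cpp /opt/vertica/sdk/include/Vertica.cpp /usr/local/lib/libthirdparty.a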
Handling Different Numbers and Types of Arguments
Usually, your UDFs accept a set number of arguments that are a specific data type (called the
function's signature). You can create UDFs that handle multiple signatures, or even accept all
arguments supplied to them by the user, using either of these techniques:
 Overloading your UDF by creating multiple factory classes, each of which defines a unique
function signature. This technique is best if your UDF just needs to accept a few different
signatures (for example, accepting two required and one optional argument).
 Using the special "Any" argument type that tells HP Vertica to send all arguments that the user
supplies to your function. Your UDF decides whether it can handle the arguments or not.
The following topics explain each of these techniques.
User Defined Function Overloading
You may want your UDF to accept several different signatures (sets of arguments). For example,
you might want your UDF to accept:
 One or more optional arguments.
 One or more arguments that can be one of several data types.
 Completely distinct signatures (either all INTEGER or all VARCHAR, for example).
You can create a function with this behavior by creating several factory classes, each of which
accepts a different signature (the number and data types of arguments), and associating a single
SQL function name with all of them. You can use the same SQL function name to refer to multiple
factory classes as long as the signature defined by each factory is unique. When a user calls your
UDF, HP Vertica matches the number and types of arguments supplied by the user to the
arguments accepted by each of your function's factory classes. If one matches, HP Vertica uses it
to instantiate a function class to process the data.
Multiple factory classes can instantiate the same function class, so you can re-use one function
class that is able to process multiple sets of arguments and then create factory classes for each of
the function signatures. You can also create multiple function classes if you want.
The following example code demonstrates creating a User Defined Scalar Function (UDSF) that
adds two or three integers together. The Add2or3ints class is prepared to handle two or three
arguments. The processBlock function checks the number of arguments that have been
passed to it, and adds them all together. It also exits with an error message if it has been called
with fewer than two or more than three arguments. In theory, this should never happen, since HP
Vertica only calls the UDSF if the user's function call matches a signature on one of the factory
classes you create for your function. In practice, it is a good idea to perform this sanity checking,
in case your (or someone else's) factory class inaccurately reports a set of arguments your
function class cannot handle.
#include "Vertica.h"
using namespace Vertica;
using namespace std;
// a ScalarFunction that accepts two or three
// integers and adds them together.
class Add2or3ints : public Vertica::ScalarFunction
{
public:
virtual void processBlock(Vertica::ServerInterface &srvInterface,
Vertica::BlockReader &arg_reader,
Vertica::BlockWriter &res_writer)
{
const size_t numCols = arg_reader.getNumCols();
// Ensure that only two or three parameters are passed in
if ( numCols < 2 || numCols > 3)
vt_report_error(0, "Function only accept 2 or 3 arguments, "
"but %zu provided", arg_reader.getNumCols());
// Add two integers together
do {
const vint a = arg_reader.getIntRef(0);
const vint b = arg_reader.getIntRef(1);
vint c = 0;
// Check for third argument, add it in if it exists.
if (numCols == 3)
c = arg_reader.getIntRef(2);
res_writer.setInt(a+b+c);
res_writer.next();
} while (arg_reader.next());
}
};
// This factory accepts function calls with two integer arguments.
class Add2intsFactory : public Vertica::ScalarFunctionFactory
{
virtual Vertica::ScalarFunction
*createScalarFunction(Vertica::ServerInterface
&srvInterface)
{ return vt_createFuncObj(srvInterface.allocator, Add2or3ints); }
virtual void getPrototype(Vertica::ServerInterface &srvInterface,
Vertica::ColumnTypes &argTypes,
Vertica::ColumnTypes &returnType)
{
// Accept 2 integer values
argTypes.addInt();
argTypes.addInt();
returnType.addInt();
}
};
RegisterFactory(Add2intsFactory);
// This factory defines a function that accepts 3 ints.
class Add3intsFactory : public Vertica::ScalarFunctionFactory
{
    virtual Vertica::ScalarFunction *createScalarFunction(
        Vertica::ServerInterface &srvInterface)
    { return vt_createFuncObj(srvInterface.allocator, Add2or3ints); }

    virtual void getPrototype(Vertica::ServerInterface &srvInterface,
                              Vertica::ColumnTypes &argTypes,
                              Vertica::ColumnTypes &returnType)
    {
        // accept 3 integer values
        argTypes.addInt();
        argTypes.addInt();
        argTypes.addInt();
        returnType.addInt();
    }
};
RegisterFactory(Add3intsFactory);
The example has two ScalarFunctionFactory classes, one for each signature that the
function accepts (two integers and three integers). There is nothing unusual about these factory
classes, except that their implementations of
ScalarFunctionFactory::createScalarFunction both create Add2or3ints objects.
The final step is to bind the same SQL function name to both factory classes. You can assign
multiple factories to the same SQL function, as long as the signatures defined by each factory's
getPrototype implementation are different.
=> CREATE LIBRARY add2or3IntsLib AS '/home/dbadmin/Add2or3Ints.so';
CREATE LIBRARY
=> CREATE FUNCTION add2or3Ints as NAME 'Add2intsFactory' LIBRARY add2or3IntsLib
FENCED;
CREATE FUNCTION
=> CREATE FUNCTION add2or3Ints as NAME 'Add3intsFactory' LIBRARY add2or3IntsLib
FENCED;
CREATE FUNCTION
=> SELECT add2or3Ints(1,2);
 add2or3Ints
-------------
           3
(1 row)
=> SELECT add2or3Ints(1,2,4);
 add2or3Ints
-------------
           7
(1 row)
=> SELECT add2or3Ints(1,2,3,4); -- Will generate an error
ERROR 3467: Function add2or3Ints(int, int, int, int) does not exist, or
permission is denied for add2or3Ints(int, int, int, int)
HINT: No function matches the given name and argument types. You may
need to add explicit type casts
The error message in response to the final call to the add2or3Ints function was generated by HP
Vertica, since it could not find a factory class associated with add2or3Ints that accepted four
integer arguments. To expand add2or3Ints further, you could create another factory class that
accepted this signature, and either change the Add2or3ints ScalarFunction class or create a
totally different class to handle adding more integers together. However, adding more classes to
accept each variation in the arguments quickly becomes overwhelming. In that case, you should
consider creating a polymorphic UDF (see Creating a Polymorphic UDF (page 498) for more
information).
Creating a Polymorphic UDF
Polymorphic UDFs accept any number and type of arguments that the user supplies. HP Vertica
does not check the number or types of arguments that the user passes to the UDF; it just passes
the UDF all of the arguments supplied by the user. It is up to your polymorphic UDF's main
processing function (for example, processBlock in User Defined Scalar Functions) to examine
the number and types of arguments it received and determine if it can handle them.
Note: User Defined Transform Functions (UDTFs) can have an unlimited number of
arguments. All other UDF types are limited to a maximum of 32 arguments.
Polymorphic UDFs are more flexible than using multiple factory classes for your function (see
User Defined Function Overloading (page 494)), since your function can determine at run time if
it can process the arguments rather than accepting specific sets of arguments. However, your
polymorphic function needs to perform more work to determine whether it can process the
arguments that it has been given.
Your polymorphic UDF declares that it accepts any number of arguments in its factory's
getPrototype function by calling the addAny function on the ColumnTypes object that defines
its arguments. This "any parameter" argument type is the only one that your function can declare.
You cannot define required arguments and then call addAny to declare the rest of the signature as
optional. If your function has requirements for the arguments it accepts, your process function
must enforce them.
The following example shows an implementation of a ScalarFunction that adds together two or
more integers.
#include "Vertica.h"
using namespace Vertica;
using namespace std;
// Adds two or more integers together.
class AddManyInts : public Vertica::ScalarFunction
{
public:
virtual void processBlock(Vertica::ServerInterface &srvInterface,
Vertica::BlockReader &arg_reader,
Vertica::BlockWriter &res_writer)
{
// Always catch exceptions to prevent causing the side process or
// Vertica itself from crashing.
try
{
// Find the number of arguments sent.
size_t numCols = arg_reader.getNumCols();
// Make sure at least 2 arguments were supplied
if (numCols < 2)
vt_report_error(0, "Function expects at least 2 integer
parameters");
}
// Make sure all types are ints
const SizedColumnTypes &inTypes = arg_reader.getTypeMetaData();
for (int param=0; param < (int)numCols; param++) {
const VerticaType &t = inTypes.getColumnType(param);
if (!t.isInt())
{
string typeDesc = t.getPrettyPrintStr();
// Report that the user supplied a non-integer value.
vt_report_error(0, "Function expects all arguments to be "
"INTEGER. Argument %d was %s", param+1,
typeDesc.c_str());
}
}
do
{ // total up the arguments and write out the total.
vint total = 0;
int x;
// Loop over all params, adding them up.
for (x=0; x<(int)numCols; x++) {
total += arg_reader.getIntRef(x);
}
res_writer.setInt(total);
res_writer.next();
} while (arg_reader.next());
catch(exception& e) {
// Standard exception. Quit.
vt_report_error(0, "Exception while processing partition: [%s]",
e.what());
}
-499-
Programmer's Guide
}
};
// Defines the AddMany function.
class AddManyIntsFactory : public Vertica::ScalarFunctionFactory
{
    // Return the function object to process the data.
    virtual Vertica::ScalarFunction *createScalarFunction(
        Vertica::ServerInterface &srvInterface)
    { return vt_createFuncObj(srvInterface.allocator, AddManyInts); }

    // Define the number and types of arguments that this function accepts
    virtual void getPrototype(Vertica::ServerInterface &srvInterface,
                              Vertica::ColumnTypes &argTypes,
                              Vertica::ColumnTypes &returnType)
    {
        argTypes.addAny(); // Must be the only argument type.
        returnType.addInt();
    }
};
RegisterFactory(AddManyIntsFactory);
Most of the work in the example is done by the ScalarFunction.processBlock function. It
performs two checks on the arguments that have been passed in through the BlockReader
object:
 Ensures there are at least two arguments.
 Checks the data type of all arguments to ensure they are all integers.
Once the checks are performed, the example processes the block of data by looping over the
arguments and adding them together.
You assign a SQL name to your polymorphic UDF using the same statement you use to assign
one to a non-polymorphic UDF. The following demonstration shows how you load and call the
polymorphic function from the example.
=> CREATE LIBRARY addManyIntsLib AS '/home/dbadmin/AddManyInts.so';
CREATE LIBRARY
=> CREATE FUNCTION addManyInts AS NAME 'AddManyIntsFactory' LIBRARY addManyIntsLib
FENCED;
CREATE FUNCTION
=> SELECT addManyInts(1,2);
 addManyInts
-------------
           3
(1 row)
=> SELECT addManyInts(1,2,3,40,50,60,70,80,900);
 addManyInts
-------------
        1206
(1 row)
=> SELECT addManyInts(1); -- Too few parameters
ERROR 3412: Failure in UDx RPC call InvokeProcessBlock(): Error calling
processBlock() in User Defined Object [addManyInts] at
[AddManyInts.cpp:51], error code: 0, message: Exception while processing
partition: [Function expects at least 2 integer parameters]
=> SELECT addManyInts(1,2.232343); -- Wrong data type
ERROR 3412: Failure in UDx RPC call InvokeProcessBlock(): Error
calling processBlock() in User Defined Object [addManyInts] at
[AddManyInts.cpp:51], error code: 0, message: Exception while
processing partition: [Function expects all arguments to be INTEGER.
Argument 2 was Numeric(7,6)]
Notice that the errors returned by the last two calls to the function were generated by the
processBlock function. It is up to your UDF to ensure that the user supplies the correct number
and types of arguments to your function and to exit with an error if it cannot process them.
Polymorphic UDFs and Schema Search Paths
If a user does not supply a schema name as part of a function call, HP Vertica searches each
schema in the schema search path for a function whose name and signature match the function
call. See Setting Schema Search Paths in the Administrator's Guide for more information about
schema search paths.
Since polymorphic functions do not have a specific signature associated with them, HP Vertica
initially skips them when searching for a function to handle the function call. If none of the schemas
in the search path contain a function whose name and signature match the function call, HP
Vertica searches the schema search path again for a polymorphic function whose name matches
the function name in the function call.
This behavior gives precedence to functions whose signature exactly matches the function call. It
allows you to create a "catch-all" polymorphic function that is called only if none of the
non-polymorphic functions with the same name have matching signatures.
This behavior may cause confusion if your users expect the first polymorphic function in the
schema search path to handle a function call. To avoid confusion, you should:
 Avoid using the same name for different functions. You should always uniquely name
functions unless you intend to create an overloaded function with multiple signatures.
 When you cannot avoid having functions with the same name in different schemas, always
supply the schema name as part of the function call, as shown in the example below. Using the
schema name prevents ambiguity and ensures that HP Vertica uses the correct function to
process your function calls.
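For instance, qualifying the call pins it to one schema's function (the schema name
myschema here is hypothetical):

=> SELECT myschema.add2or3Ints(a, b) FROM example;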
UDF Parameters
Parameters let you define arguments for your UDFs that remain constant across all of the rows
processed by the SQL statement that calls your UDF. Typically, your UDFs accept arguments that
come from columns in a SQL statement. For example, in the following SQL statement, the
arguments a and b to the add2ints UDSF change value for each row processed by the SELECT
statement:
=> SELECT a, b, add2ints(a,b) AS 'sum' FROM example;
 a | b  | sum
---+----+-----
 1 | 2  |   3
 3 | 4  |   7
 5 | 6  |  11
 7 | 8  |  15
 9 | 10 |  19
(5 rows)
Parameters remain constant for all the rows your UDF processes. You can also make a parameter
optional, so that if the user does not supply it, your UDF uses a default value. For example, the
following example demonstrates calling a UDSF named add2intsWithConstant that has a single
parameter named constant whose value is added to the arguments supplied in each row of input:
=> SELECT a, b, add2intsWithConstant(a, b USING PARAMETERS constant=42)
-> AS 'a+b+42' from example;
 a | b  | a+b+42
---+----+--------
 1 | 2  |     45
 3 | 4  |     49
 5 | 6  |     53
 7 | 8  |     57
 9 | 10 |     61
(5 rows)
Note: When calling a UDF with parameters, there is no comma between the last argument and
the USING PARAMETERS clause.
The topics in this section explain how to develop UDFs that accept parameters.
Defining the Parameters Your UDF Accepts
You define the parameters that your UDF accepts in its factory class
(ScalarFunctionFactory, AggregateFunctionFactory, etc.) by implementing the
getParameterType function. This function is similar to the getReturnType function: you call
data-type-specific functions on a SizedColumnTypes object that is passed in as a parameter.
Each function call sets the name, data type, and width or precision (if the data type requires it) of
the parameter.
The following code fragment demonstrates adding a single parameter to the add2ints UDSF
example. The getParameterType function defines a single integer parameter that is named
constant.
class Add2intsWithConstantFactory : public ScalarFunctionFactory
{
    // return an instance of Add2intsWithConstant to perform the actual addition.
    virtual ScalarFunction *createScalarFunction(ServerInterface &interface)
    {
        // Calls vt_createFuncObj to create the new Add2intsWithConstant class instance.
        return vt_createFuncObj(interface.allocator, Add2intsWithConstant);
    }

    // Report the argument and return types to Vertica
    virtual void getPrototype(ServerInterface &interface,
                              ColumnTypes &argTypes,
                              ColumnTypes &returnType)
    {
        // Takes two ints as inputs, so add ints to the argTypes object
        argTypes.addInt();
        argTypes.addInt();
        // returns a single int.
        returnType.addInt();
    }

    // Defines the parameters for this UDSF. Works similarly to defining
    // arguments and return types.
    virtual void getParameterType(ServerInterface &srvInterface,
                                  SizedColumnTypes &parameterTypes)
    {
        // One INTEGER parameter named constant
        parameterTypes.addInt("constant");
    }
};
RegisterFactory(Add2intsWithConstantFactory);
See the HP Vertica SDK entry for SizedColumnTypes for a full list of the data-type-specific
functions you can call to define parameters.
Getting Parameter Values in UDFs
Your UDF accesses the values of the parameters it declared in its factory class (see Defining the
Parameters Your UDF Accepts (page 502)) in its function class's process function (for example,
processBlock or processPartition). It gets the parameter values from a ParamReader
object, which is available from the ServerInterface object that is passed to your process
function. Reading parameters from this object is similar to reading argument values from
BlockReader or PartitionReader objects: you call a data-type-specific function with the
name of the parameter to retrieve its value. For example:
// Get the parameter reader from the ServerInterface to see if
// there are supplied parameters
ParamReader paramReader = srvInterface.getParamReader();
// Get the value of an integer parameter named constant
const vint constant = paramReader.getIntRef("constant");
Note: String data values do not have any of their escape characters processed before they are
passed to your function. Therefore, your function may need to process the escape sequences
itself if it needs to operate on unescaped character values.
Testing Whether the User Supplied Parameter Values
Unlike arguments, HP Vertica does not immediately return an error if a user's function call does
not include a value for a parameter defined by your UDF's factory class. This means that your
function can attempt to read a parameter value that the user did not supply. If it does so, HP
Vertica returns a non-existent parameter error to the user, and the query containing the function
call is canceled. This behavior is fine if you want a parameter to be required by your UDF: just
attempt to access its value. If the user didn't supply a value, HP Vertica reports the resulting error
about a missing parameter to the user.
If you want your parameter to be optional, you can test whether the user supplied a value for the
parameter before attempting to access its value. Your function determines if a value exists for a
particular parameter by calling the ParamReader::containsParameter function with the
parameter's name. If this function returns true, your function can safely retrieve the value. If this
function returns false, your UDF can use a default value or change its processing in some other
way to compensate for not having the parameter value. As long as your UDF does not try to
access the non-existent parameter value, HP Vertica does not generate an error or warning about
missing parameters.
Note: If the user passes your UDF a parameter that it has not defined, HP Vertica issues a
warning that the parameter is not used. It still executes the SQL statement, ignoring the
parameter.
The following code fragment demonstrates using the parameter value that was defined in the
example shown in Defining the Parameters Your UDF Accepts (page 502). The
Add2intsWithConstant class defines a function that adds two integer values. If the user
supplies it, the function also adds the value of the optional integer parameter named constant.
/**
 * A UDSF that adds two numbers together with a constant value.
 */
class Add2intsWithConstant : public ScalarFunction
{
public:
    // Processes a block of data sent by Vertica
    virtual void processBlock(ServerInterface &srvInterface,
                              BlockReader &arg_reader,
                              BlockWriter &res_writer)
    {
        try
        {
            // The default value for the constant parameter is 0.
            vint constant = 0;
            // Get the parameter reader from the ServerInterface to see if
            // there are supplied parameters
            ParamReader paramReader = srvInterface.getParamReader();
            // See if the user supplied the constant parameter
            if (paramReader.containsParameter("constant"))
                // There is a parameter, so get its value.
                constant = paramReader.getIntRef("constant");
            // While we have input to process
            do
            {
                // Read the two integer input parameters by calling the
                // BlockReader.getIntRef class function
                const vint a = arg_reader.getIntRef(0);
                const vint b = arg_reader.getIntRef(1);
                // Add arguments plus constant
                res_writer.setInt(a+b+constant);
                // Finish writing the row, and advance to the next
                // output row
                res_writer.next();
                // Continue looping until there are no more input rows
            }
            while (arg_reader.next());
        }
        catch (exception& e)
        {
            // Standard exception. Quit.
            vt_report_error(0, "Exception while processing partition: %s",
                            e.what());
        }
    }
};
Using Parameters in the Factory Class
In addition to using parameters in your UDF function class, you can also access the parameters in
the factory class. You may want to access the parameters to let the user control the input or output
values of your function in some way. For example, your UDF can have a parameter that lets the
user choose to have your UDF return a single- or double-precision value. The process of accessing
parameters in the factory class is the same as accessing them in the function class: get a
ParamReader object from the ServerInterface::getParamReader function, then read the
parameter values.
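As a minimal sketch of that pattern (the use_double parameter is hypothetical, and the
getBoolRef accessor assumes a Boolean-typed parameter), a factory might pick its return
type like this:

// Hypothetical factory fragment: choose single- or double-precision output
// based on an optional Boolean parameter named "use_double".
virtual void getReturnType(ServerInterface &srvInterface,
                           const SizedColumnTypes &argTypes,
                           SizedColumnTypes &returnType)
{
    ParamReader paramReader = srvInterface.getParamReader();
    bool useDouble = false;
    if (paramReader.containsParameter("use_double"))
        useDouble = (paramReader.getBoolRef("use_double") == vbool_true);
    if (useDouble)
        returnType.addFloat("result");  // DOUBLE PRECISION output column
    else
        returnType.addInt("result");    // INTEGER output column
}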
Calling UDFs with Parameters
You pass parameters to a UDF by adding a USING PARAMETERS clause in the function call after
the last argument. There is no comma between the last argument and the USING PARAMETERS
clause. After the USING PARAMETERS clause, you add one or more parameter definitions, each
of which consists of the parameter name, followed by an equal sign, then the parameter's value.
Multiple parameter definitions are separated by commas.
Note: Parameter values can be a constant expression (for example, 1234 + SQRT(5678)).
You cannot use volatile functions (such as RANDOM) in the expression, since they do not
return a constant value. If you do supply a volatile expression as a parameter value, HP Vertica
returns an incorrect parameter type warning and tries to run the UDF without the parameter
value. If the UDF requires the parameter, it returns its own error, which cancels the query.
The following example demonstrates calling the add2intsWithConstant UDSF example from
Defining the Parameters Your UDF Accepts (page 502) and Getting Parameter Values in
UDFs (page 503):
=> SELECT a, b, add2intsWithConstant(a, b USING PARAMETERS constant=42)
-> AS 'a+b+42' from example;
 a | b  | a+b+42
---+----+--------
 1 | 2  |     45
 3 | 4  |     49
 5 | 6  |     53
 7 | 8  |     57
 9 | 10 |     61
(5 rows)
Multiple parameters are separated by commas. The following example calls a version of the
tokenize UDTF that has parameters to limit the shortest allowed word and force the words to be
output in uppercase.
=> SELECT url, tokenize(description USING PARAMETERS
-> minLength=4, uppercase=true) OVER (partition by url) FROM T;
       url       |   words
-----------------+-----------
 www.amazon.com  | ONLINE
 www.amazon.com  | RETAIL
 www.amazon.com  | MERCHANT
 www.amazon.com  | PROVIDER
 www.amazon.com  | CLOUD
 www.amazon.com  | SERVICES
 www.hp.com      | LEADING
 www.hp.com      | PROVIDER
 www.hp.com      | COMPUTER
 www.hp.com      | HARDWARE
 www.hp.com      | IMAGING
 www.hp.com      | SOLUTIONS
 www.vertica.com | WORLD'S
 www.vertica.com | FASTEST
 www.vertica.com | ANALYTIC
 www.vertica.com | DATABASE
(16 rows)
The add2intsWithConstant UDSF's constant parameter is optional; calling it without the parameter
does not return an error or warning:
=> SELECT a,b,add2intsWithConstant(a, b) AS 'sum' FROM example;
 a | b  | sum
---+----+-----
 1 | 2  |   3
 3 | 4  |   7
 5 | 6  |  11
 7 | 8  |  15
 9 | 10 |  19
(5 rows)
Calling a UDF with incorrect parameters does generate a warning, but the query still runs:
=> SELECT a, b, add2intsWithConstant(a, b USING PARAMETERS wrongparam=42)
-> AS 'result' from example;
WARNING 4332: Parameter wrongparam was not registered by the function and cannot
be coerced to a definite data type
 a | b  | result
---+----+--------
 1 | 2  |      3
 3 | 4  |      7
 5 | 6  |     11
 7 | 8  |     15
 9 | 10 |     19
(5 rows)
UDF Resource Use
Your UDFs consume at least a small amount of memory by instantiating classes and creating local
variables. This basic memory usage by UDFs is small enough that you do not need to be
concerned about it.
If your UDF needs to allocate more than one or two megabytes of memory for data structures, or
requires access to additional resources such as files, you must inform HP Vertica about its
resource use. HP Vertica can then ensure that the resources your UDF requires are available
before running a query that uses it. Even moderate memory use (10MB per invocation of a UDF,
for example) can become an issue if there are many simultaneous queries that call it.
Allocating Resources
You have two options for allocating memory and file handles for your User Defined Functions
(UDFs):
 Use HP Vertica SDK macros to allocate resources. This is the best method, since it uses HP
Vertica's own resource manager and guarantees that resources used by your UDF are
reclaimed. See Allocating Resources with the SDK Macros (page 509).
 Allocate resources in your UDFs yourself using standard C++ methods (instantiating objects
using new, allocating memory blocks using malloc, etc.). You must manually free these
resources before your UDF exits.
Note: You must be extremely careful if you choose to allocate your own resources in your UDF.
Failing to free resources properly will have a significant negative impact, especially if your UDF is
running in unfenced mode.
Whichever method you choose, you usually allocate resources in a function named setup
(subclassed from Vertica::UDXObject::setup) in your function class. This function is
called after your UDF function object is instantiated, but before HP Vertica calls it to process data.
If you allocate memory on your own in the setup function, you must free it in a function named
destroy (subclassed from Vertica::UDXObject::destroy) in your function class. This
function is called after your UDF has performed all of its processing. It is also called if
your UDF returns an error (see Handling Errors (page 514)).
Note: Always use the setup and destroy functions to allocate and free resources instead of
your own constructors and destructors. The memory for your UDF object is allocated from one
of HP Vertica's own memory pools. HP Vertica always calls your UDF's destroy function before
it deallocates the object's memory. There is no guarantee that your UDF's destructor will
be called before the object is deallocated. Using the destroy function ensures that your UDF
has a chance to free its allocated resources before it is destroyed.
The following code fragment demonstrates allocating and freeing memory using setup and
destroy functions.
class MemoryAllocationExample : public ScalarFunction
{
public:
    uint64* myarray;

    // Called before running the UDF to allocate memory used throughout
    // the entire UDF processing.
    virtual void setup(ServerInterface &srvInterface, const SizedColumnTypes &argTypes)
    {
        try
        {
            // Allocate an array. This memory is directly allocated, rather than
            // letting Vertica do it. Remember to properly calculate the amount
            // of memory you need based on the data type you are allocating.
            // This example divides 500MB by 8, since that's the number of
            // bytes in a 64-bit unsigned integer.
            myarray = new uint64[1024 * 1024 * 500 / 8];
        }
        catch (std::bad_alloc &ba)
        {
            // Always check for exceptions caused by failed memory
            // allocations.
            vt_report_error(1, "Couldn't allocate memory :[%s]", ba.what());
        }
    }

    // Called after the UDF has processed all of its information. Use to free
    // any allocated resources.
    virtual void destroy(ServerInterface &srvInterface, const SizedColumnTypes &argTypes)
    {
        // srvInterface.log("RowNumber processed %d records", *count_ptr);
        try
        {
            // Properly dispose of the allocated memory.
            delete[] myarray;
        }
        catch (std::bad_alloc &ba)
        {
            // Always check for exceptions caused by failed memory
            // allocations.
            vt_report_error(1, "Couldn't free memory :[%s]", ba.what());
        }
    }
};
Allocating Resources with the SDK Macros
The HP Vertica SDK provides three macros to allocate memory:
 vt_alloc allocates a block of memory to fit a specific data type (vint, struct, etc.).
 vt_allocArray allocates a block of memory to hold an array of a specific data type.
 vt_allocSize allocates an arbitrarily-sized block of memory.
All of these macros allocate their memory from memory pools managed by HP Vertica. The main
benefit of allowing HP Vertica to manage your UDF's memory is that the memory is automatically
reclaimed after your UDF has finished. This ensures that there are no memory leaks in your UDF.
You do not free any of the memory you allocate through any of these macros. The memory is
automatically reclaimed by the HP Vertica process when the UDF has finished running.
Attempting to free this memory will result in run-time errors.
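A minimal sketch of how these macros might look inside a setup() method, assuming the
allocator-first macro signatures (the sizes and variable names are illustrative):

// Illustrative only: allocate Vertica-managed memory in setup().
// None of these blocks should be freed manually; Vertica reclaims them.
virtual void setup(ServerInterface &srvInterface, const SizedColumnTypes &argTypes)
{
    // One vint, allocated from Vertica's pool.
    vint *counter = vt_alloc(srvInterface.allocator, vint);
    // An array of 1024 vints.
    vint *totals = vt_allocArray(srvInterface.allocator, vint, 1024);
    // A raw 4KB scratch buffer.
    char *scratch = (char *)vt_allocSize(srvInterface.allocator, 4096);
    *counter = 0; // use the allocations as ordinary pointers
    (void)totals; (void)scratch;
}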
Informing HP Vertica of Resource Requirements
When you run your UDF in fenced mode, HP Vertica monitors its use of memory and file handles.
If your UDF uses more than a few megabytes of memory or any file handles, it should tell HP
Vertica about its resource requirements. Knowing the resource requirements of your UDF allows
HP Vertica to determine whether it can run the UDF immediately or needs to queue the request
until enough resources become available to run it.
Determining how much memory your UDF requires can be difficult in some case. For example, if
your UDF extracts unique data elements from a data set, and there is potentially no bounds on
the number of data items. In this case, a useful technique is to run your UDF in a test environment
and monitor its memory use on a node as it handles several differently-sized queries, then
extrapolate its memory use based on the worst case scenario it may face in your production
environment. In all cases, it's usually a good idea to add a safety margin to the amount of memory
you tell HP Vertica your UDF uses.
Note: The information on your UDF's resource needs that you pass to HP Vertica is used when
planning the query execution. There is no way to change the amount of resources your UDF
requests from HP Vertica while the UDF is actually running.
Your UDF informs HP Vertica of its resource needs by implementing the
getPerInstanceResources function in its factory class (see
Vertica::UDXFactory::getPerInstanceResources in the SDK documentation). If your
UDF's factory class implements this function, HP Vertica calls it to determine the resources your
UDF requires.
The getPerInstanceResources function receives an instance of the Vertica::VResources
struct, which contains fields setting the amount of memory and the number of file handles your
UDF may need to use. Your implementation of this function sets these fields based on the
maximum resources your UDF may consume for each instance of the UDF function. So, if your
UDF's processBlock function creates a data structure that uses at most 100MB of memory,
your UDF should set the VResources.scratchMemory field to at least 104857600 (the number
of bytes in 100MB). Rounding up to a number like 115000000 (just under 110MB) is a good idea.
The following ScalarFunctionFactory class demonstrates implementing
getPerInstanceResources to inform HP Vertica about the memory requirements of the
MemoryAllocationExample class shown in Allocating Resources (page 508). It tells HP
Vertica that the UDSF requires 510MB of memory (which is a bit more than the UDSF actually
allocates, to be on the safe side).
class MemoryAllocationExampleFactory : public ScalarFunctionFactory
{
    virtual Vertica::ScalarFunction *createScalarFunction(
        Vertica::ServerInterface &srvInterface)
    {
        return vt_createFuncObj(srvInterface.allocator,
                                MemoryAllocationExample);
    }

    virtual void getPrototype(Vertica::ServerInterface &srvInterface,
                              Vertica::ColumnTypes &argTypes,
                              Vertica::ColumnTypes &returnType)
    {
        argTypes.addInt();
        argTypes.addInt();
        returnType.addInt();
    }

    // Tells Vertica the amount of resources that this UDF uses.
    virtual void getPerInstanceResources(ServerInterface &srvInterface,
                                         VResources &res)
    {
        res.scratchMemory += 1024LL * 1024 * 510; // Request 510MB of memory.
    }
};
Setting Memory Limits for Fenced Mode UDFs
HP Vertica calls a fenced-mode UDF's implementation of
Vertica::UDXFactory::getPerInstanceResources to determine if there are enough free
resources to run the query containing the UDF (see Informing HP Vertica of Resource
Requirements (page 510)). Since these values are reported by the UDF itself rather than
measured from actual memory use, they can be inaccurate. Once started by HP Vertica, a UDF
could allocate far more memory or file handles than it reported it needs.
The FencedUDxMemoryLimitMB configuration parameter lets you create an absolute memory
limit for UDFs. Any attempt by a UDF to allocate more memory than this limit results in a
bad_alloc exception. For more information on configuration parameters, see Configuration
Parameters in the Administrator's Guide. For an example of setting FencedUDxMemoryLimitMB,
see How Resource Limits Are Enforced (page 512).
How Resource Limits Are Enforced
Before running a query, HP Vertica determines how much memory it requires to run. If the query
contains a fenced-mode UDF which implements the getPerInstanceResources function in its
factory class, HP Vertica calls it to determine the amount of memory the UDF needs and adds this
to the total required for the query. Based on these requirements, HP Vertica decides how to
handle the query:
 If the total amount of memory required (including the amount that the UDFs report that they
need) is larger than the session's MEMORYCAP or the resource pool's MAXMEMORYSIZE
setting, HP Vertica rejects the query. For more information about resource pools, see
Resource Pool Architecture in the Administrator's Guide.
 If the amount of memory is below the session and resource pool limits, but there is currently
not enough free memory to run the query, HP Vertica queues it until enough resources
become available.
 If there are enough free resources to run the query, HP Vertica executes it.
Note: HP Vertica has no way to determine the amount of resources a UDF requires other than
the values it reports using the getPerInstanceResources function. A UDF could use more
resources than it claims, which could cause performance issues for other queries that are
denied resources. You can set an absolute limit on the amount of memory UDFs can allocate.
See Setting Memory Limits for Fenced Mode UDFs (page 511) for more information.
If the process executing your UDF attempts to allocate more memory than the limit set by the
FencedUDxMemoryLimitMB configuration parameter, it receives a bad_alloc exception. For more
information about FencedUDxMemoryLimitMB, see Setting Memory Limits for Fenced Mode
UDFs (page 511).
Below is the output of loading a UDSF that consumes 500MB of memory, then changing the
memory settings to cause out of memory errors. The MemoryAllocationExample UDSF in the
following example is just the Add2Ints UDSF example altered as shown in Allocating Resources
(page 508) and Informing HP Vertica of Resource Requirements (page 510) to allocate 500MB
of RAM.
=> CREATE LIBRARY mylib AS '/home/dbadmin/MemoryAllocationExample.so';
CREATE LIBRARY
=> CREATE FUNCTION usemem AS NAME 'MemoryAllocationExampleFactory' LIBRARY mylib
-> FENCED;
CREATE FUNCTION
=> SELECT usemem(1,2);
 usemem
--------
      3
(1 row)
The following statements demonstrate setting the session's MEMORYCAP to lower than the
amount of memory that the UDSF reports it uses. This causes HP Vertica to return an error before
it executes the UDSF.
=> SET SESSION MEMORYCAP '100M';
SET
=> SELECT usemem(1,2);
ERROR 3596: Insufficient resources to execute plan on pool sysquery [Request
exceeds session memory cap: 520328KB > 102400KB]
=> SET SESSION MEMORYCAP = default;
SET
The resource pool can also prevent a UDF from running if it requires more memory than is
available in the pool. The following statements demonstrate the effect of creating and using a
resource pool that has too little memory for the UDSF to run. Similar to the session's
MEMORYCAP limit, the pool's MAXMEMORYSIZE setting prevents HP Vertica from
executing the query containing the UDSF.
=> CREATE RESOURCE POOL small MEMORYSIZE '100M' MAXMEMORYSIZE '100M';
CREATE RESOURCE POOL
=> SET SESSION RESOURCE POOL small;
SET
=> CREATE TABLE ExampleTable(a int, b int);
CREATE TABLE
=> INSERT /*+direct*/ INTO ExampleTable VALUES (1,2);
 OUTPUT
--------
      1
(1 row)
=> SELECT usemem(a, b) FROM ExampleTable;
ERROR 3596: Insufficient resources to execute plan on pool small [Request Too
Large:Memory(KB) Exceeded: Requested = 523136, Free = 102400 (Limit = 102400, Used
= 0)]
=> DROP RESOURCE POOL small CASCADE; --Dropping the pool resets the session's pool
DROP RESOURCE POOL
Finally, setting the FencedUDxMemoryLimitMB configuration parameter to lower than the UDF
actually allocates results in the UDF throwing an exception. This is a different case than either of
the previous two examples, since the query actually executes. The UDF's code needs to catch and
handle the exception. In this example, it uses the vt_report_error macro to report the error
back to HP Vertica and exit.
=> SELECT set_config_parameter('FencedUDxMemoryLimitMB','300');
    set_config_parameter
----------------------------
 Parameter set successfully
(1 row)
=> SELECT usemem(1,2);
ERROR 3412: Failure in UDx RPC call InvokeSetup(): Error calling setup() in User
Defined Object [usemem] at [MemoryAllocationExample.cpp:32], error code: 1,
message: Couldn't allocate memory :[std::bad_alloc]
=> SELECT set_config_parameter('FencedUDxMemoryLimitMB','-1');
    set_config_parameter
----------------------------
 Parameter set successfully
(1 row)
=> SELECT usemem(1,2);
 usemem
--------
      3
(1 row)
See Also
 SET SESSION RESOURCE_POOL
 SET SESSION MEMORYCAP
 SET_CONFIG_PARAMETER
Handling Errors
If your UDF encounters an error, it can report it back to HP Vertica using the
vt_report_error macro. When called, this macro halts the execution of the UDF and causes
the statement that called the function to fail. The macro takes two parameters: an error number
and an error message string. Both the error number and message appear in the error that HP
Vertica reports to the user. The error number is not defined by HP Vertica; you can use whatever
value you wish.
For example, the following ScalarFunction class divides two integers. To prevent division by
zero, it tests the second parameter. If it is zero, the function reports the error back to HP Vertica.
/*
 * Demonstrate reporting an error
 */
class Div2ints : public ScalarFunction
{
public:
    virtual void processBlock(ServerInterface &srvInterface,
                              BlockReader &arg_reader,
                              BlockWriter &res_writer)
    {
        // While we have inputs to process
        do
        {
            const vint a = arg_reader.getIntRef(0);
            const vint b = arg_reader.getIntRef(1);
            if (b == 0)
            {
                vt_report_error(1, "Attempted divide by zero");
            }
            res_writer.setInt(a / b);
            res_writer.next();
        }
        while (arg_reader.next());
    }
};
Loading and invoking the function demonstrates how the error appears to the user.
=> CREATE LIBRARY Div2IntsLib AS '/home/dbadmin/Div2ints.so';
CREATE LIBRARY
=> CREATE FUNCTION div2ints AS LANGUAGE 'C++' NAME 'Div2intsInfo' LIBRARY
Div2IntsLib;
CREATE FUNCTION
=> SELECT div2ints(25, 5);
 div2ints
----------
        5
(1 row)
=> SELECT * FROM MyTable;
 a  | b
----+---
 12 | 6
  7 | 0
 12 | 2
 18 | 9
(4 rows)
=> SELECT * FROM MyTable WHERE div2ints(a, b) > 2;
ERROR: Error in calling processBlock() for User Defined Scalar Function
div2ints at Div2ints.cpp:21, error code: 1, message: Attempted divide by zero
Handling Cancel Requests
You can cancel a query that calls your UDF (usually by pressing CTRL+C in vsql). How HP
Vertica handles the cancelation of the query and your UDF depends on whether your UDF is
running in fenced or unfenced mode:
 If your UDF is running in unfenced mode, HP Vertica either stops the function when it
requests a new block of input or output, or waits until your function completes running and
discards the results.
 If your UDF is running in fenced mode, HP Vertica kills the zygote process that is running your
function if it continues processing past a timeout.
See Fenced Mode for more information about running functions in fenced mode.
To give you more control over what happens to your function when the user cancels its query, the
HP Vertica SDK includes an API for some UDFs to handle cancelation. Any function class that
inherits from the Vertica::UDXObjectCancelable class can test whether the query calling it
has been canceled using a function named isCanceled. Your function can also implement a
callback function named cancel that HP Vertica calls when the function's query is canceled.
Currently, the two classes that inherit from UDXObjectCancelable are TransformFunction
and AnalyticFunction.
The topics in this section explain how to use the cancel API.
Exiting When the Calling Query Has Been Canceled
The processPartition function in your User Defined Transform Function (UDTF) or Analytic
Function (UDAnF) can call Vertica::UDXObjectCancelable.isCanceled to determine if
the user has canceled the query that called it. If isCanceled returns true, the query has been
canceled and your processPartition function should exit immediately to prevent it from
wasting CPU time. If your UDF is not running in fenced mode, HP Vertica cannot halt your
function and has to wait for it to finish. If it is running in fenced mode, HP Vertica can eventually
kill the side process running it, but not until it has wasted some processing time.
How often your processPartition function calls isCanceled depends on how much
processing it performs on each row of data. Calling isCanceled does add some overhead to
your function, so you shouldn't call it too often. For transforms that do not perform lengthy
processing, you could check for cancelation every 100 or 1000 rows or so. If your
processPartition performs extensive processing for each row, you may want to check
isCanceled every 10 or so rows.
The following code fragment shows how you could have the StringTokenizer UDTF example
check whether its query has been canceled:
// The primary class for the StringTokenizer UDTF.
class StringTokenizer : public TransformFunction
{
    // Called for each partition in the table. Receives the data from
    // the source table and writes the output rows.
    virtual void processPartition(ServerInterface &srvInterface,
                                  PartitionReader &inputReader,
                                  PartitionWriter &outputWriter)
    {
        try
        {
            // Loop through the input rows
            int rowCount = 0; // Count the number of rows processed.
            do
            {
                rowCount++; // Processing a new row of data
                // Check for cancelation every 100 rows.
                if (rowCount % 100 == 0)
                {
                    if (isCanceled()) // See if the query has been canceled.
                    {
                        // Log the cancelation.
                        srvInterface.log("Got canceled!");
                        return; // Exit out of the UDTF immediately.
                    }
                }
                // Rest of the function here
                .
                .
                .
This example checks for cancelation after every 100 rows it processes in the partition of data. If
the query has been canceled, the example logs a message, then returns to the caller to exit the
function.
Note: You need to strike a balance between adding overhead to your functions by calling
isCanceled and having your functions waste CPU time by running after their query has been
canceled (usually, a rare event). For functions such as StringTokenizer which have a low
overall processing cost, it usually does not make sense to test for cancelation. The cost of
adding overhead to all function calls outweighs the amount of resources wasted by having the
function run to completion or having its zygote process killed by HP Vertica on the rare
occasions that its query is canceled.
Implementing the Cancel Callback Function
Your User Defined Transform Function (UDTF) or Analytic Function (UDAnF) can implement a
cancel callback function that HP Vertica calls if the query that called the function has been
canceled. You usually implement this function to perform an orderly shutdown of any additional
processing that your UDF spawned. For example, you can have your cancel function shut down
threads that your UDF has spawned or signal a third-party library that it needs to stop processing
and exit. Your cancel function should leave your UDF's function class ready to be destroyed,
since HP Vertica calls the UDF's destroy function after the cancel function has exited.
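The following sketch (it is not one of the SDK examples; the class and member names are
illustrative) shows the general shape of a UDTF that uses a flag set by its cancel implementation
to stop its processing loop. As the notes below explain, cancel runs on a different thread than
processPartition, so production code must properly synchronize access to such a flag rather
than relying on a plain member variable:
class CancelableExample : public TransformFunction
{
public:
    volatile bool stopRequested;

    CancelableExample() : stopRequested(false) {}

    virtual void processPartition(ServerInterface &srvInterface,
                                  PartitionReader &inputReader,
                                  PartitionWriter &outputWriter)
    {
        do
        {
            if (stopRequested)
            {
                return; // Exit quickly; HP Vertica calls destroy() next.
            }
            // Normal row processing goes here.
        }
        while (inputReader.next());
    }

    // Called by HP Vertica (on a separate thread) when the query that
    // invoked this function is canceled.
    virtual void cancel(ServerInterface &srvInterface)
    {
        stopRequested = true; // Tell processPartition to stop.
        srvInterface.log("cancel callback invoked");
    }
};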
Notes
 If your UDTF or UDAnF does not implement cancel, HP Vertica assumes your UDF does not
need to perform any special cancel processing, and calls the function class's destroy
function to have it free any resources (see UDF Resource Use (page 507)).
 Your cancel function is called from a different thread than the thread running your UDF's
processPartition function.
 The call to the cancel function is not synchronized in any way with your UDF's
processPartition function. If you need your processPartition function to exit before
your cancel function performs some action (killing threads, for example), you need to have
the two functions synchronize their actions.
 If your cancel function runs for too long and your UDF is running in fenced mode, HP Vertica
kills the side process running your function.
UDF Debugging Tips
You must thoroughly debug your UDF before deploying it to a production environment. The
following tips can help you get your UDF ready for deployment.
Use a Single Node For Initial Debugging
You can attach to the HP Vertica process using a debugger such as gdb to debug your UDF code.
Doing this in a multi-node environment, however, is very difficult. Therefore, consider setting up a
single-node HP Vertica test environment to initially debug your UDF.
Write Messages to the HP Vertica Log
You can write to log files using the ServerInterface.log function. Every function in your UDF
receives an instance of the ServerInterface object, so you can call the log function from anywhere
in your UDF. The function acts similarly to printf, taking a format string and an optional set of
values, and writing the string to a log file. Where the message is written depends on whether your
function runs in fenced mode or unfenced mode:
 Functions running in unfenced mode write their messages into the vertica.log file in the
catalog directory.
 Functions running in fenced mode write their messages into a log file named
UDxLogs/UDxFencedProcesses.log in the catalog directory.
To help identify your function's output, HP Vertica adds the SQL function name bound to your UDF
function (see Deploying and Using UDSFs (page 443) for an explanation) to the log message.
The following code fragment shows how you can add a call to srvInterface.log in the
Add2ints example code's processBlock function to log its input values:
const vint a = arg_reader.getIntRef(0);
const vint b = arg_reader.getIntRef(1);
srvInterface.log("got a: %d and b: %d", (int) a, (int) b);
This code generates an entry in the log file for each row the function processes. For example:
11-05-06 14:37:20.838 nameless:0x3f3a210 [UserMessage] <UDx> Add2ints - got a: 1 and b: 2
11-05-06 14:37:20.838 nameless:0x3f3a210 [UserMessage] <UDx> Add2ints - got a: 2 and b: 2
11-05-06 14:37:20.838 nameless:0x3f3a210 [UserMessage] <UDx> Add2ints - got a: 3 and b: 2
11-05-06 14:37:20.838 nameless:0x3f3a210 [UserMessage] <UDx> Add2ints - got a: 1 and b: 4
11-05-06 14:37:20.838 nameless:0x3f3a210 [UserMessage] <UDx> Add2ints - got a: 5 and b: 2
See Monitoring the Log Files in the Administrator's Guide for details on viewing the HP Vertica log
files.
Developing a User Defined Function in R
HP Vertica supports User Defined Functions written in the R programming language
http://www.r-project.org/. R is a free, open source programming language used for statistical
computing.
The topics in this section guide you through developing a User Defined Function in R.
User Defined Functions in R Notes and Limitations
 You must first install The R Language Pack for HP Vertica before creating R functions
inside of HP Vertica.
 User Defined Functions developed in R always run in Fenced Mode in a process outside of
the main HP Vertica process.
 You can create Scalar Functions and Transform Functions using the R language. Other UDx
types are not supported with the R language.
 NULL values in HP Vertica are translated to R NA values
http://cran.r-project.org/doc/manuals/R-intro.html#Missing-values when sent to the R
function. R NA values are translated into HP Vertica NULL values when returned from the R
function to HP Vertica. (See the sketch following the data-type table below.)
 R supports different data types than those available in HP Vertica, so data types are mapped
between the two systems. This table details the data-type mapping for R:
HP Vertica Data Type                                  R Data Type
Boolean                                               logical
Date/Time: DATE, DATETIME, SMALLDATETIME,             numeric
TIME, TIMESTAMP, TIMESTAMPZ, TIMETZ
Approximate Numeric: DOUBLE PRECISION,                numeric
FLOAT, REAL
Exact Numeric: BIGINT, DECIMAL, INT,                  numeric
NUMERIC, NUMBER, MONEY
BINARY, VARBINARY                                     character
CHAR, VARCHAR                                         character
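For example, the following fragment is a minimal sketch (it is not one of the shipped examples;
the function name is illustrative) of a main R function that relies on this translation:
# Adds two numeric columns. Any row where either input was NULL in HP
# Vertica arrives here as NA; R arithmetic propagates the NA, which
# becomes NULL again when the result is returned to HP Vertica.
addTwoCols <- function(x)
{
  res <- x[,1] + x[,2]
  res
}
Alternatively, you can set the strictness parameter in the factory function (described later in
this section) so that HP Vertica returns NULL without calling the function at all when any input
is NULL.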
Installing/Upgrading the R Language Pack for HP Vertica
To use R with HP Vertica, install the RPM (or Debian .deb) R language package that matches your
server version. The R language pack includes the R runtime and associated libraries for
interfacing with HP Vertica.
This topic details:
 HP Vertica R Language Pack Prerequisites
 Installing the HP Vertica R Language Pack
 Upgrading the HP Vertica R Language Pack
Important: You must install the R Language Pack on each node in the cluster.
Note: The version of R has been updated in HP Vertica 6.1.2 to use R 3.0. When
upgrading from HP Vertica 6.1.0/6.1.1 (which uses R 2.14) to 6.1.2 or later, see the upgrade
instructions at the end of this topic if you have installed additional R packages. If you do not
follow the upgrade instructions then your additional R packages may not work.
HP Vertica R Language Pack Prerequisites
The R Language Pack RPM requires libgfortran.so.1, which may not be installed by default on
your system. Install the RPM that contains libgfortran.so.1. See the table below to determine how
to install libgfortran.so.1:
Linux Version                           How to Install libgfortran
RHEL 6 and CentOS 6                     Install the compat-libgfortran-41 RPM with the command:
                                        yum install compat-libgfortran-41
RHEL 5 and CentOS 5                     Install the libgfortran RPM with the command:
                                        yum install libgfortran
Other supported platforms that use      Determine the RPM needed for libgfortran.so.1 with the command:
yum, such as SuSE 11                    yum whatprovides /usr/lib64/libgfortran.so.1
                                        Typical packages that include libgfortran.so.1 are:
                                         libgfortran-41-<any_minor_version>.rpm
                                         compat-libgfortran-41-<any_minor_version>.rpm
                                         gcc41-fortran.rpm
To Install the HP Vertica R Language Pack:
1 Download the R language package by going to the myVertica portal http://my.vertica.com/,
clicking the downloads tab, and selecting the vertica-R-lang_-version.rpm (or .deb) file for your
server version. The R language package version must match your server version to three
decimal points. For example, if your server version is 6.1.1, then the R Language Pack version
must also be 6.1.1.
2 Install the package as root or using sudo:
 RPM: rpm -Uvh vertica-R-lang_-version.rpm
 Debian: dpkg -i vertica-R-lang_-version.deb
The installer puts the R binary in /opt/vertica/R. The installer also adds the file
vertica-udx-R.conf to /etc/ld.so.conf.d/. This file is removed if you uninstall the
package.
To Upgrade the HP Vertica R Language Pack:
When upgrading from HP Vertica 6.1.0 or 6.1.1 (which uses R 2.14) to 6.1.2 or later (which uses R
3.0), any R packages that you have manually installed may not work with R 3.0 and may have to
be reinstalled. If you don't update your package(s), then R returns an error if the package cannot
be used in R 3.0: Error: package ‘[package name]’ was built before R 3.0.0: please re-install it.
Instructions for upgrading these packages are below.
Note: The R packages provided in the R Language Pack are automatically upgraded and do
not need to be reinstalled.
1 You must uninstall the R_lang RPM before upgrading the server RPM. Any additional R
packages that you manually installed remain in /opt/vertica/R and are not removed when
you uninstall the RPM.
2 Upgrade your server RPM as detailed in Upgrading HP Vertica to a New Version.
3 After the server RPM has been updated, install the new R Language Pack on each host as
detailed above in To install the HP Vertica R Language Pack.
4 If you have installed additional R packages, on each node:
   a) As root, run /opt/vertica/R/bin/R and issue the command:
      update.packages(checkBuilt=TRUE)
   b) Select a CRAN mirror from the list displayed.
   c) You are prompted to update each package that has an update available for it. Do NOT
      update:
       Rcpp
       RInside
      You can optionally update any other packages installed with the R Language Pack. You
      must update any packages that you manually installed that are not compatible with R 3.0.
      The packages you selected to be updated are installed. Quit R with the command: quit()
Note: HP Vertica UDx functions written in R do not need to be compiled and you do not need to
reload your HP Vertica-R libraries and functions after an upgrade.
R Packages
The HP Vertica R Language Pack includes the following R packages in addition to the default
packages bundled with R:
 Rcpp
 RInside
 lpSolve
 lpSolveAPI
You can install additional R packages not included in the HP Vertica R Language Pack by using
one of the following two methods. You must install the same packages on all nodes.
1) By running the R Language Pack R binary at the command line and using the
install.packages() R command. For example:
# /opt/vertica/R/bin/R
...
> install.packages("Zelig");
2) By running the following command:
/opt/vertica/R/bin/R CMD INSTALL <path-to-package-tgz>
The install places the packages in: /opt/vertica/R/library.
Using the HP Vertica SDK R Examples
HP Vertica provides example R functions in
/opt/vertica/sdk/examples/RFunctions/RFunctions.R
You can load the examples into HP Vertica with the command:
CREATE LIBRARY rLib AS '/opt/vertica/sdk/examples/RFunctions/RFunctions.R'
LANGUAGE 'R';
You can then load the example functions included in the library. For example:
CREATE FUNCTION mul AS LANGUAGE 'R' NAME 'mulFactory' LIBRARY rLib;
Creating R Functions
To create a user defined R function, you define:
 An R factory function.
 A main R function.
 Optionally, an outtypecallback function to define the type(s) and precision of the values
returned to HP Vertica, and a parametertypecallback function to define parameter names
and types.
About the R Factory Function
User Defined Functions in R require a single factory function. The factory function wraps all of the
information required by HP Vertica to load the user-defined R function. The factory function allows
you to define the following:
Parameter               Description
name                    The name of the primary R function in this User Defined Function.
udxtype                 The type of User Defined Function. Can be either "scalar" or
                        "transform".
intype                  The type(s) of arguments accepted by the function. See the supported
                        types below.
                        Note: User Defined Scalar Functions (UDSFs) can have a maximum
                        of 32 arguments. User Defined Transform Functions (UDTFs) have
                        no limit on the number of arguments.
outtype                 The type(s) of arguments returned by the function. See the supported
                        types below.
outtypecallback         (optional) The callback function to call before sending the data back to
                        HP Vertica. It defines the types and precision that the main function
                        returns.
parametertypecallback   (optional) The callback function that sends parameter types and names
                        to HP Vertica.
volatility              Indicates whether the function returns the same output given the same
                        input. Can be one of three values.
strictness              Indicates whether the function always returns NULL when any of its
                        input arguments is NULL.
Factory Function - Supported Data Types
The following data types are supported in the factory function:
 boolean
 int
 float
 real
 char
 varchar
 date
 datetime
 smalldatetime
 time
 timestamp
 timestamptz
 timetz
 numeric
 varbinary
 binary
Example Factory Function
The following example is a factory function for a scalar function written in R named mul that takes
two floats and returns a float. It also defines an outtypecallback function named mulReturnType.
mulFactory <- function()
{
  list(name=mul, udxtype=c("scalar"), intype=c("float","float"),
       outtype=c("float"), outtypecallback=mulReturnType)
}
About the Main R Function
The main function (which is defined in the name parameter of the factory function) is the first
function called when HP Vertica runs your function. The main function takes as input exactly one
data frame and returns a single data frame. The factory function converts the intype arguments
into a data frame for the primary function. For example, intype=c("float","float") is converted by the
factory function into a two-dimensional matrix. The primary function returns a data frame, and the
data frame is converted into the type defined by the outtype parameter of the factory function.
The following main function corresponds to the factory function defined earlier. It takes a data
frame as input (converted from two floats) and multiplies the two values in each row of the data
frame, then returns a data frame, which is converted to a single column of floats and passed back
to HP Vertica.
mul <- function(x)
{
  pr <- x[,1] * x[,2]
  pr
}
About the outtypecallback Function
You can optionally create an outtypecallback function to define the type(s), length/precision, and
scale of the data being returned to HP Vertica. You return a matrix with up to four columns:
1 data type
2 length/precision
3 scale
4 name of the column in the output (optional)
If any of the columns are left blank (or the outtypecallback function is omitted entirely), then HP
Vertica uses default values.
When creating the outtypecallback function, you define one row for each value returned, using the
same order as the outtypes defined in the factory function.
For example, if your function returns a data frame with three columns, containing INT,
VARCHAR(24), and VARCHAR(2) data types, you could define the return types as:
ret[1,1] = "int"
ret[1,4] = "product"
ret[2,1] = "varchar"
ret[2,2] = "24"
ret[3,1] = "varchar"
ret[3,2] = "2"
The following outtypecallback example mimics the defaults used by HP Vertica. If the
outtypecallback function were omitted, then HP Vertica provides these settings as defaults.
mulReturnType <- function(x)
{
  ret = data.frame(datatype = rep(NA,1), length = rep(NA,1),
                   scale = rep(NA,1), name = rep(NA,1))
  ret[1,1] = "float"
  ret[1,4] = "product"
  ret
}
Deploying the Function into HP Vertica
 To deploy a Scalar Function, see Deploying and Using UDSFs (page 443).
 To deploy a Transform Function, see Deploying and Using User Defined Transforms.
Also see CREATE LIBRARY and CREATE FUNCTION in the SQL Reference Manual.
Example R Scalar Function
##########
# Example: Multiplication
# Filename: mul.R
##########

###
# @brief Multiplies col1 and col2 of the input data frame.
###
mul <- function(x)
{
  pr <- x[,1] * x[,2]
  pr
}

mulFactory <- function()
{
  list(name=mul, udxtype=c("scalar"), intype=c("float","float"),
       outtype=c("float"), outtypecallback=mulReturnType)
}

mulReturnType <- function(x)
{
  ret = data.frame(datatype = rep(NA,1), length = rep(NA,1),
                   scale = rep(NA,1), name = rep(NA,1))
  ret[1,1] = "float"
  ret[1,4] = "Multiplied"
  ret
}
Example Usage:
dbadmin=> select * from twocols;
 x | y
---+---
 1 | 1
 2 | 2
 3 | 3
 4 | 4
 5 | 5
 6 | 6
 7 | 7
 8 | 8
 9 | 9
(9 rows)
dbadmin=> CREATE LIBRARY mulLib AS '/home/dbadmin/mul.R' LANGUAGE 'R';
CREATE LIBRARY
dbadmin=> CREATE FUNCTION mul AS NAME 'mulFactory' LIBRARY mulLib;
dbadmin=> select mul(x,y) from twocols;
 Multiplied
------------
          1
          4
          9
         16
         25
         36
         49
         64
         81
(9 rows)
Setting Null Input and Volatility Behavior for R Functions
Starting in version 6.1, HP Vertica supports defining volatility and null-input settings for functions
written in R. Both settings aid the performance of your R function:
 Volatility settings describe the behavior of the function to the HP Vertica optimizer. For
example, if you have identical rows of input data and you know the function is immutable, then
you can define the function as IMMUTABLE. This tells the HP Vertica optimizer that it can
return a cached value for subsequent identical rows on which the function is called rather than
having the function run on each identical row.
 Null input settings determine how to respond to rows that have NULL input. For example, you
can choose to return NULL if any inputs are NULL rather than calling the function and having
the function deal with a NULL input.
Volatility Settings
To indicate your function's volatility, set the volatility parameter of your R factory function to one of
the following values:
Value                 Description
VOLATILE              Repeated calls to the function with the same input
                      parameters always result in different values. HP Vertica
                      always calls volatile functions for each invocation.
IMMUTABLE             Calls to the function with the same input parameters
                      always result in the same output.
STABLE                Repeated calls to the function with the same input within
                      the same statement return the same output. For example, a
                      function that returns the current user name is stable
                      because the user cannot change within a statement, but
                      could change between statements.
DEFAULT_VOLATILITY    The default volatility. This is the same as VOLATILE.

If you do not define a volatility, then the function is considered to be VOLATILE.
The following example sets the volatility to STABLE in the mulFactory function:
mulFactory <- function()
{
  list(name=mulwithparams, udxtype=c("scalar"), intype=c("float","float"),
       outtype=c("float"), volatility=c("stable"),
       parametertypecallback=mulparams)
}
Null Input Behavior
To indicate how your function reacts to NULL input, set the strictness parameter of your R
factory function to one of the following values:

Value                        Description
CALLED_ON_NULL_INPUT         The function must be called, even if one or more input
                             values are NULL.
RETURN_NULL_ON_NULL_INPUT    The function always returns a NULL value if any of its
                             inputs are NULL.
STRICT                       A synonym for RETURN_NULL_ON_NULL_INPUT.
DEFAULT_STRICTNESS           The default strictness setting. This is the same as
                             CALLED_ON_NULL_INPUT.

If you do not define a null input behavior, then the function is called on every row of data
regardless of the presence of NULLs.
The following example sets the null input behavior to STRICT in the mulFactory function:
mulFactory <- function()
{
  list(name=mulwithparams, udxtype=c("scalar"), intype=c("float","float"),
       outtype=c("float"), strictness=c("strict"),
       parametertypecallback=mulparams)
}
Using Parameters in R
Starting in version 6.1, HP Vertica supports using parameters in functions written in R. Parameters
are passed to the R function with the USING PARAMETERS keyword followed by a series of
key-value pairs. The function accesses parameters as the second argument it receives, which is a
list of key-value pairs.
You also must specify a field in the factory function called parametertypecallback. This field
points to the callback function that defines the parameters expected by the function. The callback
function simply defines a four-column data frame with these columns:
1 data type
2 length/precision
3 scale
4 name of the parameter (optional)
If any of the columns are left blank (or the parametertypecallback function is omitted entirely),
then HP Vertica uses default values.
Syntax
SELECT <function-name>(<argument> [, ...] [USING PARAMETERS <key>=<value> [, ...]])
FROM foo;
SELECT <function-name>(<argument> [, ...] [USING PARAMETERS <key>=<value> [, ...]])
OVER (...) FROM foo;
Example Usage
In version 6.0, if you created a kmeans function, you had to hard code the value of K (the
number of clusters into which to group the data points) into your function. In version 6.1 and later
you can, for example, specify a parameter for the value of K and assign a value to K when you call
the function from within HP Vertica. For example:
SELECT kmeans(geonameid, latitude, longitude, country_code USING PARAMETERS k =
5) OVER (ORDER BY country_code) FROM geotab_kmeans;
Your R function receives everything before 'USING PARAMETERS' as the first dataframe, and the
parameters are contained in the second parameter, which is a list.
Complete R Example
mykmeansparams <- function(x, y)
{
  # Error check and get the number of clusters to be formed.
  # The y argument contains the parameters from the USING PARAMETERS
  # clause in your SQL function call.
  if(!is.null(y[['k']]))
    k=as.numeric(y[['k']])
  else
    stop("Expected parameter k")
  # get the number of columns in the input data frame
  cols = ncol(x)
  # run the kmeans algorithm
  cl <- kmeans(x[,2:cols-1], k)
  # get the cluster information from the result of above
  Result <- cl$cluster
  # return the result to vertica
  Result <- data.frame(VCol=Result)
  Result
}

# Callback function to return parameter types
kmeansParameters <- function()
{
  params <- data.frame(datatype=rep(NA, 1), length=rep(NA,1),
                       scale=rep(NA,1), name=rep(NA,1))
  params[1,1] = "int"
  params[1,4] = "k"
  params
}

# Function that tells vertica the name of the actual R function, the
# parameter types, and the return types
kmeansFactoryParams <- function()
{
  list(name=mykmeansparams, udxtype=c("transform"),
       intype=c("int","float","float","float","float","varchar"),
       outtype=c("int"),
       parametertypecallback=kmeansParameters, volatility=c("stable"),
       strictness=c("called_on_null_input"))
}
Polymorphic Functions in R
Polymorphism in R functions allows you to easily modify how the function is used on different
datasets. Polymorphic functions in R can accept any number and type of arguments that the user
supplies. HP Vertica does not check the number or types of arguments that the user passes to the
function; it just passes all of the arguments supplied by the user. It is up to your polymorphic
function's main function to examine the number and types of arguments it received and determine
whether it can handle them.
Note: Transform functions written in R can have an unlimited number of arguments. Scalar
functions written in R are limited to a maximum number of 32 arguments.
Polymorphic functions are more flexible than using multiple factory classes for your function (see
User Defined Function Overloading (page 494)), since your function can determine at run time
if it can process the arguments rather than accepting specific sets of arguments. However, your
polymorphic function needs to perform more work to determine whether it can process the
arguments that it has been given.
Declare the Function as Polymorphic
Your polymorphic R function declares that it accepts any number of arguments in its factory
function by specifying "any" as the argument to the intype parameter and optionally the outtype
parameter. If you define "any" for intype or outtype, then it is the only type that your function can
declare for the respective parameter. You cannot define required arguments and then use "any" to
declare the rest of the signature as optional. If your function has requirements for the arguments it
accepts, your process function must enforce them.
For example, this is a factory function that declares that the function is polymorphic:
RFactory <- function()
{
  list(
    name=RFunction,
    udxtype=c("transform"),
    intype=c("any"),
    outtype=c("any"),
    outtypecallback=ReturnType
  )
}
Define the outtypecallback for Polymorphic Functions
The outtypecallback method is called with the argument types and sizes the function was
invoked with, and is expected to indicate the types and sizes that the function returns. The
outtypecallback method can also be used to check for unsupported types or numbers of
arguments. For example, the function may require only integers, with no more than 10 of them.
The outtypecallback function takes two arguments: the first is a four-column data frame
containing the input types (data type, length/precision, scale, and optional column name), and
the second is a named list containing the parameters passed to the function.
# Determine the return types based on the input types and sizes
polyTopKReturnType <- function(x,y)
{
  ret <- NULL
  for( i in 2:nrow(x))
  {
    rbind(ret,x[i,]) -> ret
  }
  ret
}
K-means Example
The example below uses the popular Iris Flower Dataset
http://en.wikipedia.org/wiki/Iris_flower_data_set to demonstrate how the R k-means algorithm
clusters the data and how you can use the polymorphic properties of the function to run k-means
against different columns.
# The k-means polymorphic algorithm
# Input: A dataframe with 5 columns
# Output: A dataframe with one column that tells the cluster to which each
# data point belongs
mykmeansPoly <- function(x,y)
{
  # get the number of clusters to be formed
  if(!is.null(y[['k']]))
    k=as.numeric(y[['k']])
  else
    stop("Expected parameter k")
  # get the number of columns in the input data frame
  cols = ncol(x)
  # run the kmeans algorithm
  cl <- kmeans(x[,2:cols-1], k)
  # get the cluster information from the result of above
  Result <- cl$cluster
  # return the result to vertica
  Result <- data.frame(VCol=Result)
  Result
}

# Function that tells vertica the name of the actual R function, and the
# polymorphic parameters
kmeansFactoryPoly <- function()
{
  list(name=mykmeansPoly, udxtype=c("transform"), intype=c("any"),
       outtype=c("int"), parametertypecallback=kmeansParameters)
}

# Callback function to return parameter types
kmeansParameters <- function()
{
  params <- data.frame(datatype=rep(NA, 1), length=rep(NA,1),
                       scale=rep(NA,1), name=rep(NA,1))
  params[1,1] = "int"
  params[1,4] = "k"
  params
}
To Use the k-means Example
1 Create a table to hold the Iris data:
  create table iris (sl FLOAT, sw FLOAT, pl FLOAT, pw FLOAT, spec VARCHAR(32));
2 Obtain the Iris Flower Dataset http://en.wikipedia.org/wiki/Iris_flower_data_set and copy
  it into HP Vertica:
  wget http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
  copy iris from '/home/dbadmin/iris.data' DELIMITER ',' EXCEPTIONS
  '/home/dbadmin/iris_exceptions.txt';
3 Copy and paste the example into a file named kmean.r in the dbadmin home directory. Using
  vsql, create the library:
  create library rlib2 as '/home/dbadmin/kmean.r' language 'R';
4 Create the function:
  create transform function mykmeansPoly as language 'R' name
  'kmeansFactoryPoly' library rlib2;
5 Run the function:
  select spec, mykmeansPoly(sl,sw,pl,pw,spec USING PARAMETERS k = 3)
  over(partition by spec) as kmean from iris;
6 Run the function with a different number of arguments:
  select spec, mykmeansPoly(sl,pl,spec USING PARAMETERS k = 3)
  over(partition by spec) as kmean from iris;
7 To clean up the example library, function, and table:
  drop transform function mykmeansPoly();
  drop library rlib2;
  drop table iris;
Top K per Partition Example
Below is another example of a polymorphic function. This function computes the Top K per
Partition:
# Polymorphic Top K per partition
polyTopK <- function(x,y)
{
  # check if the input is valid
  if(ncol(x) < 2 )
  {
    stop("Function takes at least 2 arguments, the first of which must be 'k'")
  }
  k = x[1,1]
  if(!check.integer(k))
  {
    stop("First argument must be an integer (the 'k' value)")
  }
  ret = NULL
  j = ncol(x)
  df = convert.factors.to.strings.in.dataframe(x)
  # output the first k rows, then break
  for(i in 1:nrow(x))
  {
    if( i > k)
    {
      break
    }
    rbind(ret,df[i,2:j]) -> ret
  }
  # return the top k tuples
  ret
}

# Determine the return types based on the input types and sizes
polyTopKReturnType <- function(x,y)
{
  ret <- NULL
  for( i in 2:nrow(x))
  {
    rbind(ret,x[i,]) -> ret
  }
  ret
}

polyTopKFactory <- function()
{
  list(
    name=polyTopK,
    udxtype=c("transform"),
    intype=c("any"),
    outtype=c("any"),
    outtypecallback=polyTopKReturnType
  )
}
# check if a number is an integer
check.integer <- function(N)
{
  !length(grep("[^[:digit:]]", as.character(N)))
}

# convert factors in a data frame to strings
convert.factors.to.strings.in.dataframe <- function(dataframe)
{
  class.data <- sapply(dataframe, class)
  factor.vars <- class.data[class.data == "factor"]
  for (colname in names(factor.vars))
  {
    dataframe[,colname] <- as.character(dataframe[,colname])
  }
  return (dataframe)
}
To Use the top-k Example
1 Copy and paste the example into a file named polyTopK.R in the dbadmin home directory.
  Using vsql, create the library:
  create library rlib3 as '/home/dbadmin/polyTopK.R' language 'R';
2 Create the function:
  create transform function polyTopK as language 'R' name
  'polyTopKFactory' library rlib3;
3 Run the function:
  select spec, polyTopK(3,sl,sw,pl,pw) over(partition by spec) from iris;
4 Run the function with a different number of arguments:
  select spec, polyTopK(3,sl,pl) over(partition by spec) from iris;
Logistic Regression in R using GLM
The following UDx function provides an example of how to use GLM as a logistic regression
function in HP Vertica using R.
The birthweight_glm.R example uses the Low Infant Birthweight dataset available as part of
the R "MASS" library. The example computes odds ratios and confidence intervals for a low
birthweight condition based on the values of the predictors.
The dataset contains the following information:
Column    Details (value of 1 = true unless otherwise noted)
low       Whether birth weight is less than 2.5kg.
age       Age of mother in years.
lwt       The mother's weight in lbs.
race      Mother's race (1 = white, 2 = black, 3 = other).
smoke     Whether the mother smoked during pregnancy.
ht        History of hypertension.
ui        Presence of uterine irritability.
ftv       Number of physician visits during the first trimester.
ptl       Number of previous premature labors.
bwt       Birth weight in grams.
Logistic Regression GLM Example (birthweight_glm.R)
glmBirthweight <- function(birthwt)
{
  names(birthwt)<-c('low','age','lwt','race','smoke','ptl','ht','ui','ftv');
  # Add race2 to clarify what the integer values mean
  birthwt$race2<-factor(birthwt$race,labels=c("white","black","other"));
  # Logistic regression model predicts the low birthwt condition given
  # the predictors
  bwt<-glm(low~age+lwt+race2+smoke+ptl+ht+ui+ftv,data=birthwt,family=binomial);
  # Gather and format the model output
  sum.coef<-summary(bwt)$coef;
  qq<-data.frame(sum.coef);
  # compute odds ratios
  qq$odds<-exp(qq$Estimate);
  # compute confidence intervals
  qq$upper.ci<-exp(qq$odds+1.96*qq$Std..Error);
  qq$lower.ci<-exp(qq$odds-1.96*qq$Std..Error);
  # Add the predictors to the dataframe for printing
  qq$Parameters<-row.names(qq);
  row.names(qq)<-NULL;
  # return the object
  qq
}

glmBirthweightFactory <- function()
{
  list(name=glmBirthweight, udxtype=c("transform")
       ,intype=c("int","int","int","int","int","int","int","int","int")
       ,outtype=c("float","float","float","float","float","float","float","varchar")
       ,outnames=c("Estimate","Std..Error","z.value","Pr.z",
                   "odds","upper.ci","lower.ci","Parameters"))
}
Running the Example
You can obtain the example data from the web at:
http://www.statlab.uni-heidelberg.de/data/linmod/birthweight.data
For example:
wget http://www.statlab.uni-heidelberg.de/data/linmod/birthweight.data
In VSQL, create the table for the dataset and load the data:
create table births_ex(low int,age int,lwt int,race int,smoke int,ptl int,ht
int,ui int,ftv int,bwt int);
copy births_ex from '/home/dbadmin/birthweight.data';
Then copy birthweight_glm.R to dbadmin's home directory, and create the library and function:
create library BirthweightLib as '/home/dbadmin/birthweight_glm.R' language 'R';
create transform function glmBirthweight as language 'R' name
'glmBirthweightFactory' library BirthweightLib;
Finally, run the function:
SELECT glmBirthweight(low,age,lwt,race,smoke,ptl,ht,ui,ftv) OVER() FROM births_ex;
      Estimate       |     Std..Error      |      z.value      |        Pr.z        |       odds        |     upper.ci     |     lower.ci      | Parameters
---------------------+---------------------+-------------------+--------------------+-------------------+------------------+-------------------+-------------
 0.48062320498266    | 1.19688757611052    | 0.401560860498295 | 0.688007239426341  | 1.61708186170698  | 52.6139614264981 | 0.482479045833905 | (Intercept)
 -0.0295490268887551 | 0.0370307976710382  | -0.79795815232642 | 0.424894775128354  | 0.97088327708793  | 2.8390333954452  | 2.45543249224841  | age
 -0.0154242839432244 | 0.00691924796727926 | -2.22918502359866 | 0.0258015965896862 | 0.984694061080653 | 2.71354469870095 | 2.64093318816631  | lwt
 1.27225979472379    | 0.527357257587404   | 2.41251974144477  | 0.0158426794613584 | 3.5689084570941   | 99.7361981679225 | 12.6200678340669  | race2black
 0.880495922911246   | 0.440777655513404   | 1.99759654759648  | 0.045760417849372  | 2.41209562333513  | 26.4702067818639 | 4.70286274760947  | race2other
 0.938845698827358   | 0.402146887474267   | 2.33458402407114  | 0.0195651619400098 | 2.55702813359408  | 28.3672704897164 | 5.86393121310429  | smoke
 0.543337030596766   | 0.345403019680752   | 1.57305234649935  | 0.115706697836339  | 1.72174279475664  | 11.0092157041215 | 2.84269593896983  | ptl
 1.86330286760883    | 0.697533131006852   | 2.6712750760946   | 0.0075563695198027 | 6.44498860020803  | 2470.44490524196 | 160.424537944263  | ht
 0.767648144937079   | 0.45931793280184    | 1.6712784111311   | 0.0946666986379542 | 2.15469276497994  | 21.2202499748911 | 3.50583813238038  | ui
 0.0653018343579497  | 0.172393823524508   | 0.378794512604253 | 0.70484046257016   | 1.06748117825926  | 4.07704749133413 | 2.07422851417719  | ftv
(10 rows)
Updating UDF Libraries
There are two cases where you need to update libraries that you have already deployed:
 When you have upgraded HP Vertica to a new version that contains changes to the SDK API.
For your libraries to work with the new server version, you need to recompile them with the
new version of the SDK. See UDF Library Compatibility with New Server Versions (page 540)
for more information.
 When you have made changes to your UDF functions and you want to deploy these changes.
Before updating your UDF library, you need to determine if you have changed the signature of
any of the functions contained in the library. If you have, you need to drop the functions from
the HP Vertica catalog before you update the library.
UDF Library Compatibility with New Server Versions
When you compile your UDF library, you link it with the HP Vertica SDK version that accompanies
the version of the HP Vertica server you have installed on your cluster. When you upgrade your
HP Vertica database to a new server version, the UDF libraries defined in your HP Vertica catalog
may be incompatible with the new server if the HP Vertica SDK changed between the two server
versions. Libraries linked against an old version of the SDK do not work with a server that uses a
new version of the SDK.
Note: Since the R language is interpreted, UDFs written in R are not linked to the HP Vertica
SDK. Therefore, changes to the HP Vertica SDK between HP Vertica versions will not
necessarily make your R-based UDF libraries incompatible with a new server version. They will
only need to be updated if the APIs in the SDK that your UDFs call actually change (i.e. change
the number or data types of their arguments).
Changes to the SDK usually take place only between major version upgrades (from version 6.0
to version 6.1, for example). Usually, there is no change to the SDK between minor server
updates (for example, from version 6.1.x.0 to 6.1.x.1).
Note: In rare cases, there may be changes to the SDK between minor versions of the HP
Vertica server. You should review the HP Vertica release notes before performing an upgrade
to determine if your old UDF libraries will need to be recompiled to work with the new server
version.
If you upgrade your HP Vertica server to a new version and there was no change to the SDK
between the two versions of HP Vertica, your UDF libraries continue to work without you having
to take any further steps. If the SDK of the new server version is incompatible with the old SDK,
you need to recompile and redeploy the UDF library before your UDFs will work. In the meantime,
any attempt to use UDFs defined in the incompatible library results in an error message:
ERROR 2858: Could not find function definition
HINT: This usually happens due to missing or corrupt libraries, libraries built
with the wrong SDK version, or due to a concurrent session dropping the library
or function. Try recreating the library and function
If your UDF library is incompatible with the new version of the HP Vertica server, you must:
1 Recompile your UDF library using the new version of the HP Vertica SDK. See Compiling
  Your UDF (page 493) for more information.
2 Deploy the new version of your library. See Deploying A New Version of Your UDF Library
  (page 541).
Determining if a UDF Signature Has Changed
You need to be careful when making changes to UDF libraries that contain functions you have
already deployed in your HP Vertica database. When you deploy a new version of your UDF
library, HP Vertica does not ensure that the signatures of the functions defined in the library
match the signatures of the functions already defined in the HP Vertica catalog. If you change
the signature of a UDF in the library and then update the library in the HP Vertica database, calls
to the altered UDF produce errors.
Making any of the following changes to a UDF alters its signature:
 Changing the number of arguments accepted or the data type of any argument accepted by
your function (not including polymorphic functions).
 Changing the number or data types of any return values or output columns.
 Changing the name of the factory class that HP Vertica uses to create an instance of your
function code.
 Changing the null handling or volatility behavior of your function.
 Removing the function's factory class from the library completely.
The following changes do not alter the signature of your function, and do not require you to drop
the function before updating the library:
 Changing the number or type of arguments handled by a polymorphic function. HP Vertica
does not process the arguments the user passes to a polymorphic function.
 Changing the name, data type, or number of parameters accepted by your function. The
parameters your function accepts are not determined by the function signature. Instead, HP
Vertica passes all of the parameters the user included in the function call, and your function
processes them at runtime. See UDF Parameters (page 502) for more information about
parameters.
 Changing any of the internal processing performed by your function.
 Adding new UDFs to the library.
After you drop any functions whose signatures have changed, you load the new library file, then
re-create your altered functions. If you have not made any changes to the signature of your UDFs,
you can just update the library file in your HP Vertica database without having to drop or alter your
function definitions. As long as the UDF definitions in the HP Vertica catalog match the signatures
of the functions in your library, function calls will work transparently after you have updated the
library. See Deploying A New Version of Your UDF Library (page 541).
Deploying A New Version of Your UDF Library
You need to deploy a new version of your UDF library if:
 You have made changes to the library that you now want to roll out to your HP Vertica
database.
 You have upgraded your HP Vertica installation to a new version whose SDK is incompatible
with the previous version.
The process of deploying a new version of your library is similar to deploying it initially.
1 If you are deploying a UDF library developed in C++, you must compile it with the current
  version of the HP Vertica SDK. See Compiling Your UDF (page 493) for details.
2 Copy your UDF library's Linux shared object (.so) file or R source file to a host in your HP
  Vertica database.
3 Connect to the host using vsql.
4 If you have changed the signature of any of the functions in the shared library, you must drop
  these functions using DROP FUNCTION. If you are unsure whether any of the signatures of
  your functions have changed, see Determining if a UDF Signature Has Changed (page 541).
  Note: If all of the function signatures in your library have changed, you may find it more
  convenient to drop the library using the DROP LIBRARY statement with the CASCADE option
  to drop the library and all of the functions that reference it. This can save you the time it would
  take to drop each function individually. You can then reload the library and recreate all of the
  functions using the same process you used to deploy the library in the first place. See
  CREATE LIBRARY in the SQL Reference Manual.
5 Use the ALTER LIBRARY statement to update the UDF library definition with the file you
  copied in step 2. For example, if you want to update the library named ScalarFunctions with a
  file named ScalarFunctions-2.0.so in the dbadmin user's home directory, you could use
  the command:
  ALTER LIBRARY ScalarFunctions AS
  '/home/dbadmin/ScalarFunctions-2.0.so';
  Once you have updated the UDF library definition to use the new version of your shared
  library, the functions defined in the UDF library begin using the new shared library file without
  any further changes.
6 If you had to drop any functions in step 4, recreate them using the new signature defined by the
  factory classes in your library. See CREATE FUNCTION Statements in the SQL Reference
  Manual.
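For example, assuming a hypothetical add2ints UDSF whose signature changed in the new
ScalarFunctions-2.0.so build (the function, factory, and file names here are illustrative), the
complete sequence might look like this:
=> DROP FUNCTION add2ints(int, int);
DROP FUNCTION
=> ALTER LIBRARY ScalarFunctions AS '/home/dbadmin/ScalarFunctions-2.0.so';
ALTER LIBRARY
=> CREATE FUNCTION add2ints AS LANGUAGE 'C++' NAME 'Add2IntsFactory'
-> LIBRARY ScalarFunctions;
CREATE FUNCTION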
Listing the UDxs Contained in a Library
Once a library has been loaded using the CREATE LIBRARY statement, you can find the UDFs
and UDLs it contains by querying the USER_LIBRARY_MANIFEST system table:
=> CREATE LIBRARY ScalarFunctions AS '/home/dbadmin/ScalarFunctions.so';
CREATE LIBRARY
=> \x
Expanded display is on.
=> SELECT * FROM USER_LIBRARY_MANIFEST WHERE lib_name = 'ScalarFunctions';
-[ RECORD 1 ]-------------------
schema_name | public
lib_name    | ScalarFunctions
lib_oid     | 45035996273792402
obj_name    | RemoveSpaceFactory
obj_type    | Scalar Function
arg_types   | Varchar
return_type | Varchar
-[ RECORD 2 ]-------------------
schema_name | public
lib_name    | ScalarFunctions
lib_oid     | 45035996273792402
obj_name    | Div2intsInfo
obj_type    | Scalar Function
arg_types   | Integer, Integer
return_type | Integer
-[ RECORD 3 ]-------------------
schema_name | public
lib_name    | ScalarFunctions
lib_oid     | 45035996273792402
obj_name    | Add2intsInfo
obj_type    | Scalar Function
arg_types   | Integer, Integer
return_type | Integer
The obj_name column lists the factory classes contained in the library. These are the names you
use to define UDFs and UDLs in the database catalog using statements such as CREATE
FUNCTION and CREATE SOURCE.
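For example, to expose the RemoveSpaceFactory class from the listing above as a SQL function, you might run something like the following sketch (the SQL function name RemoveSpace is an arbitrary choice; the factory and library names come from the manifest):
CREATE FUNCTION RemoveSpace AS LANGUAGE 'C++'
NAME 'RemoveSpaceFactory' LIBRARY ScalarFunctions;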
Using the Hadoop Connector
Hadoop is a software platform for performing distributed data processing. HP has created an
interface between HP Vertica and Hadoop that lets you take advantage of both platforms'
strengths. Using the interface, a Hadoop application can access and store data in an HP Vertica
database.
You use the Hadoop Connector when you want Hadoop to be able to read data from and write
data to HP Vertica. If you want HP Vertica to import data stored in a Hadoop Distributed File
System (HDFS), use the HDFS Connector or the Secure HDFS Connector instead. See Using
the Hadoop Distributed File System Connectors (page 575).
Prerequisites
Before you can use HP Vertica's Hadoop Connector, you must install and configure Hadoop and
be familiar with developing Hadoop applications. For details on installing and using Hadoop,
please see the Apache Hadoop Web site http://hadoop.apache.org/.
See the Supported Platforms document at http://www.vertica.com/documentation for a list of
the versions of Hadoop and Pig that the connector supports.
How Hadoop and HP Vertica Work Together
Hadoop and HP Vertica share some common features—they are both platforms that use clusters
of hosts to store and operate on very large sets of data. Their key difference is the type of data
they work with best. Hadoop is best suited for tasks involving unstructured data such as natural
language content. HP Vertica works best on structured data that can be loaded into database
tables.
HP Vertica's Hadoop Connector lets you use these two platforms together to take advantage of
their strengths. With the connector, you can use data from HP Vertica in a Hadoop job, and store
the results of a Hadoop job in an HP Vertica database. For example, you can use Hadoop to
extract and process keywords from a mass of unstructured text messages from a social media
site, turning the material into structured data that is then loaded into an HP Vertica database. Once
you have loaded the data into HP Vertica, you can perform many different analytic queries on the
data far faster than by using Hadoop alone.
For more on how Hadoop and HP Vertica work together, see the HP Vertica Map Reduce Web
page http://www.vertica.com/mapreduce.
Note: HP Vertica supports two programming interfaces, user defined functions (UDFs) and
external procedures, that can handle complex processing tasks that are hard to resolve using
SQL. Consider using either of these interfaces instead of Hadoop when possible, since they do
not require the overhead of maintaining a Hadoop cluster.
Because code written as a UDF or external procedure runs on the HP Vertica cluster, it can
impact the performance of your database. In general, if your application performs a
straightforward operation that is hard to express in SQL, consider using UDFs or external
procedures rather than resorting to Hadoop. If your application needs to perform significant
data extraction and processing, Hadoop is often a better solution, because the processing
takes place off of the HP Vertica cluster.
Hadoop and HP Vertica Cluster Scaling
Nodes in the Hadoop cluster connect directly to HP Vertica nodes when retrieving or storing data,
allowing large volumes of data to be transferred in parallel. If the Hadoop cluster is larger than the
HP Vertica cluster, this parallel data transfer can negatively impact the performance of the HP
Vertica database.
To avoid performance impacts on your HP Vertica database, you should ensure that your Hadoop
cluster cannot overwhelm your HP Vertica cluster. The exact sizing of each cluster depends on
how fast your Hadoop cluster generates data requests and the load placed on the HP Vertica
database by queries from other sources. A good rule of thumb is to make your Hadoop cluster
no larger than your HP Vertica cluster.
Hadoop Connector Features
HP Vertica's Hadoop Connector:
 gives Hadoop access to data stored in HP Vertica.
 lets Hadoop store its results in HP Vertica. The Connector will create a table for the Hadoop
data if it does not already exist.
 lets applications written in Apache Pig (http://pig.apache.org/) access and store data in HP
Vertica.
 works with Hadoop streaming (http://wiki.apache.org/hadoop/HadoopStreaming).
The Hadoop Connector runs on each node in the Hadoop cluster, so the Hadoop nodes and HP
Vertica nodes communicate with each other directly. Direct connections allow data to be
transferred in parallel, dramatically increasing processing speed.
The Connector is written in Java, and is compatible with all platforms supported by Hadoop.
Note: To prevent Hadoop from potentially inserting multiple copies of data into HP Vertica, the
Hadoop Connector disables Hadoop's speculative execution feature.
Hadoop Connector Installation Procedure
Follow these steps to install HP Vertica's Hadoop Connector:
If you have not already done so, download the Hadoop Connector package from the myVertica
portal (http://my.vertica.com/). Be sure to download the package that is compatible with your
version of Hadoop. You can find your Hadoop version by issuing the following command on a
Hadoop node:
# hadoop version
You also need a copy of the HP Vertica JDBC driver, which you can download from the same
portal.
You need to perform the following steps on each node in your Hadoop cluster:
1 Copy the Hadoop Connector .zip archive you downloaded to a temporary location