DB2 z/OS Text Search at Westfield Insurance
About Westfield Insurance
• Property and casualty insurance for over 167 years
• One of the largest non-public companies in Ohio
• Property and Casualty Insurance and Banking
• Over 3.9 billion in assets/1.4 billion in written premium
• Network of more than 1,200 leading independent agencies
• 2500 employees in 42 Service Offices servicing 49 states
About Westfield Insurance
For Additional Information Please Visit Our Websites at:
www.westfieldinsurance.com
www.westfield-bank.com
System Info
• z/OS 1.13 operating system on IBM BC12 hardware
• DB2 z/OS V10
• 4 subsystems
• No data sharing
• Red Hat Enterprise Linux server release 6.6
• 2 CPU – 2.70GHz
• 8GB memory
• 60GB Disk
Objectives
• Understand the capabilities of Text Search
• Understand what is required to install and maintain Text
Search
• Learn about Westfield’s experiences and usage of Text Search
Topics
1. Text Search Overview
2. Install on Linux/Windows servers
3. Install additional software on DB2 z/OS
4. Configure on DB2 z/OS
5. The Search Application
6. Define searchable columns
7. Index build and update
8. Westfield’s Production Use – Hybrid
9. Text Search administration – DB2 z/OS
10. Text Search administration – Linux/Windows
Overview
TOPIC 1
Documentation
• “IBM Text Search for DB2 for z/OS Installation, Administration,
and Reference” manual
• In the same list as the other DB2 V10 manuals
What is DB2 Text Search?
• DB2 SQL functions to provide advanced search capabilities
• Search Text (CHAR & VARCHAR), XML, and other file types
• Indexing and search engine on Linux or Windows server(s)
• Flexible search criteria
  • AND, OR, NOT, Wildcards, Synonyms
  • Scoring (some hits are better than others)
  • Integrated into DB2 optimizer
CONTAINS Function
SELECT <column list>
FROM <table>
WHERE CONTAINS( <search column>,
                '<search terms>',
                '<optional parms>' ) = 1
SEARCH TERMS (the common ones)
Terms:
One or more words (not case sensitive)
Operators:
AND, OR, NOT
Wildcards:
* matches any char(s) at beginning, middle, or end
Phrases:
enclose phrase in double quotes
Grouping:
normal use of parentheses
OPTIONAL PARMS
RESULTLIMIT=n
limits the number of rows returned
SYNONYM=ON|OFF
CONTAINS – More cool stuff
Fuzzy search
Match similar terms
Ex: analyze~
matches: analyze, analytics, analysis, etc.
Proximity search
Find terms within a specified number of words
Ex: "rock roll*"~3
will find: "rock and roll", "rolling rock"
but not: "to be a rock and not to roll"
Single Wildcard
?
Sample CONTAINS
SELECT *
FROM MYTABLE
WHERE CONTAINS
(MYCOLUMN,
'abcd* 1299 xyzz',
'RESULTLIMIT = 100' ) = 1
SCORE
• Same syntax and terms as CONTAINS
• Measure how well a row matches the search terms
• Returns a value between 0 and 1
• Can be used to ORDER BY
• Can weight certain terms (see manual)
Sample SCORE
SELECT
SCORE(MYCOLUMN, 'abcd 1299 xyzz' )
,MYCOLUMN
FROM MYTABLE
WHERE CONTAINS(MYCOLUMN, 'abcd 1299 xyzz' ) = 1
ORDER BY 1 DESC
[Diagram: CALL SYSPROC.SYSTS_UPDATE(<parms>) sends ROWID and search text from the Search Table on DB2 z/OS to the Text Search collection file on the Linux/Windows server; the collection file is then stored as a BLOB in the collection file master table in the Text Search Admin DB.]
[Diagram: SELECT … FROM … WHERE CONTAINS(srch_col, search terms) = 1 on DB2 z/OS sends the search terms to the Text Search collection file on the Linux/Windows server, which returns a list of ROWIDs; DB2 then reads ROWID and data from the Search Table. The Text Search Admin DB holds the collection file master table.]
Install on Linux or
Windows
Topic 2
Install on LINUX or Windows
• Download from IBM website
• Install on Linux VM
AND/OR
• Install on zLinux under z/VM
  • Uses HiperSockets (avoids the network)
  • Westfield’s “Primary” server
• OR install on Windows
Initialize the server
cd /opt/IBM/ECMTextSearch/bin
sh configTool.sh configureHTTPListener -configPath ../config -adminHTTPPort 8191
Get authentication token and encryption key
sh configTool.sh printToken -configPath ../config
Set option to autostart
See manual for details.
Install on DB2 z/OS
Topic 3
Required DB2 objects are (or could be) already defined. If not, run DB2 install
job specified in the manual.
Databases / tables
SYSIBMTA (DB for Admin tables)
SYSIBMTS.SYSTEXTCOLUMNS
SYSIBMTS.SYSTEXTCONFIGURATION
SYSIBMTS.SYSTEXTCONNECTINFO
SYSIBMTS.SYSTEXTDEFAULTS
SYSIBMTS.SYSTEXTINDEXES
SYSIBMTS.SYSTEXTLOCKS
SYSIBMTS.SYSTEXTSERVERHISTORY
SYSIBMTS.SYSTEXTSERVERS
SYSIBMTS.SYSTEXTSTATUS
SYSIBMTS.SYSTLOB1
SYSIBMTS (DB for Text Index tables when defined)
Stored Procedures
SYSPROC.SYSTS_ALTER
SYSPROC.SYSTS_CREATE
SYSPROC.SYSTS_DROP
SYSPROC.SYSTS_RESTORE
SYSPROC.SYSTS_START
SYSPROC.SYSTS_STOP
SYSPROC.SYSTS_TAKEOVER
SYSPROC.SYSTS_UPDATE
Function
SYSFUN.SYSTS_ENCRYPT
Install on DB2 z/OS
• Install DB2 Accessory Suite - FMID H2AF210 (plus J2AG210)
• New load modules in SDSNLOAD
• Create java runtime dataset
• Use DB2 install job in SDSNSAMP
• TSO ISHELL to find USS directories
• /usr/lpp/db2/*
• /usr/lpp/java/*
• Used by Java encryption function
Install on DB2 z/OS
• Create WLM environment for Java Function
  • Alter SYSFUN.SYSTS_ENCRYPT to use it
• Create WLM environment for Admin Stored Procedures
  • Alter SYSPROC.SYSTS* to use it
Java WLM started task example
//DB2PWLMJ PROC RGN=0K,APPLENV=XXXXXXXX,DB2SSN=DB2P
//IEFPROC  EXEC PGM=DSNX9WLM,REGION=&RGN,TIME=NOLIMIT,
//         PARM='&DB2SSN,1,&APPLENV'
//STEPLIB  DD DISP=SHR,DSN=CEE.SCEERUN
//         DD DISP=SHR,DSN=<hlq>.SDSNEXIT
//         DD DISP=SHR,DSN=<hlq>.SDSNLOAD
//         DD DISP=SHR,DSN=<hlq>.SDSNLOD2
//JAVAENV  DD DISP=SHR,DSN=<hlq>.JSPENV
//SYSPRINT DD SYSOUT=*
//CEEDUMP  DD SYSOUT=*
Admin Stored Procedure WLM started task example
//DB2PWLM1 PROC RGN=0K,APPLENV=XXXXXXXX,DB2SSN=DB2P,NUMTCB=8
//IEFPROC  EXEC PGM=DSNX9WLM,REGION=&RGN,TIME=NOLIMIT,
//         PARM='&DB2SSN,&NUMTCB,&APPLENV'
//STEPLIB  DD DISP=SHR,DSN=<hlq>.SDSNLOAD
//         DD DISP=SHR,DSN=<hlq>.SDSNLOD2
• Create surrogate RACF/ACF2 UserID
• Performs DB2 z/OS tasks initiated from the Linux/Windows
server(s)
• Define security
• Must be able to SELECT, INSERT, UPDATE, DELETE Text Search
‘catalog’ tables (SYSIBMTS.*)
Configure on DB2
z/OS
Topic 4
Configuration &
Administration
• Done by using DB2 stored procedures
• Some stored procedures need parms, so you need a way to
execute a stored procedure with parms
  • DB2 Connect Command Editor
  • Custom REXX code
• Refer to manual for more options and details
Define the Text Search servers
• Use the Authentication Token and Encryption Key created
during the server install(s)
zLinux definition
INSERT INTO SYSIBMTS.SYSTEXTSERVERS
(SERVERNAME, SERVERPORT, SERVERAUTHTOKEN, SERVERMASTERKEY)
VALUES
('<ip_addr of hipersockets>', 8191, '<auth token>', '<encryption_key>');
VM Linux definition
INSERT INTO SYSIBMTS.SYSTEXTSERVERS
(SERVERNAME, SERVERPORT, SERVERAUTHTOKEN, SERVERMASTERKEY)
VALUES
('<server name>', 8191, '<auth token>', '<encryption_key>');
Define “connect back” Information
• Use surrogate ID created earlier
INSERT INTO SYSIBMTS.SYSTEXTCONNECTINFO
(DB2HOSTNAME, DB2SERVICEPORT, DB2UID)
VALUES
('<mainframe name>', '446', '<surrogate ID>');
Encrypt Password for surrogate ID
• Use the SYSFUN.SYSTS_ENCRYPT function defined earlier
• Run once for each server
UPDATE SYSIBMTS.SYSTEXTSERVERS
SET DB2ENCRYPTEDPW =
SYSFUN.SYSTS_ENCRYPT('<password>', SERVERMASTERKEY)
WHERE SERVERNAME = '<server name>'
DB2 Configuration
• CALL SYSPROC.SYSTS_START();
• Makes the initial connection to the server(s) and populates the remaining columns
in SYSTEXTSERVERS
• Assigns SERVERID in SYSIBMTS.SYSTEXTSERVERS
Westfield’s Search
Application
- a twisted path
Topic 5
Search Application
• New Claims Management System
• Policy data replicated from Legacy system to DB2 z/OS tables
• Policy information needed for claims (coverages, limits,
deductibles, etc.)
• Claimant may have only partial policy identification
information when claim is initiated
• Need ability to search to find the correct policy to start a claim
Policy Search Requirements
• Policy data going back 7 years
• Search policies based on names and addresses as stored in the
legacy system
• Search policy holder names or ‘other insured’ names (like
drivers)
• Search business names
• Search any part of the address
• Anything can include a wildcard after 2 or 3 characters
• 22 million rows
Sample Search Criteria
First Name     __________________
Last Name      ADD*______________
Business Name  __________________
Street         __________________
City           CLE*______________
State          __________________
ZIP            __________________
Agency         __________________
POLICY_ID  Policy Document Data
A001       Gomez and Morticia Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A001       Pugsley Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A001       Wednesday Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A075       Robert McGee c/o Bobbys Plumbing LLC 215 Main St Salinas Ca Joplin Co Insurance
A567       County of Summit Board of Confusing Names 1777 Cleveland Ave Akron Oh Lurch Insurance Agency
A700       Stephen G Cleveland Whitehouse Pennsylvania Ave Washington DC Capitol Insurance Agency
A800       Andrew Johnson 1865 Woodshed Ave Columbia TN Lurch Insurance Agency
…….
< 22 million more rows >
Policy Search – Option 1
• Each searchable piece of data in its own column and lots of
regular DB2 Indexes
• First name, last name, city, state, zip, street name, address
number, company name, agency name, etc.
• Many indexes
CREATE TABLE POLICY_SEARCH
POLICY_ID      CHAR(10)
FIRST_NAME     CHAR(30)
LAST_NAME      CHAR(30)
STREET         CHAR(30)
CITY           CHAR(20)
STATE          CHAR(2)
ZIP            CHAR(10)
BUSINESS_NAME  VARCHAR(250)
AGENCY_NAME    CHAR(100)
etc.
etc.
CREATE INDEX   LAST_NAME, FIRST_NAME, STREET, CITY
CREATE INDEX   LAST_NAME, STREET, CITY, FIRST_NAME
CREATE INDEX   STREET, CITY, LAST_NAME, FIRST_NAME
CREATE INDEX   CITY, LAST_NAME, STREET
Etc.
Etc.
Etc.
CREATE INDEX   BUSINESS_NAME
POLICY_ID  LAST_NAME  FIRST_NAME  STREET     CITY       BUSINESS
A001       Addams     Gomez       Cemetery   Cleveland
A001       Addams     Morticia    Cemetery   Cleveland
A001       Addams     Pugsley     Cemetery   Cleveland
A001       Addams     Wednesday   Cemetery   Cleveland
A075       McGee      Robert      Main       Salinas    Robert McGee c/o Bobbys Plumbing....
A567                              Cleveland  Akron      County of Summit Board of …..
A700       Cleveland  Stephen     Pennsyl…   Washing…
A800       Johnson    Andrew      Woodshed   Columbia
Nightmare Search #1
First Name     __________________
Last Name      JOHN*_____________
Business Name  __________________
Street         WOOD*____________
City           COL*_____________
State          __________________
ZIP            __________________
Agency         __________________
SQL
SELECT *
FROM POLICY_SEARCH
WHERE LAST_NAME LIKE 'JOHN%'
AND CITY LIKE 'COL%'
AND STREET LIKE 'WOOD%'
STATS
POLICY_SEARCH          22 million rows
LAST_NAME   'JOHN%'    101K rows
CITY        'COL%'     555K rows
STREET      'WOOD%'    237K rows
• How would the DB2 optimizer choose to do this?
• Multiple Indexes with huge RID lists?
• Index scans?
• Index filtering?
• BUT… that is not an unreasonable search request
• 30 row result set
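The statistics for this search can be turned into a rough back-of-the-envelope estimate. The sketch below is only an illustration: it assumes the three predicates are statistically independent, which is a simplification and not a description of how the DB2 optimizer actually costs the query. The function names are made up for this example.

```python
# Rough selectivity arithmetic for Nightmare Search #1.
# Assumption: the three LIKE predicates are independent of each other.

TOTAL_ROWS = 22_000_000

# Row counts per predicate, taken from the STATS on the slide.
MATCHING_ROWS = {
    "LAST_NAME LIKE 'JOHN%'": 101_000,
    "CITY LIKE 'COL%'": 555_000,
    "STREET LIKE 'WOOD%'": 237_000,
}


def filter_factor(matching_rows, total_rows=TOTAL_ROWS):
    """Fraction of the table that satisfies one predicate."""
    return matching_rows / total_rows


def estimated_result_rows(match_counts, total_rows=TOTAL_ROWS):
    """Multiply the filter factors together and scale by the table size."""
    estimate = float(total_rows)
    for rows in match_counts:
        estimate *= filter_factor(rows, total_rows)
    return estimate


estimate = estimated_result_rows(MATCHING_ROWS.values())

for predicate, rows in MATCHING_ROWS.items():
    print(f"{predicate:26} filter factor {filter_factor(rows):.5f}")
print(f"Estimated result set: about {estimate:.0f} rows")
```

Under that independence assumption the estimate comes out at roughly 27 rows, the same order of magnitude as the 30 row result set. Each individual predicate still qualifies hundreds of thousands of index entries, which is why no single index is a good access path.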
Nightmare #2
• Any search of company name
• Company Name on policy is not always the common name
• Policies are legal documents and include strange stuff
First Name     __________________
Last Name      __________________
Business Name  Bobbys Plumbing____
Street         __________________
City           __________________
State          __________________
ZIP            __________________
Agency         __________________
POLICY_ID  LAST_NAME  FIRST_NAME  STREET     CITY       BUSINESS
A001       Addams     Gomez       Cemetery   Cleveland
A001       Addams     Morticia    Cemetery   Cleveland
A001       Addams     Pugsley     Cemetery   Cleveland
A001       Addams     Wednes      Cemetery   Cleveland
A075       McGee      Robert      Main       Salinas    Robert McGee c/o Bobbys Plumbing…..
A567                              Cleveland  Akron      County of Summit …..
A700       Cleveland  Stephen     Pennsyl…   Washing…
SQL:
SELECT * FROM POLICY_SEARCH
WHERE BUSINESS_NAME LIKE '%BOBBYS PLUMBING%'
RESULT:
Index scan of 22 million index entries
First Name     __________________
Last Name      __________________
Business Name  Summit County Board *
Street         __________________
City           __________________
State          __________________
ZIP            __________________
Agency         __________________
POLICY_ID  LAST_NAME  FIRST_NAME  STREET     CITY       BUSINESS
A001       Addams     Gomez       Cemetery   Cleveland
A001       Addams     Morticia    Cemetery   Cleveland
A001       Addams     Pugsley     Cemetery   Cleveland
A001       Addams     Wednes      Cemetery   Cleveland
A075       McGee      Robert      Main       Salinas    Robert McGee c/o Bobbys Plumbing…..
A567                              Cleveland  Akron      County of Summit …..
A700       Cleveland  Stephen     Pennsyl…   Washing…
SQL:
SELECT * FROM POLICY_SEARCH
WHERE BUSINESS_NAME LIKE '%SUMMIT COUNTY BOARD%'
RESULT:
Index scan of 22 million index entries
And no rows returned
Policy Search – Option 1
• Business view of searching <> DBA view
• Need lots of rules on search criteria entered by users
• Poor performance for some searches
• Policies not found for some searches
Policy Search – Option 2
• DB2 Text Search
• Heard of it, but can’t find anyone using it
• Skeptical
  • Does it work?
  • Is it for document searches only?
  • How does an application use it?
  • Response time for large searches?
  • Availability?
  • Reliability?
  • Data updates?
  • Administration?
CREATE TABLE POLICY_SEARCH
POLICY_ID      CHAR(10)
…
SEARCH_TEXT    VARCHAR(2000)
…
ROWID_COL      ROWID
POLICY_ID  SEARCH_TEXT
A001       Gomez and Morticia Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A001       Pugsley Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A001       Wednesday Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A075       Robert McGee c/o Bobbys Plumbing LLC 215 Main St Salinas Ca Joplin Co Insurance
A567       County of Summit Board of Confusing Names 1777 Cleveland Ave Akron Oh Lurch Insurance Agency
A700       Stephen G Cleveland Whitehouse Pennsylvania Ave Washington DC Capitol Insurance Agency
A800       Andrew Johnson 1865 Woodshed Ave Columbia TN Lurch Insurance Agency
…….
< 22 million more rows >
First Name     __________________
Last Name      add*______________
Business Name  __________________
Street         cem*_____________
City           __________________
State          __________________
ZIP            __________________
Agency         __________________
SELECT *
FROM POLICY_SEARCH
WHERE CONTAINS(SEARCH_TEXT,
'add* cem*',
'RESULTLIMIT = 100' ) = 1
POLICY_ID  SEARCH_TEXT
A001       Gomez Morticia Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A001       Pugsley Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A001       Wednesday Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
All of these run in < 0.25 second
SELECT *
FROM POLICY_SEARCH
WHERE CONTAINS(SEARCH_TEXT,
'JOHN* WOOD* COL*',
'RESULTLIMIT = 100' ) = 1
SELECT *
FROM POLICY_SEARCH
WHERE CONTAINS(SEARCH_TEXT,
'BOBBYS PLUMBING',
'RESULTLIMIT = 100' ) = 1
SELECT *
FROM POLICY_SEARCH
WHERE CONTAINS(SEARCH_TEXT,
'SUMMIT COUNTY BOARD',
'RESULTLIMIT = 100' ) = 1
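How an application might turn the search form into the search-terms string used in these queries can be sketched as follows. This is a hypothetical helper, not Westfield's code: the function name, the field names, and the default minimum of two leading characters before a wildcard (taken from the "wildcard after 2 or 3 characters" requirement) are all assumptions made for illustration.

```python
# Hypothetical helper: build the CONTAINS search-terms string from form fields.

def build_search_terms(fields, min_prefix=2):
    """Join the non-empty form fields into one search-terms string.

    fields     -- mapping of form field name to the text the user typed
    min_prefix -- assumed minimum number of characters before a trailing '*'
    """
    terms = []
    for value in fields.values():
        for word in value.split():
            stem = word.rstrip("*")
            if word.endswith("*") and len(stem) < min_prefix:
                raise ValueError(
                    f"wildcard needs at least {min_prefix} leading characters: {word!r}"
                )
            terms.append(word)
    return " ".join(terms)


# The Nightmare Search #1 form: only last name, street and city are filled in.
form = {
    "first_name": "",
    "last_name": "JOHN*",
    "business_name": "",
    "street": "WOOD*",
    "city": "COL*",
}

search_terms = build_search_terms(form)
print(search_terms)  # JOHN* WOOD* COL*
```

Because every field ends up in the same string, the application does not need to know which column or index a value belongs to. That simplicity is also the source of the context problem discussed under Topic 8.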
Define Searchable
Columns
Topic 6
Security Requirements
• Text Search Administration (Create, Update, Drop, Restore)
  • DBADM on database SYSIBMTS
  • SEL,INS,UPD,DEL on all tables in SYSIBMTA
  • DBADM on the database with the table containing the text
  • EXECUTE on SYSIBMTS.* Packages
• Connect-Back ID
  • SEL,INS,UPD,DEL on all tables in SYSIBMTA & SYSIBMTS
TIP
• If you use non-native DB2 security (ACF2), execute a native
GRANT DBADM ON DATABASE <db with Search Text Table> TO <administrator-id>
• Text Search stored procedures query the DB2 catalog to see if the
userid has native DBADM
Define Searchable Columns
Example:
Table MYSCHEMA.POLICY_SEARCH
POLICY_ID      CHAR(10)
…
SEARCH_TEXT    VARCHAR(2000)
…
ROWID_COL      ROWID
• Column SEARCH_TEXT contains policy holder names, addresses, additional insured
names, organization names, policy number, etc.
• The table must contain a ROWID column, and there must be an index on ROWID
Define the Text Search Index
CALL SYSPROC.SYSTS_CREATE
('MYSCHEMA', 'POLICY_SEARCH_TX1',
'MYSCHEMA.POLICY_SEARCH(SEARCH_TEXT)',
'INDEX CONFIGURATION(SERVER 21)' );
The Stored Procedure does this:
Inserts into SYSIBMTS.SYSTEXTINDEXES
INDEXID  INDEXSCHEMA  INDEXNAME          COLLECTIONNAME        SERVERID
141      schema1      POL_BAS_SRCH_TX1   MVSDB2T_141_2014_02_  21
101      schema1      POL_BAS_SRCH2_TX1  MVSDB2T_101_2013_10_  1
122      MYSCHEMA     POLICY_SEARCH_TX1  MVSDB2T_122_2013_12_  21
103      Schema2      POL_BAS_SRCH3_TX1  MVSDB2T_103_2013_11_  21
Call the stored procedure now and you’ll also get …
These NEW TABLES
SYSIBMTS.EVENTS_122    (event log)
SYSIBMTS.INDEX_122     (text ‘index’ BLOB)
SYSIBMTS.STAGING_122   (ROWIDs with changed data)
Plus NEW ROWS in these tables
SYSIBMTS.SYSTEXTCOLUMNS
SYSIBMTS.SYSTEXTCONFIGURATION
SYSIBMTS.SYSTEXTLOCKS
SYSIBMTS.SYSTEXTSERVERHISTORY
But wait … there’s more
CREATE TRIGGER DSNIBMTS.ISTAGING_122
AFTER INSERT ON MYSCHEMA.POLICY_SEARCH
REFERENCING NEW AS N
FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
INSERT INTO SYSIBMTS.STAGING_122 (OPERATION, SEQID, RID)
VALUES ('I', GENERATE_UNIQUE(), CAST(N."ROWID_COL" AS CHAR(40) FOR BIT DATA));
END
AND …
CREATE TRIGGER AFTER UPDATE …
AND …
CREATE TRIGGER AFTER DELETE …
Search index initial
build and update
Topic 7
INITIAL LOAD and TEXT INDEX CREATION
1. LOAD data into POLICY_SEARCH
2. CALL SYSPROC.SYSTS_UPDATE
('MYSCHEMA', 'POLICY_SEARCH_TX1', ' ')
Initial build of the Text Search “index”
• What SYSPROC.SYSTS_UPDATE does:
1. Fetches a row from POLICY_SEARCH
2. Sends it to the ‘owning’ Linux server
3. Adds to the index structure in the server collection file
4. Repeats 1-3 until all rows are fetched
5. Sends the file back to DB2 a chunk at a time
6. DB2 inserts into the BLOB in SYSIBMTS.INDEX_[n]
[Diagram: CALL SYSPROC.SYSTS_UPDATE(<parms>) sends ROWID and search text from the Search Table on DB2 z/OS to the Text Search collection file on the Linux/Windows server; the collection file is then stored as a BLOB in the collection file master table in the Text Search Admin DB.]
TIP
The stored procedure contains this SQL (in DB2 V10):
SELECT U."ROWID_COL",
'I',
U."SEARCH_TEXT"
FROM "MYSCHEMA"."POLICY_SEARCH" U
ORDER BY U."ROWID_COL" ASC
SKIP LOCKED DATA
FOR READ ONLY
This may be a huge amount of data to sort, and may result in SQLCODE -904
(resource unavailable) failures for sort work space in DSNDB07
*** Define the table as VOLATILE and do not execute RUNSTATS
(DB2 will then use the index on ROWID_COL and avoid the sort)
Build the Text Search Index
• Calling the stored procedure can take a long time, depending
on the amount of data in the source table
• This runs 6-10 hours for 22 million rows in POLICY_SEARCH
• A native call to a stored procedure ties up your session until it
is complete
• Cannot end your session and go home
• So …
Build Text Search Index
• Use home-grown REXX exec for updating indexes
• Batch job can run unattended
• Therefore can be scheduled
• Provides additional diagnostics
Subsequent data updates to the
DB2 Search Table
• Text Search indexes are NOT updated automatically at SQL
INSERT, UPDATE, and DELETE like regular DB2 Indexes
• Inserts, updates, and deletes are staged via the triggers and
staging table
• Searches are operational while inserts, updates and deletes
are pending
• But be aware of what will happen….
[Diagram: SELECT … FROM … WHERE CONTAINS(srch_col, search terms) = 1 on DB2 z/OS sends the search terms to the Text Search collection file on the Linux/Windows server, which returns a list of ROWIDs; DB2 then reads ROWID and data from the Search Table. The Text Search Admin DB holds the collection file master table.]
Staged Deletes
• The delete has already taken place in the DB2 table
• The Text Search server may return the ROWID of the deleted
row
• That ROWID will not be found, but the CONTAINS function
does not care
• Returns the rows of other ROWID’s
• The delete ‘happens’ immediately to Text Search
Staged Inserts
• The insert has already taken place in the DB2 table (new
ROWID)
• The Text Search file does not have the ROWID, so will not
return it in any search
• Inserts do not ‘happen’ until the Text Search Index is updated
Staged Updates of the Search
column
• The update has already occurred in the DB2 table
• The Text Search file does not have the updates, so will return
ROWID’s based on the original values
• Inaccurate or misleading results until the Text Search Index
update occurs
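The three staged cases can be illustrated with a small in-memory model. This is a toy sketch of the behavior described above, not the product's implementation: the class, its method names, and the 'U' and 'D' operation codes are assumptions (only 'I' appears in the generated insert trigger), and `apply_staged_changes` merely stands in for an incremental SYSPROC.SYSTS_UPDATE.

```python
# Toy model of staged changes: a DB2 table, a text index that lags behind it,
# and a staging list filled by (simulated) triggers.

class StagedTextIndex:
    def __init__(self):
        self.table = {}    # rowid -> current text in the DB2 table
        self.index = {}    # rowid -> text as last indexed on the search server
        self.staging = []  # (operation, rowid) entries awaiting the next update

    def insert(self, rowid, text):
        self.table[rowid] = text
        self.staging.append(("I", rowid))

    def update(self, rowid, text):
        self.table[rowid] = text
        self.staging.append(("U", rowid))

    def delete(self, rowid):
        del self.table[rowid]
        self.staging.append(("D", rowid))

    def search(self, word):
        """The index picks the ROWIDs; the table supplies the rows that still exist."""
        hits = [rid for rid, text in self.index.items()
                if word.lower() in text.lower().split()]
        return [(rid, self.table[rid]) for rid in hits if rid in self.table]

    def apply_staged_changes(self):
        """Stand-in for the incremental index update."""
        for operation, rowid in self.staging:
            if operation == "D":
                self.index.pop(rowid, None)
            elif rowid in self.table:
                self.index[rowid] = self.table[rowid]
        self.staging.clear()


idx = StagedTextIndex()
idx.insert(1, "Andrew Johnson Columbia")
idx.apply_staged_changes()

idx.insert(2, "Robert McGee Salinas")      # staged insert: not searchable yet
idx.update(1, "Andrew Jackson Columbia")   # staged update: index still has old text
print(idx.search("mcgee"))    # nothing found yet
print(idx.search("johnson"))  # stale hit, returned with the row's current data
idx.delete(1)                 # staged delete: the hit disappears immediately
print(idx.search("johnson"))
idx.apply_staged_changes()
print(idx.search("mcgee"))    # the insert is visible after the update
```

The model shows why deletes appear to take effect at once while inserts and updates only become searchable after the next index update.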
Incremental Index Updates
• CALL SYSPROC.SYSTS_UPDATE('MYSCHEMA',
'POLICY_SEARCH_TX1', ' ');
• What it does:
1. Fetches a row from SYSIBMTS.STAGING_[n]
2. Sends it to the ‘owning’ Linux server
3. Alters the index structure in the Linux file
4. Repeats 1-3 until all rows are fetched
5. Sends the file back to DB2 a chunk at a time
6. DB2 inserts into the BLOB in SYSIBMTS.INDEX_[n]
[Diagram: CALL SYSPROC.SYSTS_UPDATE(<parms>) sends ROWID and search text from the Search Table on DB2 z/OS to the Text Search collection file on the Linux/Windows server; the collection file is then stored as a BLOB in the collection file master table in the Text Search Admin DB.]
Incremental Index Updates
• The Text Search Index experiences no outage during the
incremental update
• SQL statements using the CONTAINS function are unaffected
• Can also set up automated incremental updates (see manual)
Production ‘Hybrid’
Implementation
Topic 8
Text Search Challenge for
Policy Search
• Text search works great for searching for words within text
• All words are equal
• For normal document searches, that is all we need
• For Westfield policy searches there is a real problem with
context
The Grover Cleveland Dilemma
• I want to search for Grover Cleveland’s policy
• Grover is not his first name and is not on the document
• Cleveland is not a common last name, so this should be a good
search
POLICY_ID  SEARCH_TEXT
A001       Gomez Morticia Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A001       Pugsley Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A001       Wednesday Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A075       Robert McGee c/o Bobbys Plumbing LLC 215 Main St Salinas Ca Joplin Co Insurance
A567       County of Summit Board of Confusing Names 1777 Cleveland Ave Akron Oh Lurch Insurance Agency
A700       Stephen G Cleveland Whitehouse Pennsylvania Ave Washington DC Capitol Insurance Agency
A800       Andrew Johnson 1865 Woodshed Ave Columbia TN Lurch Insurance Agency
…….
< 22 million MORE rows >
First Name     __________________
Last Name      CLEVELAND________
Business Name  __________________
Street         __________________
City           __________________
State          __________________
ZIP            __________________
Agency         __________________
SELECT *
FROM POLICY_SEARCH
WHERE CONTAINS(SEARCH_TEXT,
'CLEVELAND',
'RESULTLIMIT = 5000' ) = 1
POLICY_ID  SEARCH_TEXT
A001       Gomez Morticia Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A001       Pugsley Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A001       Wednesday Addams 0001 Cemetery Lane Cleveland Oh 44251 Lurch Insurance Agency
A075       Robert McGee c/o Bobbys Plumbing LLC 215 Main St Salinas Ca Joplin Co Insurance
A567       County of Summit Board of Confusing Names 1777 Cleveland Ave Akron Oh Lurch Insurance Agency
A700       Stephen G Cleveland Whitehouse Pennsylvania Ave Washington DC Capitol Insurance Agency
A800       Andrew Johnson 1865 Woodshed Ave Columbia TN Lurch Insurance Agency
…….
< 22 million MORE rows >
• All text is searched equally
• I get flooded with policies from the city of Cleveland, from
Cleveland Ave, businesses with Cleveland in the name, etc
• False hits
• I can’t find the policy I asked for
• This is an extreme example, but the problem is very prevalent
• I want the best of both options:
• The speed and flexibility of Text Search
• The context provided with standard DB2 indexes on separate
columns
• Three major approaches to resolve were tested
• Nice improvements
• But not good enough
Approach #4 - Tags
• Add 2 char prefix to all words
  • cx (city)
  • lx (last name)
  • fx (first name)
  • ax (address – all parts)
  • Etc
• Search now has context
• Build separate ‘indexes’ within the text
POLICY_ID  SEARCH_TEXT
A001       fxGomez fxMorticia lxAddams ax0001 axCemetery axLane cxCleveland sxOh zx44251 yxLurch yxInsurance yxAgency
A001       fxPugsley lxAddams ax0001 axCemetery axLane cxCleveland sxOh zx44251 ….
A001       fxWednesday lxAddams …. cxCleveland ….
A075       …. oxBobbys oxPlumbing oxLLC ….
A567       oxCounty oxof oxSummit oxBoard …. ax1777 axCleveland axAve cxAkron sxOh ….
A700       fxStephen mxG lxCleveland axWhitehouse axPennsylvania axAve cxWashington cxDC …
A800       fxAndrew lxJohnson ax1865 axWoodshed ….
…….
< 22 million MORE rows >
First Name     __________________
Last Name      CLEVELAND________
Business Name  __________________
Street         __________________
City           __________________
State          __________________
ZIP            __________________
Agency         __________________
SELECT *
FROM POLICY_SEARCH
WHERE CONTAINS(SEARCH_TEXT,
'lxCLEVELAND',
'RESULTLIMIT = 100' ) = 1
POLICY_ID  SEARCH_TEXT
A700       fxStephen mxG lxCleveland axWhitehouse axPennsylvania axAve cxWashington cxDC …
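The tagging idea can be sketched in a few lines. This is an illustration only, not Westfield's load program: the function and field names are hypothetical, the cx, lx, fx and ax prefixes come from the list above, and the sx, zx, yx, ox and mx prefixes are inferred from the sample rows rather than stated in the deck.

```python
# Sketch of the tag approach: prefix every word with a 2-character field tag,
# both when building SEARCH_TEXT and when building the search terms.

TAGS = {
    "first_name": "fx",
    "middle_name": "mx",    # inferred from the sample rows
    "last_name": "lx",
    "business_name": "ox",  # inferred from the sample rows
    "address": "ax",
    "city": "cx",
    "state": "sx",          # inferred from the sample rows
    "zip": "zx",            # inferred from the sample rows
    "agency": "yx",         # inferred from the sample rows
}


def tag_words(field, text):
    """Prefix each word of one field with that field's tag."""
    prefix = TAGS[field]
    return " ".join(prefix + word for word in text.split())


def build_search_text(record):
    """Build the tagged SEARCH_TEXT value for one policy row."""
    return " ".join(tag_words(field, value)
                    for field, value in record.items() if value)


row = {
    "first_name": "Pugsley",
    "last_name": "Addams",
    "address": "0001 Cemetery Lane",
    "city": "Cleveland",
    "state": "Oh",
    "zip": "44251",
}
print(build_search_text(row))
# fxPugsley lxAddams ax0001 axCemetery axLane cxCleveland sxOh zx44251

# A last-name search for CLEVELAND becomes the tagged term used in the query.
print(tag_words("last_name", "CLEVELAND"))  # lxCLEVELAND
```

The same tagging has to be applied on both sides, when the row is loaded and when the search terms are built, otherwise the tagged words never match.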
Policy Search Misc Info
• Westfield Policy Search SQL has 13 tables joined along with
the CONTAINS, including table expressions
• Explain the SQL and validate that search is first (not inside a
loop)
• Use RESULTLIMIT (5000 for Policy Search) to limit output and
govern performance
• Only a few Westfield Policy Search queries a day run over 1
second elapsed time – normally very large policies or
ambiguous search terms
Future testing at Westfield
• XML searches
• Synonyms
• XML File on the Linux/Windows server
Text Search
Administration –
DB2 z/OS
Topic 9
Restore Search Index
Copies search file from SYSIBMTS.INDEX_[n] on DB2 to the Linux host server collection
file. The collection on the original server is not deleted.
This has three purposes:
1) After a DB2 recover of both the source table and the associated SYSIBMTS.INDEX*
table.
2) To move the hosting from one Linux server to another. Search is fully functional
during this process.
3) Recover a corrupted or lost collection file on the Linux server
CALL SYSPROC.SYSTS_RESTORE('<TS schema>', '<TS Name>', '<Server ID>');
EXAMPLE:
CALL SYSPROC.SYSTS_RESTORE('MYSCHEMA', 'POLICY_SEARCH_TX1', '2');
[Diagram: CALL SYSPROC.SYSTS_RESTORE(<parms>) copies the collection file BLOB from the collection file master table in the Text Search Admin DB on DB2 z/OS to a copy of the collection file on the new server, while the collection file on the original server continues to serve normal searches against the Search Table.]
Drop Search Index
Delete associated rows from SYSIBMTS.* tables
Drop associated SYSIBMTS.* tables
Drop triggers from source table
Delete collection file from current Linux server.
*** Does not delete from non-current server(s) if any.
CALL SYSPROC.SYSTS_DROP('<TS schema>', '<TS name>');
EXAMPLE:
CALL SYSPROC.SYSTS_DROP('MYSCHEMA', 'POLICY_SEARCH_TX1');
Takeover
Similar to RESTORE, but DB2 will find the server to move to automatically. There are
circumstances where this occurs automatically, for example when the host server is not
available during UPDATE.
CALL SYSPROC.SYSTS_TAKEOVER('<TS schema>', '<TS name>');
EXAMPLE:
CALL SYSPROC.SYSTS_TAKEOVER('MYSCHEMA', 'POLICY_SEARCH_TX1');
Stop and Start
These stored procedures have no parms, so they can also be run in SPUFI, DSNTEP*, etc.
The START is required after a new Linux/Windows server is added.
CALL SYSPROC.SYSTS_STOP();
CALL SYSPROC.SYSTS_START();
Text Search
Administration – Linux
Topic 10
Administration - Linux
• Check for free space
  • Collection files can be large
• Check status
• Delete unnecessary collections
• Start and stop Text Search processes
• Refer to manual for more options and details
TIP
• To avoid the requirement to be ROOT
give user r-x to:
../config/authentication.xml
../config/key.txt
Check DASD Utilization
df -h
Filesystem                       Size   Used  Avail  Use%  Mounted on
/dev/dasdc1                      504M   258M  221M   54%   /
tmpfs                            3.9G   0     3.9G   0%    /dev/shm
/dev/mapper/system_vg-home_lv    1008M  34M   924M   4%    /home
/dev/mapper/system_vg-opt_lv     6.0G   555M  5.1G   10%   /opt
/dev/mapper/system_vg-tmp_lv     3.0G   68M   2.8G   3%    /tmp
/dev/mapper/system_vg-usr_lv     6.0G   1.7G  4.0G   30%   /usr
/dev/mapper/system_vg-var_lv     2.0G   227M  1.7G   12%   /var
/dev/mapper/system_vg-varlog_lv  3.0G   183M  2.7G   7%    /var/log
/dev/mapper/app_vg-app_lv        75G    12G   60G    16%   /opt/IBM/ECMTextSearch
Check Status
sh adminTool.sh status -configPath ../config
CollectionName                           IndexSize     NumOfDocuments
Base                                     18,069B       0
MVSDB2T_103_2013_11_01_10_28_53_790971   4,128.061M    21,143,757
MVSDB2S_101_2014_01_27_14_29_23_687286   316,405.345K  1,862,444
MVSDB2T_141_2014_02_10_16_05_53_622496   70,737.354K   351,263
MVSDB2S_83_2013_12_23_13_38_32_679888    344,130.403K  1,806,168
MVSDB2S_82_2013_12_20_16_48_57_446100    366,729B      2,136
MVSDB2T_122_2013_12_05_16_46_53_159071   4,247.799M    21,179,586
MVSDB2S_23_2013_01_03_10_42_43_451278    273,449.291K  1,862,444
MVSDB2S_42_2013_01_10_11_18_29_609767    1,580.502M    9,522,998
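When deciding which collections are still needed, it helps to map a collection name back to its text index. Judging only from the sample output (for example, INDEXID 122 in SYSIBMTS.SYSTEXTINDEXES pairs with a collection name starting MVSDB2T_122_2013_12_), the names appear to consist of a location prefix, the INDEXID, and a creation timestamp. That layout is an inference from these samples, not something taken from the manual, and the helper below is a hypothetical sketch built on that assumption.

```python
# Hypothetical parser for collection names, assuming the layout
# <prefix>_<INDEXID>_<yyyy>_<mm>_<dd>_<hh>_<mi>_<ss>_<microseconds>.
import re
from datetime import datetime

COLLECTION_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z0-9]+)_(?P<index_id>\d+)_"
    r"(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})_"
    r"(?P<hour>\d{2})_(?P<minute>\d{2})_(?P<second>\d{2})_(?P<micro>\d{6})$"
)


def parse_collection_name(name):
    """Return prefix, index id and creation time, or None if the name does not fit."""
    match = COLLECTION_NAME.match(name)
    if match is None:
        return None
    created = datetime(
        int(match["year"]), int(match["month"]), int(match["day"]),
        int(match["hour"]), int(match["minute"]), int(match["second"]),
        int(match["micro"]),
    )
    return {
        "prefix": match["prefix"],
        "index_id": int(match["index_id"]),
        "created": created,
    }


info = parse_collection_name("MVSDB2T_122_2013_12_05_16_46_53_159071")
print(info["prefix"], info["index_id"], info["created"].isoformat())
print(parse_collection_name("Base"))  # None: the Base collection has no index id
```

The parsed index id can then be compared with INDEXID in SYSIBMTS.SYSTEXTINDEXES before a collection is deleted.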
Delete Collections
sh adminTool.sh delete -configPath ../config -collectionName MVSDB2T_103_2013_11_01_10_28_53_790971
But if that fails (IQQD0060 message) …
cd ../config/collections
ls -l
drwxr-xr-x 3 root root 4096 Jan 10 07:47 Base
drwxr-xr-x 3 root root 4096 Jan 27 14:30 MVSDB2S_101_2014_01_27_14_29_23_687286
drwxr-xr-x 3 root root 4096 Jan 27 13:43 MVSDB2S_23_2013_01_03_10_42_43_451278
drwxr-xr-x 3 root root 4096 Jan 27 13:45 MVSDB2S_42_2013_01_10_11_18_29_609767
drwxr-xr-x 3 root root 4096 Jan 24 10:35 MVSDB2S_82_2013_12_20_16_48_57_446100
drwxr-xr-x 3 root root 4096 Jan 24 11:19 MVSDB2S_83_2013_12_23_13_38_32_679888
drwxr-xr-x 3 root root 4096 Jan 27 17:43 MVSDB2T_103_2013_11_01_10_28_53_790971
drwxr-xr-x 3 root root 4096 Jan 27 17:18 MVSDB2T_122_2013_12_05_16_46_53_159071
drwxr-xr-x 3 root root 4096 Feb 10 16:06 MVSDB2T_141_2014_02_10_16_05_53_622496
rm -rf MVSDB2T_103_2013_11_01_10_28_53_790971
Start and Stop Text Search
sh startup.sh
sh shutdown.sh