Annex A
DATA MASKING (DM) SOLUTION
Statement of Requirements
1. Requirements
The Office of the Chief Electoral Officer of Canada (OCEO) has a requirement to “mask” sensitive data
for use in development, testing, user acceptance and pre-production purposes. The Chief Electoral
Officer of Canada (CEOC) is also referred to by its more popular name, Elections Canada (EC).
Data masking is defined as the replacement of sensitive data with realistic but not real data.
These requirements include the following elements:
a) Licensed data masking software
b) Licensed Software Maintenance
c) Licensed Software Support Services
d) Media and Documentation
e) Optional Software Maintenance
f) Optional Software Support Services
g) Optional Training
h) Optional Professional Services
2. Goals
The goal of the data masking solution is to enable EC to develop capability to “mask”
sensitive data used for development, testing, user acceptance and pre-production purposes.
The solution must work within the parameters and conditions set out in this document.
These include both the business and technical parameters.
3. Specific Objectives
The objectives of the data masking solution are:
1. “Mask” sensitive data used for development, testing, user acceptance and pre-production purposes.
2. Provide a data masking solution that is repeatable.
3. Preserve referential integrity of data in masked data.
4. Do not establish a one-to-one translation while masking data.
5. Produce masked data which is human readable.
6. Provide both built-in data masking rules for common masking processes, as well as
user defined masking rules.
7. Mask the data in place.
8. Generate masked data that is not reversible.
9. Permit the use of multiple rules at the same time.
10. Provide the capability to store data masking rules for subsequent re-use.
11. Increase EC’s compliance with data collection and data use policies.
12. Raise awareness of Information Management best practices.
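The repeatability, no one-to-one translation, human-readable and non-reversibility objectives can pull in different directions, so a small illustration may help. The sketch below is one possible approach and not a description of any vendor's product: it picks a replacement family name from a keyed hash (HMAC) of the record identifier and the original value. The name list, the function name mask_family_name and the key handling are all assumptions made for this example.

```python
import hashlib
import hmac

# Hypothetical substitution pool; a real run would use a much larger list.
FAMILY_NAMES = ["Black", "Brown", "Franks", "Jones", "Martin", "Roy", "Tremblay", "Wilson"]


def mask_family_name(record_id: str, original: str, key: bytes) -> str:
    """Return a replacement family name for one occurrence of a value.

    The choice depends on the secret key, the record identifier and the
    original value, so:
      - the same record always gets the same replacement (repeatable);
      - two records holding "Smith" are not forced to the same replacement;
      - the output is always a real-looking name from the pool;
      - the original cannot be derived from the output alone.
    """
    message = f"{record_id}|{original}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FAMILY_NAMES)
    return FAMILY_NAMES[index]


key = b"example-key-kept-outside-the-database"  # assumption: managed securely elsewhere
masked_once = mask_family_name("E-1001", "Smith", key)
masked_again = mask_family_name("E-1001", "Smith", key)
```

Running the function twice on the same record gives the same name, while different records that all hold “Smith” are spread across the pool. The sketch does not prevent a replacement from coinciding with the original value; a production rule would need to handle that case.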
4. Scope
As mentioned above, there is a need for a data masking solution. Initially this solution will be
deployed in EC’s R and D (Research and Development) area to ensure that the solution
meets EC’s stated requirements.
Once satisfied with the data masking solution, EC will then roll out this solution to its various
development and testing databases.
EC requires the licensed software and associated software maintenance and support on the
data masking solution.
EC may require optional professional services from the vendor on an as-needed basis.
EC may require optional training services from the vendor on an as-needed basis, at the
request of the project authority for the DM solution.
Users
The users of the data masking solution are as follows:
1. Data Masking Administrator
2. Data Base Administrator
3. Privacy Coordinator
5. EC Computing Environment
EC uses over 200 custom-developed business applications and over 20 commercial off-the-shelf
(COTS) applications to support and streamline work processes, access data, and
process millions of elector-related transactions. These business applications are critical
enablers to address the ongoing needs of EC and Canadian citizens.
The following numbers provide a glimpse as to the current level of Information Technology
(IT) support in EC:
• Approximately 400 Workstations
• Approximately 300 Blackberry devices and cell phones
• Approximately 1,600 Servers (physical and virtual)
• External Web Hosting Site
• Internal Networks linking over 630 points of service
• External links to numerous partners (Provinces, Territories, MTOs, Other Government Departments such as Vital Stats, Canada Revenue Agency and some provincial electoral bodies)
• More than 200 custom-developed business applications
• Approximately 500 software products (including products for software development)
• Telephony: 1 million calls (some are handled by an automated voice response system)
• Email: 100,000 (over 1 million Spam emails are stopped each week)
• Over 80 Terabytes of data
a) Server Platform:
Operating System | Technology Used | CPU configuration | Number of Physical Servers | Number of VMs currently on all servers | Average number of databases
OEL              | HP - DL 580     | 4 Quad Core       | 9                          | 65                                     | 3 (per VM)
Itanium          | HP - Itanium    | 2 Dual Core       | 2                          | n/a                                    | 3
Unix             | HP - Unix       | 2 Dual Core       | 2                          | n/a                                    | 5
b) Desktop Platform:
EC currently uses Microsoft Windows XP SP3 running Microsoft Office 2010 but may
upgrade to a newer version.
c) Authentication and Email:
EC currently uses Microsoft Active Directory 2000 for user authentication to the network but
will be transitioning to Microsoft Active Directory 2008R2. EC currently uses Microsoft
Exchange 2007 for email.
d) Business Intelligence Platform:
IBM Cognos is EC’s current departmental standard Business Intelligence (BI) toolset. EC
currently holds named user licenses of the IBM Cognos solution.
e) Internet Browser:
EC currently uses Windows Internet Explorer version 8 (IE 8) as a web browser for
retrieving, presenting, and navigating information on the Web (Internet and Intranet).
6. EC Policies, Treasury Board of Canada Secretariat (TBS) Directives and
Information Management (IM) Computing Standards
a) Ethics Code Requirement
The contractor must conduct itself, and will instruct its employees, contractors and agents to
conduct themselves, at all times during the performance and delivery of services under any
Contracts resulting from this Request for Proposals (RFP), in a manner consistent with the
values and ethics prescribed for public servants in the Treasury Board Secretariat Values and
Ethics Code for the Public Service, effective April 2, 2012, and generally to support the intent
and spirit of the Code in all its dealings with and for Canada. An electronic version of the Code
may be found at the following URL: http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=25049
b) Treasury Board Policy on the use of Electronic Networks
The contractor must conduct itself, and will instruct its employees, contractors and agents to
conduct themselves, at all times during the performance and delivery of services under any
Contracts resulting from this RFP, in a manner consistent with the Treasury Board Policy on
the use of electronic networks. An electronic version of the Policy may be found at the
following URL: http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=12419&section=text
c) Treasury Board Policy on the Duty to Accommodate (DTA) Persons with
Disabilities in the Federal Public Service
An electronic version of the Policy may be found at the following URL: http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=12541
d) Treasury Board Standard on Web Accessibility
An electronic version of the Standard may be found at the following URL: http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?evttoo=X&section=text&id=23601
e) Common Look and Feel (CLF)
An electronic version of the Policy may be found at the following URL: http://www.tbs-sct.gc.ca/clf2-nsi2/index-eng.asp
f) Web Content Accessibility Guidelines (WCAG)
The Web Content Accessibility Guidelines (WCAG) cover a wide range of recommendations
for making Web content more accessible. Following these guidelines will make content
accessible to a wider range of people with disabilities, including blindness and low vision,
deafness and hearing loss, learning disabilities, cognitive limitations, limited movement,
speech disabilities, photosensitivity and combinations of these. Following these guidelines
will also often make Web content more usable to users in general. An electronic version of
the Guidelines may be found at the following URL: http://www.w3.org/TR/1999/WAI-WEBCONTENT-19990505/
In order to measure Website accessibility, the World Wide Web Consortium (W3C) has
established a set of 14 guidelines/general principles of accessible design. Each guideline has
a number of checkpoints. Each checkpoint is assigned a priority level (Priority 1, Priority 2 or
Priority 3), based on the checkpoint's impact on accessibility.
[Priority 1] A Web content developer must satisfy this checkpoint. Otherwise,
one or more groups will find it impossible to access information in the document.
Satisfying this checkpoint is a basic requirement for some groups to be able to use Web
documents.
[Priority 2] A Web content developer should satisfy this checkpoint.
Otherwise, one or more groups will find it difficult to access information in the document.
Satisfying this checkpoint will remove significant barriers to accessing Web documents.
[Priority 3] A Web content developer may address this checkpoint. Otherwise,
one or more groups will find it somewhat difficult to access information in the document.
Satisfying this checkpoint will improve access to Web documents.
Accessibility can be measured in 3 levels; these levels are:
Conformance Level "A": all WCAG Priority 1 checkpoints in each guideline are satisfied;
Conformance Level "AA": all WCAG Priority 1 and 2 checkpoints are satisfied;
Conformance Level "AAA": all WCAG Priority 1, 2, and 3 checkpoints in each guideline are satisfied.
7. Mandatory Requirements
Ref. No. | Description
M1
The data masking solution must deliver, enable and support a working and complete solution, which must include any and all
components that contribute to the composition of the whole or in part(s), as expressed in the RFP and its appendices and its
annexes. A complete list identifying the names and versions of each Licensed Software component delivered as part of the solution
must be provided.
The solution must allow EC to achieve the specific objectives detailed in section 3 of the statement of requirements.
The data masking solution must operate in the computing environment described in section 5, EC Computing Environment, of this
document.
M2
The data masking solution must allow EC to deliver a data masking solution that is repeatable.
This means that if the production data has not changed, running the data masking process must return the exact same result each
time.
M3
The data masking solution must preserve referential integrity of data.
This means that pre-existing relationships to other data elements defined within the Oracle database must be maintained after
running the data masking process (use case: Elector to Address).
M4
The data masking solution must not establish a one-to-one translation.
This means that the data masking process must not return the same result for all occurrences of a single value. (Example: all
occurrences of “Smith” must not always return “Jones”.)
M5
The data masking solution must produce masked data that is human readable.
This means that the result of the data masking process must appear to the end user as realistic production data. (Example: “Smith”
appears as “Jones”, not “xcvtr”.)
M6
The data masking solution must provide both built-in data masking rules for common masking processes, as well as user-defined
masking rules.
Examples:
Built-in masking rule (example: SIN # masking rule)
User-defined masking rule (example: family name across an address)
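As an illustration of the distinction drawn in M6 between built-in and user-defined masking rules, the following sketch keeps both kinds of rule in one registry. Everything named here (RULES, register_rule, mask_sin, mask_family_name) is hypothetical and is not taken from any product. The built-in SIN rule assumes that a realistic replacement SIN is nine digits ending in a Luhn check digit.

```python
import random
from typing import Callable, Dict

# A rule takes the original value and a random source, and returns the masked value.
MaskingRule = Callable[[str, random.Random], str]
RULES: Dict[str, MaskingRule] = {}


def register_rule(name: str) -> Callable[[MaskingRule], MaskingRule]:
    """Decorator that stores a rule under a name so it can be looked up later."""
    def decorator(rule: MaskingRule) -> MaskingRule:
        RULES[name] = rule
        return rule
    return decorator


def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a string of digits."""
    total = 0
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:  # doubled once the check digit is appended
            digit = digit * 2 - 9 if digit * 2 > 9 else digit * 2
        total += digit
    return str((10 - total % 10) % 10)


@register_rule("sin")  # shipped with the tool: a "built-in" rule
def mask_sin(value: str, rng: random.Random) -> str:
    payload = "".join(str(rng.randrange(10)) for _ in range(8))
    return payload + luhn_check_digit(payload)


# Added later by an administrator: a "user-defined" rule using the same interface.
def mask_family_name(value: str, rng: random.Random) -> str:
    return rng.choice(["Black", "Brown", "Franks"])


register_rule("family_name")(mask_family_name)

masked_sin = RULES["sin"]("046454286", random.Random(2024))
```

Because both kinds of rule share one call signature, a masking run can look rules up by name without caring where they came from.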
M7
The data masking solution must mask the data in place.
This means that the solution must allow EC to mask the data directly in the database requested without having to perform some
series of ETL processes to load data into a unique Data Masking environment to perform the data masking and then extract it from
this unique environment to overwrite the original environment.
M8
The data masking solution must mask the data at rest.
Within data masking technology this is commonly referred to as Static Data Masking (SDM). This means that the result of the
data masking operation MUST be stored directly in the database as a replacement of the original data content. In this way there is
no way to by-pass the masking process to view the sensitive data while at rest inside the database or from a backup of the
database.
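A minimal sketch of in-place, at-rest masking (M7 and M8) is given below. It uses Python's built-in sqlite3 module purely as a stand-in for the Oracle database named in this document, and the table names, column names and substitution list are invented for the example. The masked values overwrite the originals with UPDATE statements inside the same database, with no separate masking environment, and the key column is left untouched so that the Elector to Address relationship of M3 still resolves.

```python
import hashlib
import hmac
import sqlite3

NAMES = ["Black", "Brown", "Franks", "Jones"]  # hypothetical substitution values


def substitute(key: bytes, elector_id: int, name: str) -> str:
    """Deterministically pick a replacement name for one elector row."""
    digest = hmac.new(key, f"{elector_id}|{name}".encode("utf-8"), hashlib.sha256).digest()
    return NAMES[digest[0] % len(NAMES)]


def mask_in_place(conn: sqlite3.Connection, key: bytes) -> None:
    """Overwrite family names directly in the elector table."""
    rows = conn.execute("SELECT elector_id, family_name FROM elector").fetchall()
    masked = [(substitute(key, elector_id, name), elector_id) for elector_id, name in rows]
    with conn:  # single transaction: commits on success, rolls back on error
        conn.executemany("UPDATE elector SET family_name = ? WHERE elector_id = ?", masked)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE elector (elector_id INTEGER PRIMARY KEY, family_name TEXT NOT NULL);
    CREATE TABLE address (address_id INTEGER PRIMARY KEY,
                          elector_id INTEGER NOT NULL REFERENCES elector(elector_id),
                          street TEXT NOT NULL);
    INSERT INTO elector VALUES (1, 'Smith'), (2, 'Smith'), (3, 'Nguyen');
    INSERT INTO address VALUES (10, 1, '100 Main St'), (11, 2, '100 Main St'),
                               (12, 3, '200 Main St');
""")

mask_in_place(conn, b"demo-key")

# Every address row still joins to exactly one (now masked) elector row.
joined = conn.execute(
    "SELECT e.family_name, a.street FROM elector e "
    "JOIN address a USING (elector_id) ORDER BY a.address_id"
).fetchall()
remaining = conn.execute(
    "SELECT COUNT(*) FROM elector WHERE family_name IN ('Smith', 'Nguyen')"
).fetchone()[0]
```

After the run, the original names no longer appear in the table and the join returns the same number of rows as before. The sketch says nothing about database backups or storage-level remnants, which a real solution would also have to address to satisfy M8.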
M9
The data masking solution must generate masked data that is not reversible.
This means that there must be no way to see or derive the original value of the data element after the data masking process has
been run.
M10
The data masking solution must permit the use of multiple rules at the same time.
Example: 1) repeatable, 2) gender, 3) family unit
John Smith 100 Main St (Father) - becomes - Frank Black 100 Main St
Barb Smith 100 Main St (Mother) - becomes - Nancy Black 100 Main St
Bob Smith 100 Main St (Son) - becomes - Dave Black 100 Main St
Sue Jones 100 Main St (Grandmother) - becomes - Linda Franks 100 Main St
George Smith 200 Main St (Neighbour) - becomes - Pat Brown 200 Main St
M11
The data masking solution must support the “substitution” masking technique.
The accepted techniques of masking include:
• Substitution, example: replace/substitute the data with something else
• Suppression, example: suppress the sensitive column
• Redaction, example: put stars for the first 6 numbers of the SIN
• Generalization, example: replace a value with a value from a range or set (21 is replaced by a value between 20 and 30)
• Shuffling, example: substitute data using the existing data as the basis for the substituted data
• Randomization, example: replace data with a random value
M12
The data masking solution must allow EC to provide source substitution data.
This means that the data masking process must allow EC to use the current Register of Electors as a source of name substitution
values for the elector name mask process.
M13
The data masking solution must allow for the storing of data masking rules for subsequent re-use.
This means that the solution must provide the capability for a user to configure data masking rules and then to re-use them
multiple times at later dates. This will avoid re-entering the data masking rules every time the masking operations are invoked.
M14
The data masking solution must mask the data on Oracle Enterprise version database (EC’s standard database).
This does not restrict the solution to only Oracle database, but Oracle database must be one of the primary database environments
and not an environment requiring special aftermarket add-ons. The predominant Oracle database character set used is
WE8MSWIN1252, although there are some occurrences of the AL32UTF8 character set.
M15
The data masking solution must mask the data on Oracle Enterprise LINUX (OEL) on a Virtual platform.
This does not restrict the solution to only the LINUX platform, but LINUX must be one of the primary environments and not an
environment requiring special aftermarket add-ons. The LINUX platform is the EC standard for Oracle databases and virtualization
technologies are extensively used at EC. Hence the base solution must support these.
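The family example given under M10 combines three rules at once: repeatable, gender and family unit. One way to picture that combination is sketched below. It is an illustration under stated assumptions, not a prescribed design: the name pools, the Elector record layout and the helper names pick and mask are hypothetical, and gender is assumed to be available as a field on each record.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Sequence

GIVEN = {"M": ["Frank", "Dave", "Pat", "Louis"], "F": ["Nancy", "Linda", "Anne", "Marie"]}
FAMILY = ["Black", "Brown", "Franks", "Roy", "Wilson"]


def pick(key: bytes, pool: Sequence[str], *parts: str) -> str:
    """Repeatable rule: the same key and inputs always select the same entry."""
    digest = hmac.new(key, "|".join(parts).encode("utf-8"), hashlib.sha256).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]


@dataclass(frozen=True)
class Elector:
    given: str
    family: str
    address: str
    gender: str  # "M" or "F"


def mask(elector: Elector, key: bytes) -> Elector:
    # Family-unit rule: people sharing a family name and an address get the same new name.
    family = pick(key, FAMILY, "family", elector.family, elector.address)
    # Gender rule: the new given name comes from the pool matching the elector's gender.
    given = pick(key, GIVEN[elector.gender], "given",
                 elector.given, elector.family, elector.address)
    return Elector(given, family, elector.address, elector.gender)


key = b"demo-key"
household = [
    Elector("John", "Smith", "100 Main St", "M"),
    Elector("Barb", "Smith", "100 Main St", "F"),
    Elector("Bob", "Smith", "100 Main St", "M"),
]
masked = [mask(person, key) for person in household]
```

The three Smiths at 100 Main St end up with one shared replacement family name, and each replacement given name matches the person's gender. The sketch does not stop two members of a household from receiving the same given name, nor does it guarantee that a different household receives a different family name; both would need extra handling in practice.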
M16
There must be no limitation on the size/volume of data being masked by the data masking solution.
The size of the data that will be masked by the DM solution is currently 800 Gigabytes; the solution must be able to handle this
volume and any subsequent increases over the years.
Rated Technical Requirements
Ref. No. | Description
R1
The data masking solution should be highly scalable for volume of data to be masked.
This means that the solution should allow EC to mask all Electors (approx. 45 million) within the Register database inside any of our
current development environments (currently numbering 4: Development, Test, QA and Pre-Prod) within 8 hours,
based upon existing database and server configurations (EC current Register database configuration: 4 CPUs, 10 GB RAM, OEL 5.8,
VMware 4.1 with Oracle 11gR2).
R2
The data masking solution should have built-in recovery after failure functionality.
This means that in the event of any failure (example: out of disk space, insufficient rollback segments, etc.) the solution should be able
to continue the masking process from where it stopped once the event has been resolved. EC should not be forced to re-start the entire
masking operation from the beginning simply due to the failure event.
R3
The data masking solution should support masking techniques in addition to “substitution”.
The generally accepted techniques of masking include:
• Substitution, example: replace/substitute the data with something else
• Suppression, example: suppress the sensitive column
• Redaction, example: put stars for the first 6 numbers of the SIN
• Generalization, example: replace a value with a value from a range or set (21 is replaced by a value between 20 and 30)
• Shuffling, example: substitute data using the existing data as the basis for the substituted data
• Randomization, example: replace data with a random value
R4
The data masking solution should encrypt any stored internal-usage data by using GC encryption standards.
Most data masking products create interim processing data for short and/or long term usage. An example may be the “map” to be
used for the substitution technique. If this “map” is not stored in a secured fashion, it can be used to reverse the data masking of
any data. However, this “map” may be required to repeat the same data masking operation at a later date. To avoid any such breaches,
the solution should store all such internal usage data in a secure fashion.
For GC encryption standards, refer to the following URL: http://www.cse-cst.gc.ca/its-sti/services/crypto-services-crypto/ca-ac-eng.html
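The six masking techniques listed under R3 can each be sketched in a few lines. The functions below are illustrative only; their names and signatures are assumptions made for this example, and each one follows the one-line example given in the list (for instance, redaction stars out the first 6 digits of a SIN, and generalization replaces 21 with a value between 20 and 30).

```python
import random
from typing import List, Optional, Sequence


def substitute(value: str, pool: Sequence[str], rng: random.Random) -> str:
    """Substitution: replace the value with an entry from a lookup pool."""
    return rng.choice(pool)


def suppress(value: str) -> Optional[str]:
    """Suppression: drop the sensitive value entirely."""
    return None


def redact_sin(sin: str) -> str:
    """Redaction: put stars for the first 6 digits of a 9-digit SIN."""
    return "*" * 6 + sin[6:]


def generalize_age(age: int, rng: random.Random) -> int:
    """Generalization: replace a value with one from its range (21 -> 20..30)."""
    low = (age // 10) * 10
    return rng.randint(low, low + 10)


def shuffle_column(values: Sequence[str], rng: random.Random) -> List[str]:
    """Shuffling: reuse the existing column values in a different order."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def randomize_digits(value: str, rng: random.Random) -> str:
    """Randomization: replace every digit with a random digit, keep other characters."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)
```

Passing the random source in as a parameter keeps each function easy to test and leaves room for a seeded or keyed source where repeatability is required.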
R5
The data masking solution should provide a friendly graphical user interface (GUI).
This means that the solution should provide some form of GUI for the user community to interact with it, as opposed to
commands issued at the operating system command line.
R6
The data masking solution should provide capability to choose from pre-stored data masking rules.
This means that the solution should provide a capability by which the user is not forced to apply all masking rules during each masking
operation, but can choose which set of rules to run at the time of invoking each masking operation.
R7
The data masking solution should have built-in user access security restrictions.
This means that the solution should come “out of the box” with some form of user access protection against unauthorized use of the
tool. The most basic example of user access security would include protection by user name and password, where the tool maintains
the required credentials. A more sophisticated mechanism would be for the tool to leverage an LDAP directory such as Microsoft’s Active
Directory. A further refinement would be for the tool to support multiple user profiles/roles (example: Administrator, Operator,
Read-only, etc.).
R8
The data masking solution should accommodate backup.
This means that the solution should provide the required capabilities to work seamlessly with HP Data Protector (version 6 +). HP Data
Protector is the backup tool used by EC.
R9
The data masking solution should prevent concurrent runs to the same target database.
This means that the solution should be aware if a masking process is currently executing on the target database and prevent a
concurrent run on the same target database. This will help avoid issues of accidental launch/re-launch of masking operations on the
same target database. Note that this does not mean that the solution is single-user, only that there must be no concurrent runs on the
same target database.
R10
The data masking solution should provide both pre-built and configurable reporting capability.
This means that the solution should provide a series of pre-built (canned) reports to provide the user with reports on data masking run
statistics and other data masking run log information such as errors, counts, etc. The solution must also provide the capability for the user to
create new additional reports or configure the original canned reports.
R11
The data masking solution should provide built-in configurable audit.
The solution should have the built-in capability to perform audit on all the actions (examples: change in rules, etc.) taken inside the
solution and externally by the solution (examples: run-time logs).
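To illustrate the behaviour described in R9 (preventing concurrent runs on the same target database while still allowing runs on other databases), the sketch below uses one lock file per target database, created atomically. This is only one of several possible mechanisms. The names masking_lock and MaskingAlreadyRunning, the lock-file naming and the database name REGISTER_TEST are all assumptions made for the example.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


class MaskingAlreadyRunning(RuntimeError):
    """Raised when a masking run is already active on the target database."""


@contextmanager
def masking_lock(target_db: str, lock_dir: str) -> Iterator[str]:
    """Hold an exclusive lock for one target database for the duration of a run."""
    path = os.path.join(lock_dir, f"{target_db}.masking.lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one run can win.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise MaskingAlreadyRunning(f"masking already running on {target_db}") from None
    try:
        os.write(fd, str(os.getpid()).encode("ascii"))  # record the owner for diagnostics
        yield path
    finally:
        os.close(fd)
        os.remove(path)


with tempfile.TemporaryDirectory() as lock_dir:
    with masking_lock("REGISTER_TEST", lock_dir):
        try:
            with masking_lock("REGISTER_TEST", lock_dir):
                second_run_started = True
        except MaskingAlreadyRunning:
            second_run_started = False  # the accidental re-launch is refused
    # The first run has finished, so a new run on the same database is allowed.
    with masking_lock("REGISTER_TEST", lock_dir):
        rerun_started = True
```

A lock file left behind by a crashed run would block later runs until it is cleared, so a real implementation would need a policy for stale locks.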
DATA MASKING SOLUTION
Acronyms Table
Acronym | Description
OCEO    | Office of the Chief Electoral Officer
CEOC    | Chief Electoral Officer of Canada
COTS    | Commercial off the shelf
IT      | Information Technology
EC      | Elections Canada
RnD     | Research and Development
OEL     | Oracle Enterprise Linux
VM      | Virtual Machine
RAM     | Random Access Memory
CPU     | Central Processing Unit
QA      | Quality Assurance
BI      | Business Intelligence
IE      | Internet Explorer
TBS     | Treasury Board of Canada Secretariat
IM      | Information Management
RFP     | Request for Proposals
URL     | Uniform Resource Locator
DTA     | Duty to Accommodate
CLF     | Common Look and Feel
WCAG    | Web Content Accessibility Guidelines
W3C     | World Wide Web Consortium
GUI     | Graphical User Interface
SOR     | Statement of Requirements
SIN     | Social Insurance Number