White Paper - GPU-Based SQL Database
SQream Technologies
SQream DB GPU-Based SQL Database
Technical Overview White Paper
Overview
SQream DB is an analytic database built from scratch to harness the performance of graphics processing units (GPUs)
for handling petabyte-scale data, yielding significant savings in time and resources for its users.
SQream DB’s cost-effective solution provides enterprises with significant added value, empowering BI teams, data
scientists, engineers and even marketing teams with new possibilities in big data analytics.
Running on one or more NVIDIA GPUs, SQream DB is capable of processing enormous data sets up to 100 times
faster than other leading data warehouse solutions available today. It integrates easily with existing tools and
relational SQL queries, boosting productivity while reducing infrastructure and operating costs.
Translated into tangible gains, running up to 100 times more queries while lowering the total cost of ownership (TCO)
makes SQream DB a valuable asset to any organization handling big data analytic workloads.
The SQream Advantage
With data creation exploding worldwide, organizations need to make use of, and stay on top of, the data they collect.
They face a serious challenge in storing immense volumes of structured and semi-structured data, analyzing it and
obtaining rapid, real-time, actionable insights from it.
Organizations with quickly scaling data need a high-performance solution that continues to perform well when addressing
multi-petabyte data sets and heavy workloads. SQream DB is designed to address such needs, with the following four
main advantages:
Small Server Size
SQream DB is designed from the ground up to serve as a powerful database while requiring as little as a single standard
tower server or a 2U rack-mount enclosure. Compared with a full 42U rack of vendor-supplied hardware from Teradata,
Oracle Exadata or IBM PureData System for Analytics (formerly Netezza), a single 2U server running SQream DB is capable
of yielding equal or better query execution performance. As for costs, the savings in hardware, power, floor space,
cooling and maintenance are enormous.
SQream DB is not limited to the 2U form factor and can scale to larger configurations supporting multiple GPUs.
Scale – GPU is a Massively Parallel Processor (MPP) on a Card
The idea behind SQream’s architecture is harnessing the readily available power of thousands of parallel processing
cores in a cost-effective GPU, to compete with and overtake standard and parallel DBMS solutions, running on dozens
of expensive general-purpose processors.
[Figure: a multi-core CPU (up to 32 cores) compared with a GPU (up to 2880 cores); each consists of cores, cache and RAM.]
A 32-core CPU installation (latency-oriented) requires a lot of power and can cost thousands of dollars. On the other
hand, a single throughput-oriented GPU can have close to 3,000 onboard cores, delivering superior performance at a
significantly lower cost and with 90% lower power consumption.
With up to 20 times more processing power per node, suited to aggressive data operations, along with high speed and
scalability, it is easy to see how SQream DB benefits from the use of GPUs.
While other clustered solutions may be massively parallel through scaling-out computers, SQream DB is massively
parallel through the GPUs’ on-board thousands of cores. Moreover, several GPUs can link together inside the same
enclosure, delivering a reduction of both memory and network I/O while decreasing network load and latency.
Simplicity in Integration
With SQream DB, implementation could not be easier. SQream DB uses the familiar ANSI SQL syntax, meaning there is no
need for any data remodeling, and no new skills need to be acquired. Employees don’t need retraining and do not have
to rewrite hundreds of queries. Even third party ETL and BI tools can easily be connected and used via industry standard
ODBC/JDBC interfaces, without hiring integration specialists.
[At the time of writing this paper, SQream DB was tested to work with the following ETL and BI tools: Pentaho, Talend,
Informatica, DataStage, SSIS, QlikView, Spotfire, Tableau, Business Objects and even Excel.]
Simplicity by Design
SQream DB is a columnar database, in which each column is stored as a collection of “data chunks”, each containing
millions of values. SQream DB automates the creation of smart metadata on top of each column and every data chunk.
This smart metadata replaces the common indexing used by most databases, thus eliminating the lengthy and limiting
process of index creation while ingesting new data. The result is a smart grid for accessing any desired data on demand,
at petabyte scale.
SQream Database Architecture
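[Figure: SQream DB architecture. Connectors (JDBC, .NET, ODBC) feed the SQream Server, which contains the SQL Parser,
Optimizer, Resource Manager, CPU/GPU Execution Graph, Runtime and I/O Manager. Beneath the server sit SQream Storage
and Metadata, on an ext4/NTFS file system.]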
Relational Algebra
SQream DB builds on relational algebra, a concept first proposed by Edgar F. Codd of IBM Research in 1969. It is a
powerful model grounded in set theory and is used by many SQL engines. Its operations, such as filters and joins, are
as fundamental to querying as addition and multiplication are to arithmetic. Relational algebra is therefore not only
well studied, but also comprehensively battle-tested in real-world applications. By transforming relational SQL queries
into highly parallelizable relational algebra, SQream DB can efficiently perform complex operations on the massively
parallel GPU cores. These transformations are performed internally by the SQream DB compiler and require no user
intervention.
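As a rough illustration of the idea, and not a description of SQream DB's actual compiler, the sketch below expresses a simple SQL query as a composition of three relational algebra operators: selection, projection and join. The function names, table contents and department names are hypothetical and exist only for this example.

```python
# Illustrative only: relational algebra operators over lists of dict rows.
# All names and data here are hypothetical, not SQream DB internals.

def select(rows, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Projection (pi): keep only the named columns."""
    return [{col: row[col] for col in columns} for row in rows]

def join(left, right, key):
    """Equi-join on a shared column, using a hash table built on `right`."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

employees = [
    {"emp_no": 1, "dept_id": 1, "name": "Smith"},
    {"emp_no": 4, "dept_id": 2, "name": "Taylor"},
    {"emp_no": 6, "dept_id": 3, "name": "Brown"},
]
departments = [
    {"dept_id": 1, "dept_name": "Sales"},
    {"dept_id": 2, "dept_name": "Support"},
]

# SELECT name, dept_name
# FROM employees JOIN departments USING (dept_id)
# WHERE dept_id = 2
result = project(
    select(join(employees, departments, "dept_id"), lambda r: r["dept_id"] == 2),
    ["name", "dept_name"],
)
print(result)
```

Each operator works on every row independently of the others, which is the property that makes this style of plan a natural fit for thousands of parallel cores.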
Performance
Relational Algebra Optimizations
The SQream DB compiler does a lot of the heavy lifting. The compiler processes the given SQL query (from standard ODBC
or JDBC connectors), creates an execution plan and then optimizes it. The result is an equivalent query that produces the
same results, but runs a lot faster.
Because SQream DB works in a massively parallel environment, most of the optimizations involve combining repeated
work and choosing alternative paths that reduce repeated processor and I/O operations.
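The paper does not list the specific rewrites the compiler applies. As one textbook example of "an equivalent query that runs faster", the sketch below compares two plans for the same query: filtering after a join versus filtering before it (often called predicate pushdown). The data, names and probe counter are hypothetical, and this is not claimed to be one of SQream DB's own optimizations.

```python
# Illustrative only: two equivalent plans for the same query.
# Pushing the filter below the join yields the same rows with less work.

def hash_join(left, right, key):
    """Equi-join; also returns how many left rows were probed."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    out = [{**l, **r} for l in left for r in index.get(l[key], [])]
    return out, len(left)

# Hypothetical data: 1,000 sales rows spread evenly over 10 stores.
sales = [{"sale_id": i, "store_id": i % 10, "amount": i} for i in range(1000)]
stores = [{"store_id": s, "city": f"city-{s}"} for s in range(10)]

def is_store_3(row):
    return row["store_id"] == 3

# Plan A: join everything, then filter.
joined, probes_a = hash_join(sales, stores, "store_id")
plan_a = [row for row in joined if is_store_3(row)]

# Plan B: filter first, then join only the surviving rows.
filtered = [row for row in sales if is_store_3(row)]
plan_b, probes_b = hash_join(filtered, stores, "store_id")

print(plan_a == plan_b, probes_a, probes_b)
```

Both plans return identical rows, but the second probes the join table a tenth as often, which is the kind of saving in repeated processor work the paragraph above describes.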
GPU Parallelism
SQream DB’s main processing power comes from the massively parallel NVIDIA GPU. The execution plan that the compiler
chooses is optimized specifically for the NVIDIA GPU, resulting in high-speed, real-time performance at large scale.
By using original patent-pending concepts, SQream DB’s compiler and compressors are able to reduce the amount of I/O and
repeated operations before the data is even transferred to the GPU, resulting in a substantial speed advantage with
complex queries.
Storage
SQream DB utilizes powerful and robust columnar storage, split up into GPU manageable chunks. While some newer
DBMS solutions are semi-columnar, SQream DB is fully columnar, including both the storage and the query engine.
Vertical partitioning - columnar storage - This feature allows selective access to the required subset of columns,
reducing disk scan and memory I/O time, compared with standard row storage. This seemingly straightforward concept
enables SQream DB to operate very quickly.
Horizontal partitioning - “extent storage” – SQream automatically splits up the storage horizontally into manageable
chunks enabling optimal usage of the hardware resources and relatively small memory availability in GPUs, compared
with CPU RAM.
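The two kinds of partitioning can be sketched together in a few lines. The example below is a minimal illustration under stated assumptions, not SQream DB's storage format: it splits a six-row employee table vertically into columns and then horizontally into fixed-size chunks. The chunk size of 2 and the helper name `to_chunked_columns` are hypothetical; real chunks would hold far more values.

```python
# Illustrative only: a table stored column by column, with each column cut
# into fixed-size chunks. CHUNK_ROWS is tiny here purely for demonstration.

CHUNK_ROWS = 2  # hypothetical chunk size

rows = [
    (1, 1, "2012-01-01"),
    (2, 1, "2014-05-16"),
    (3, 1, "2014-01-22"),
    (4, 2, "2012-06-08"),
    (5, 2, "2013-04-25"),
    (6, 3, "2013-08-01"),
]
column_names = ["emp_no", "dept_id", "hire_date"]

def to_chunked_columns(rows, names, chunk_rows):
    """Vertical split into columns, then horizontal split into chunks."""
    store = {}
    for position, name in enumerate(names):
        values = [row[position] for row in rows]
        store[name] = [
            values[start:start + chunk_rows]
            for start in range(0, len(values), chunk_rows)
        ]
    return store

store = to_chunked_columns(rows, column_names, CHUNK_ROWS)

# A query that needs only dept_id reads one column and ignores the others.
dept_chunks = store["dept_id"]
print(dept_chunks)
```

Because each chunk is a self-contained slice of one column, it can be moved to GPU memory and processed on its own, which is how a small device memory can still work through a very large table.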
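[Figure: the same employee table shown as a logical table, in row storage and in columnar storage]

Emp_no  Dept_id  Hire_date   Emp_in   Dept_in
1       1        2012-01-01  Smith    John
2       1        2014-05-16  Johnson  Barbara
3       1        2014-01-22  Miller   Amanda
4       2        2012-06-08  Taylor   Evelyn
5       2        2013-04-25  Wilson   Bob
6       3        2013-08-01  Brown    Jim

Row storage (one record after another):
1 1 2012-01-01 Smith John | 2 1 2014-05-16 Johnson Barbara | 3 1 2014-01-22 Miller Amanda | ...

Columnar storage (one column after another):
1 2 3 4 5 ... | 1 1 1 2 2 3 | 2012-01-01 2014-05-16 2014-01-22 2012-06-08 2013-04-25 ...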
Smart Metadata
Smart metadata is automatically generated on the fly for each “chunk”, while data is ingested. The smart metadata
enables the immediate pinpointing of the exact required data for each query.
When using leading RDBMS solutions, DBAs need to set up indexing, at least on a few columns. SQream DB’s smart
metadata method means that the DBAs do not need to perform any data modeling or create indexes or primary keys,
as these are automatically dealt with through the smart metadata during the data ingestion. The result is a cutting-edge
smart grid for accessing and querying any desired data on demand, at petabyte scale.
Smart metadata enables ultra-fast, sub-second responses to specific queries, such as SELECT COUNT … or
SELECT DISTINCT …. SQream DB uses the smart metadata extensively, saving significant processing and I/O time by
pinpointing the data “chunks” that are involved in the processing of each query.
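The paper does not say what the smart metadata contains. A minimal sketch, assuming it records the minimum, maximum and row count of each chunk (a common technique sometimes called a zone map), shows how such metadata can answer a count without reading data and can rule out chunks for a range filter. All names and values below are hypothetical.

```python
# Illustrative only: per-chunk metadata used to skip chunks.
# Assumption: the metadata holds min, max and row count for each chunk;
# the paper does not specify what SQream DB actually records.

def build_metadata(chunks):
    """Compute the metadata once, at ingestion time."""
    return [{"min": min(c), "max": max(c), "count": len(c)} for c in chunks]

def chunks_to_scan(metadata, low, high):
    """Indexes of chunks whose [min, max] range overlaps [low, high]."""
    return [
        i for i, m in enumerate(metadata)
        if m["max"] >= low and m["min"] <= high
    ]

# Hypothetical hire-year column, stored as three chunks.
hire_year_chunks = [[2012, 2012, 2013], [2013, 2014, 2014], [2015, 2016, 2016]]
metadata = build_metadata(hire_year_chunks)

# SELECT COUNT(*) FROM t  -> answered from the metadata alone
total_rows = sum(m["count"] for m in metadata)

# SELECT ... WHERE hire_year BETWEEN 2015 AND 2016  -> only one chunk is read
candidates = chunks_to_scan(metadata, 2015, 2016)
print(total_rows, candidates)
```

Since the metadata is a by-product of ingestion, nothing has to be declared or rebuilt by a DBA, which matches the paper's point about avoiding explicit index creation.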
SQream DB offers ultra-fast data ingestion. Processing is done on the GPU, leaving the CPU free to perform heavy I/O.
Thus, the server can ingest up to 2 TB of data per hour, even with a basic configuration consisting of a single GPU
card.
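To put the hourly figure in perspective, a quick calculation converts 2 TB per hour into a sustained per-second rate. It assumes decimal units (1 TB = 10^12 bytes), which the paper does not specify.

```python
# Back-of-the-envelope: sustained rate implied by 2 TB per hour.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TB = 10**12
MB = 10**6

bytes_per_hour = 2 * TB
mb_per_second = bytes_per_hour / 3600 / MB
print(round(mb_per_second, 1))
```

Under that assumption the figure works out to roughly 556 MB per second, sustained for the full hour, so the storage and I/O path must keep pace for the quoted rate to be reached.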
Compression
By using well-established compression algorithms specially tuned for fast operation, SQream DB reduces disk storage
size while still maintaining fast queries. In fact, the compression algorithms are so fast that most hard drives
become the bottleneck of the compress/decompress process.
SQream DB’s compression and decompression are performed on the fly on the GPU, 50 times faster than on a standard
CPU. This is fast enough that SQream DB compresses and decompresses everything, whereas other leading databases
compress only some of the data.
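The paper does not name the compression algorithms SQream DB uses. As a stand-in that shows why column-oriented data tends to compress well, the sketch below applies run-length encoding (RLE) to a single column with repeated values. The function names and the sample column are hypothetical.

```python
# Illustrative only: run-length encoding (RLE) of one column.
# The paper does not name SQream DB's algorithms; RLE is a stand-in.

def rle_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original values."""
    return [value for value, length in runs for _ in range(length)]

dept_id = [1, 1, 1, 2, 2, 3]
encoded = rle_encode(dept_id)
assert rle_decode(encoded) == dept_id  # lossless round trip
print(encoded)
```

Values of the same type and similar range sit next to each other in a column, so simple schemes like this one find long runs that a row layout would interleave with unrelated fields.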
Scaling
Linear scaling in performance – Unlike other DBMSs, where performance decreases as data volume grows beyond a certain
threshold, SQream DB’s technology allows for steady performance regardless of the data scale.
Scaling in storage – Storage may be enlarged easily by adding more drives to the server. SQream DB’s algorithms handle
the rest. Since SQream DB is throughput-intensive, it is well suited to multi-terabyte conventional hard drives and
basic SSDs.
Scaling in GPUs, not CPUs or nodes – Adding compute power is simple. There is no need to replace the entire server,
only to plug in additional NVIDIA GPU cards.
Interfaces and Integration
SQL Support
SQream DB supports pure ANSI SQL. Vendor-specific procedural extensions and the stored procedures written in them,
such as Microsoft T-SQL and Oracle PL/SQL, are not supported.
SQream DB integrates easily into existing systems by supporting the usage of both ODBC and JDBC connectors. This
means existing ETL and analytics tools and developed applications can stay, minimizing the time needed to get up and
running with SQream DB.
SQream DB may be introduced on its own, as a standalone petabyte-scale database, to meet all analytic needs.
However, there is no need to throw away existing solutions. Instead of upgrading current solutions by procuring
additional hardware that scales non-linearly, organizations may plug in SQream DB as a secondary database, creating an
on/offloading system and empowering existing investments.
IT Monitoring
SQream DB runs on standard hardware and integrates easily with any control and monitoring software already in use to
track Linux-based machines.
Logging
SQream DB contains a built-in logger that tracks critical server information, enabling IT and security teams to gain
insights from the server’s operations - from failed login attempts, to CPU time spent per query, through read-write
cycles and memory utilization.
Security
SQream DB offers username/password authentication for levels ranging from the cluster (multi-database), all the way
down to per-table authentication.
Backup and Restore Operations
SQream DB offers backup and restore operations either via SQL statements or directly from the file system. The latter
means that SQream DB can be backed up and restored, using any external storage system (Data Replication Manager).
High Availability Configuration
Multiple SQream DB servers may be connected to a single external storage system, while at any point in time, only one
server is “active” and the others are “passive”.
When the active server fails, a passive server mounts the shared storage and continues to respond to queries,
without any data loss. [Active/Active operation and automatic failover are planned for the next release.]
Alternatively, SQream DB can run in a stand-alone “cluster” topology, in which two servers, each with the same
internal direct-attached storage, are both active. The first server ingests new data, serves queries and continuously
updates the second. If the first server fails, the second seamlessly takes control, with no loss of time or data.
[Figure: an active server and a passive server attached to shared storage]
SQream vs. Other Big Data Solutions
Organizations may be considering a trendy new cluster or NoSQL solution. These are excellent for specific implementations,
but they require experienced DBAs and new application development skills. Compared with the painless and hassle-free
integration of SQream DB, the benefits of the latter are obvious.
Summary
SQream DB delivers big data analytics up to 100 times faster than other key market players, while using a significantly
smaller hardware footprint. SQream DB is the only solution truly capable of dealing with big data of escalating
magnitude (petabyte scale and hundreds of billions of rows), and it does so with relative ease and at extraordinary
value.
SQream DB opens up new opportunities for organizations to do much more with their data, in relevance to their unique
business use cases. Petabyte scale data insights with hundreds of billions of entries are now within reach.
Organizations may integrate SQream DB as a standalone database solution or as a complementary analytics database,
maximizing existing core IT investments.
The SQream DB hardware architecture enables significant cost savings through the use of GPUs and their massively
parallel abilities instead of clustered servers and nodes, thus saving on hardware, infrastructure, utilities and
maintenance costs.
Integrating SQream DB is straightforward: it requires no massive rewrites of SQL queries and no new skills, and the
database plugs in easily to the existing ecosystem, with little to no transition time and no investment in training.
All of the above translates into substantial gains for the organization by enabling it to run up to two orders of
magnitude more queries, unlocking the critical business intelligence and information hidden in its collected big data.
SQream DB gives organizations a leading advantage while significantly reducing their hardware and operating costs.
For more information about SQream DB, visit www.sqream.com or call +972.3.544.4871.
Copyright © 2010. All rights reserved.
This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be
error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either
directly or indirectly by this document. This document may not be reproduced in any form, for any purpose, without our prior written permission.