Right-Sizing Your Big Data Infrastructure
Tom Lyon
Founder & Chief Scientist
For Strata + Hadoop World, Mar. 15, 2017
Cluster Out of Balance?
• Too little CPU, too much disk?
• Too little disk, too much CPU?
• How can you evolve the cluster balance as workloads change?
Too Many Silos? Too Many SKUs?
• Each type of cluster “wants” a different amount of disk per server
    • Hadoop Data Lake
    • Dev/Test
    • HBase
    • Kafka
    • Cassandra
    • …
• Fixed silos per cluster type lead to madness
    • No resource sharing
    • No elasticity
    • Too many server types / SKUs
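Hadoop Storage Needs vs Supposed Solutions

| Solution                 | Locality | Converged compute & storage | Replication | Extreme Read BW | Erasure Coding | Examples                                 |
|--------------------------|----------|-----------------------------|-------------|-----------------|----------------|------------------------------------------|
| Hadoop HDFS              | ✔        | ✔                           | ✔           | ✔               | ✖              |                                          |
| NAS - Enterprise         | ✖        | ✖                           | ✔           | ✖               | ✔              | Isilon, Qumulo, Gluster                  |
| NAS - HPC                | ✖        | ✖                           | ✖           | ✔               | ✖              | Lustre, GPFS                             |
| SAN/Block - External     | ✖        | ✖                           | ✔           | ✖               | ☐              | ScaleIO, Ceph, Datera, Cinder, AWS EBS   |
| SAN/Block Hyperconverged | ✖        | ✔                           | ✔           | ✖               | ☐              | Nutanix, ScaleIO, Robin                  |
| Object                   | ✖        | ✖                           | ✔           | ✖               | ✔              | AWS S3, Scality, Swift, EMC ECS          |

DriveScale Confidential Information © 2016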
DriveScale is a rack scale architecture, providing composable infrastructure on pooled commodity resources
Diagram labels: Typical Rack Server; Rack Configuration; Rack Scale Architecture
•  Compute pool: Processor + Memory Servers
•  1U DriveScale Adapter (DA): Ethernet to SAS
•  Storage pool: Disks in JBODs, connected via SAS to DAs
•  DriveScale composes Logical Nodes (software-defined physical nodes)
•  Example: a Logical Node might consist of a dual-processor server and 12 drives across 2 JBODs
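The Logical Node example above can be illustrated with a small sketch. This is not DriveScale's software or API; the names `Jbod`, `LogicalNode`, and `compose_node`, the server and JBOD names, and the even-spread placement policy are all assumptions made for illustration. It only shows the idea of binding one server from a compute pool to drives drawn from several JBODs in a storage pool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Jbod:
    """Hypothetical model of one JBOD in the storage pool."""
    name: str
    free_drives: int


@dataclass
class LogicalNode:
    """Hypothetical model: one server plus drives mapped from JBODs."""
    server: str
    drives: Dict[str, int] = field(default_factory=dict)  # JBOD name -> drive count


def compose_node(servers: List[str], jbods: List[Jbod],
                 drives_needed: int, spread: int) -> LogicalNode:
    """Bind one free server to `drives_needed` drives spread over `spread` JBODs.

    Assumed policy: pick the JBODs with the most free drives and split the
    request as evenly as possible between them.
    """
    if not servers:
        raise ValueError("compute pool is empty")
    if spread < 1 or len(jbods) < spread:
        raise ValueError("not enough JBODs to spread across")

    chosen = sorted(jbods, key=lambda j: j.free_drives, reverse=True)[:spread]
    base, extra = divmod(drives_needed, spread)
    wanted = [base + (1 if i < extra else 0) for i in range(spread)]

    # Check capacity before changing any state, so a failure leaves pools intact.
    if any(j.free_drives < w for j, w in zip(chosen, wanted)):
        raise ValueError("not enough free drives in the chosen JBODs")

    node = LogicalNode(server=servers.pop(0))
    for jbod, count in zip(chosen, wanted):
        jbod.free_drives -= count
        node.drives[jbod.name] = count
    return node
```

Called with one server, two JBODs, `drives_needed=12`, and `spread=2`, the sketch yields a node with six drives from each JBOD, mirroring the "12 drives across 2 JBODs" example.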
DriveScale spans the data center and makes resources fungible
The boundaries between clusters are “movable” in software
Diagram: DriveScale Adapters and three clusters:
• Cluster 1: Balanced
• Cluster 2: Data Lake
• Cluster 3: Compute Heavy
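To make "movable" boundaries concrete, here is a minimal sketch of shifting drives between two clusters and watching the disk-per-server ratio change. The `Cluster` class, the `move_drives` function, and every number used with them are hypothetical and are not taken from DriveScale's product.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster described only by its server and drive counts."""
    name: str
    servers: int
    drives: int

    @property
    def drives_per_server(self) -> float:
        return self.drives / self.servers


def move_drives(src: Cluster, dst: Cluster, count: int) -> None:
    """Reassign `count` drives from one cluster to another in software."""
    if count < 0 or count > src.drives:
        raise ValueError("cannot move that many drives")
    src.drives -= count
    dst.drives += count
```

For instance, starting from two clusters of 10 servers and 120 drives each, moving 80 drives leaves one cluster at 4 drives per server (compute heavy) and the other at 20 (data lake), with no change to the servers themselves.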
DriveScale’s Core Value Propositions
Flexible and Responsive Physical Infrastructure
•  Get the infrastructure that’s needed when it’s needed
•  Repurpose resources on demand
Simplicity for Any Scale
•  No changes in the app stack required
•  Equivalent performance to direct attached drives
•  No loss in “data locality” information
Enterprise Class Solution
•  Highly available, secure, reliable
•  Use industry standard servers and storage of your choice