White Paper - Top Considerations for Enterprise SSDs (A Primer)

The Essential Guide to
Enterprise SSDs
Finding the Right Fit for Your IT Infrastructure
Interface Options
SSD Performance Scaling
Form Factors
Endurance Considerations
NAND Considerations
Error Handling and Data Protection
Power Considerations
Measuring Performance
Monitoring and Management
Conclusion
Evaluating solid-state drives (SSDs) for use in
enterprise applications can be tricky business.
In addition to selecting the right interface
type, endurance level and capacity, decision
makers must also look beyond product
data sheets to determine the right SSD to
accelerate their applications. Often the
specifications in SSD vendor collateral are based on a variety of benchmark tests or other criteria that may not represent one's unique environment. This paper examines basic SSD differences and highlights key criteria to consider when choosing the right device for a given application.
Interface Options
The industry typically classifies enterprise-class SSDs by
interface type. The main interfaces to evaluate are SATA,
SAS and PCIe. From here, it is easy to qualify the devices
based on factors such as price, performance, capacity,
endurance and form factor. SATA is usually the least
expensive of the device types, but also brings up the rear
in terms of performance due to the speed limitations
of the 6Gb/s SATA bus. SAS SSDs are mostly deployed
inside SAN or NAS storage arrays. With dual-port
interfaces, they can be configured with multiple paths to
multiple array controllers for high availability. SAS drives
deliver nearly double the performance of SATA devices,
thanks to the 12Gb/s SAS interface. SATA and SAS SSDs
are the most widely deployed interface types today, with
capacities that can reach more than 4TB.
At the high-end of the performance spectrum are PCIe
SSDs. These devices connect directly into the PCIe
bus and are able to deliver much higher speeds. By
implementing specialized controllers that closely resemble
memory architectures, PCIe products eliminate traditional
storage protocol overhead, thereby reducing latencies
and access times when compared to SATA or SAS. Given
the importance of latency in enterprise applications, PCIe
is often preferred as it reduces IO wait times, improves
CPU utilization and enables more users or threads per SSD.
Figure 1. Software Stack Comparison of SATA or SAS SSD
vs. PCIe SSD
SATA and SAS SSDs typically have higher latency than
PCIe SSDs. This is primarily due to the software stack that
must be traversed to read and write data. The diagram in
Figure 1 above illustrates the layers of this stack.
More on PCIe
A PCIe connection consists of one or more serial data-transmission lanes. Each lane consists of two pairs of wires, one for receiving and one for transmitting. The PCIe specification supports one, four, eight or sixteen lanes in a single PCIe slot, typically denoted as x1, x4, x8 or x16. Each lane is an independent connection between the PCIe controller and the SSD, and bandwidth scales linearly, so an x8 connection has twice the bandwidth of an x4 connection. PCIe transfer rates depend on the generation of the base specification. Gen-2 (Base 2.0) provides an effective 4Gb/s per lane, so a Gen-2 x16 link can deliver an aggregate bandwidth of 64Gb/s. Gen-3 provides 8Gb/s per lane, so a Gen-3 x8 link likewise delivers 64Gb/s.
When evaluating PCIe devices, it is important to look
for references to generation and number of lanes
(Gen-2 x4 or Gen-3 x8 and so on). While PCIe devices
tend to be the most expensive per GB, they can deliver
more than 10 times the performance of SATA. Table 1
on the following page shows a quick summary of the
distinctions between the three SSD interfaces.
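The lane-and-generation arithmetic above can be sketched as a small calculation (a minimal illustration; the per-lane rates are the effective figures quoted above):

```python
# Effective per-lane bandwidth in Gb/s for each PCIe base-specification
# generation, as quoted above (encoding overhead already accounted for).
PER_LANE_GBPS = {2: 4, 3: 8}

def pcie_bandwidth_gbps(generation: int, lanes: int) -> int:
    """Aggregate link bandwidth: bandwidth scales linearly with lane count."""
    return PER_LANE_GBPS[generation] * lanes

# A Gen-2 x16 link and a Gen-3 x8 link both aggregate to 64Gb/s.
print(pcie_bandwidth_gbps(2, 16))  # 64
print(pcie_bandwidth_gbps(3, 8))   # 64
```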
Table 1. Typical Specifications for SATA, SAS and PCIe SSDs

    Interface               SATA 6Gb/s    SAS 12Gb/s    PCIe Gen-3
    Random 4KB Read IOPS    75k           130k          740k
    Sequential Throughput   500MB/s       1.1GB/s       3GB/s

Why NVMe Matters for PCIe
Another recent PCIe development is a standards-based device driver called NVM Express™ or NVMe. Most operating systems ship the standard NVMe driver today, which eliminates the hassle of deploying proprietary drivers from multiple vendors. Unlike the SATA interface, NVMe is designed to work with pipeline-rich, random-access, memory-based storage. As such, it requires only a single message for 4KB transfers (compared to two for SATA) and can process multiple queues instead of only one. In fact, NVMe allows for up to 65,536 simultaneous queues. Table 2 below compares SATA (which follows the Advanced Host Controller Interface, or AHCI, standard) to NVMe-compliant PCIe in more detail.

Table 2. Features Comparison of SATA/AHCI vs. PCIe/NVMe

    Feature                            SATA/AHCI                         PCIe/NVMe
    Maximum Queue Depth                One command queue;                65,536 queues;
                                       32 commands per queue             65,536 commands per queue
    Un-cacheable Register Accesses     Six per non-queued command;       Two per command
    (2000 cycles each)                 nine per queued command
    Message Signaled Interrupts        A single interrupt;               2048 MSI-X interrupts
    (MSI-X) and Interrupt Steering     no steering
    Parallelism and Multiple Threads   Requires a synchronization lock   No locking
                                       to issue a command
    Efficiency for 4KB Commands        Command parameters require two    Gets command parameters in a
                                       serialized host DRAM fetches      single 64-byte fetch

The NVM Express Work Group has not stopped at just a driver. Coming soon are standards for monitoring and managing multi-vendor PCIe installations under a single pane of glass, as well as common methods for creating low-latency networks between NVMe devices for clustering and high availability (NVMe-over-Fabrics).
SSD Performance Scaling
SSDs deployed inside storage arrays as “All-Flash” or “Hybrid”
(where storage controllers use tiering or caching between HDDs
and SSDs to aggregate devices together and manage data protection)
provide large capacity shared storage that can take advantage of
SSD performance characteristics. These architectures are ideal for many enterprise use cases, but not for scale-out databases such as MySQL and NoSQL stores. For the latter, each server node has its own SSDs or HDDs, and the databases scale and handle data protection with techniques like sharding that stripe data across many individual nodes.
For MySQL and NoSQL environments, achieving optimal SSD
performance is usually done with PCIe devices due to their low latency
and high speeds. Depending on the workload requirements, there
are scenarios where striping data across SATA or SAS SSDs inside a
single server using RAID-0 can add capacity to the node. However,
striping numerous SATA or SAS drives does not necessarily guarantee
similar performance to PCIe. As workloads or thread counts increase,
SATA and SAS latencies are magnified and software overhead tends
to throttle the aggregate performance of the devices. As a result, a
single PCIe SSD can often be less expensive than multiple “cheaper”
SATA or SAS devices aggregated together. This is why many vendors
have started to speak about performance SSDs as a cost per IOP,
rather than the traditional cost per GB that is more familiar from
the traditional storage world. Illustrated in Figure 2 below is an example of NVMe-compliant PCIe devices compared to one, two and four SATA devices in RAID-0, using a tool called SSD Bench.
Figure 2. Performance Comparison NVMe PCIe vs. Multiple SATA (panels: sustained multi-threaded random 4KB mixed 70R/30W using 100% capacity; sustained random 4KB mixed 70R/30W, read, and write performance by number of threads using 100% capacity)
This data illustrates that the overhead of RAID on SATA offsets the
potential for linear scalability that can be gained from a single PCIe SSD.
Form Factors
Both SATA and SAS devices come in 2.5" disk form factors. Until recently, PCIe devices were only available in the Half-Height, Half-Length (HH-HL) card form factor, meaning that the buyer would have to open up the server to install the SSD. This has changed
in recent months. Almost all server vendors now offer machines
where PCIe Flash can be accessed in the front of the server just
like a traditional hard drive. Adoption of this server and storage
combination is growing rapidly as it allows simple maintenance
(like hot-swap) and gives customers the choice of easily adding or
changing SSDs as needed.
Endurance Considerations
SSD endurance is usually described in terms of Drive Writes per Day (DW/D): the amount of data that can be written to the device every day over a specified time period (typically three or five years). For many vendors, this time period is the same
as the SSD’s warranty period. But this is not always the case, so
understanding the definition of DW/D is important. For example,
if a 1TB SSD is specified for 1DW/D, it should handle 1TB of data
written to it every day for the warranty period.
A few years ago, endurance was the top criterion for purchasing an SSD. What the industry has found over
time is that SSD technology has improved and generally
use-cases tend to be more read intensive. As such, there
is now a broad mix of SSD endurance, capacities and
DW/D annotations for High Endurance (HE), Medium
Endurance (ME), Read Intensive (RI) and Very Read
Intensive (VRI) along with associated DW/D warranties.
It is important to pay close attention to how DW/D is presented.
Some vendors show DW/D in a best case scenario using Total
Flash Writes. This is very different from measurements that use
Application Writes. The latter takes into consideration worst-case,
small block (4K) random I/O patterns with all device activities
including writes, reads, wear leveling and garbage collection. It is
common to hear about “Write Amplification” which is a reference
to the realistic view of what happens over time when writing to an
SSD. Other considerations like random or sequential writes will
have an impact on endurance. The above reference is for random
writes, which will yield lower endurance than sequential writes.
There are several good ways to choose the right DW/D
for a specific environment’s needs. Options include
vendor-supplied profiling tools or historical storage
information with Self-Monitoring, Analysis and Reporting
Technology (S.M.A.R.T.).
Another metric that is used for SSD write endurance is Terabytes
Written (TBW), which describes how much data can be written to
the SSD over the life of the drive. Again, the higher the TBW value,
the better the endurance of the SSD.
Depending on the supplier, endurance may be reported as either
DW/D or TBW. To convert between the two metrics, the drive capacity and the supplier measurement (warranty) period must be known. The two are related by the following formula, with capacity in GB:

TBW = DW/D x warranty years x 365 x capacity / 1024

Note: the division by 1024 simply converts gigabytes to terabytes.
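As a sketch, the conversion can be run in both directions (a minimal illustration of the formula above, with capacity in GB and the warranty in years):

```python
def dwd_to_tbw(dwd: float, warranty_years: float, capacity_gb: float) -> float:
    """Terabytes Written over the warranty period; /1024 converts GB to TB."""
    return dwd * warranty_years * 365 * capacity_gb / 1024

def tbw_to_dwd(tbw: float, warranty_years: float, capacity_gb: float) -> float:
    """Drive Writes per Day implied by a TBW rating."""
    return tbw * 1024 / (warranty_years * 365 * capacity_gb)

# The example from the text: a 1TB (1024GB) SSD rated 1 DW/D, here
# assuming a 5-year warranty period.
print(dwd_to_tbw(1, 5, 1024))  # 1825.0
```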
NAND Considerations
Endurance, footprint, cost and performance are all directly
impacted by the underlying NAND technology used by
the SSD maker. Early on, Single-Level Cell (SLC) NAND
Flash, which uses a single cell to store one bit of data,
was the primary choice as it provided high endurance
for write intensive applications. The downside, however,
was that SLC was extremely expensive. To allow the
cost of SSDs to reach the mainstream, the industry
moved to Multi-Level Cell (MLC) architectures. While less
expensive, MLC also has lower endurance. Pioneering
SSD vendors addressed MLC endurance challenges with
specialized controller architectures for error handling
and data protection, yielding Unrecoverable Bit Error
Rates (UBER) of 1 error in 100,000 trillion bits read over
the full write endurance of the device.
With broad adoption of MLC NAND today, the industry
continues to seek new ways to reduce cost and expand
the use cases for SSDs. To address both capacity and
cost, a new technology is emerging called 3D NAND,
where the NAND cells are arranged vertically in the NAND
die to gain more density in the same footprint.
NAND manufacturers have chosen different paths for
the construction of NAND cells. Some fabrications use
traditional floating gate MOSFET technology with doped
polycrystalline silicon. Others use Charge Trap Flash
(CTF) where silicon nitride film is used to store electrons.
Floating gate is more mature based on its long history, but
CTF may have advantages in certain areas. Enterprises
should look to vendors with a strong track record of
delivering high-quality and high-reliability to successfully
manage the 3D NAND transition.
Error Handling and Data Protection
Every vendor addresses NAND management in a slightly
different way with unique software and firmware in the
controller. The primary objective is to improve SSD
endurance through Flash management algorithms.
Proactive cell management provides improved reliability
and reduced bit error rates. State-of-the-art controllers
also employ advanced signal processing techniques to
dynamically manage how NAND wears. This eliminates the
need for read-retries by accessing error-free data, even at
vendor-specified endurance limits. In addition, techniques
such as predictive read-optimization ensure there is no
loss of performance during the useful life of the drive.
Some technologies also incorporate controller-based
media access management, which dynamically adjusts
over the lifetime of the media to reduce the Unrecoverable
Bit Error Rate (UBER). Advanced Error Correction Code (ECC)
techniques enable a higher degree of protection against media
errors, leading to improved endurance while maintaining or
delivering higher performance.
From a data protection standpoint, certain SSDs can prevent
data loss associated with Flash media. These products provide
the ability to recover from NAND Flash page, block, die and chip
failures by creating multiple instances of data striped across
multiple NAND Flash dies.
Fundamentally, each NAND Flash die consists of multiple pages
which are further arranged in multiple blocks. Data stored by
the controller is managed at the NAND block level. Software in
the controller is used to arrange data in stripes. When the host
writes data to the SSD, redundancy information is generated by
the controller over a stripe of data. The controller then writes the
host data and the redundant data to the Flash stripes. Data in the stripe is spread across the NAND Flash blocks over multiple Flash channels, so that no two blocks of data within a stripe reside in the same NAND block or die. The result is RAID-like protection of NAND that yields very high reliability.
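The RAID-like scheme described above can be sketched with simple XOR parity (an illustrative model only; real controllers use vendor-specific stripe geometries and stronger codes):

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Append an XOR parity block so any single lost block is recoverable."""
    return data_blocks + [xor_blocks(data_blocks)]

def recover(stripe: list[bytes], lost_index: int) -> bytes:
    """Rebuild the block at lost_index by XOR-ing the surviving blocks."""
    return xor_blocks([b for i, b in enumerate(stripe) if i != lost_index])

stripe = make_stripe([b"page-one", b"page-two", b"pageABCD"])
assert recover(stripe, 1) == b"page-two"  # a failed die's block is rebuilt
```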
Power Considerations
PCIe SSDs use more power than their SATA and SAS counterparts.
High-end NVMe-compliant PCIe SSDs generally specify maximum
power ratings for Gen-3 x4 around 25 watts. While there are
“low-power” PCIe SSDs, they typically have lower performance
characteristics than the high-end devices.
A few products on the market also offer field programmable power
options that allow users to set power thresholds. As a lower power
threshold can throttle performance, users should check with the
manufacturer for proper power/performance tuning.
Measuring Performance
SSD performance typically is measured by three distinct metrics
– Input/Output Operations per Second (IOPS), throughput and
latency. A fourth metric that is often overlooked but important to
note is Quality of Service (QoS). Each is described below:
IOPS is the number of input/output operations that can be completed in a given amount of time. Depending on the type of benchmarking tool, this measure may also be shown as Transactions per Minute (TPM).
Throughput is the amount of data that can be transferred to
or from the SSD. Throughput is measured in MB/s or GB/s.
Latency is the amount of time it takes for a
command generated by the host to go to the
SSD and return (round trip time for an IO request).
Response time is measured in milliseconds or
microseconds depending on the type of SSD.
QoS measures the consistency of performance
over a specific time interval with a fixed
confidence level or threshold. QoS measurements
can include both Macro (consistency of average
IOPS latency) and Micro (measured command
completion time latencies at various queue depths).
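As a sketch, the first three metrics can be derived from a simple operation log (the numbers here are hypothetical, for illustration only):

```python
# Hypothetical log: (bytes transferred, per-op latency in seconds)
# for a run that took 1 second of wall-clock time.
ops = [(4096, 0.0001)] * 5000  # 5,000 4KB operations, 100us each
elapsed = 1.0                  # assumed wall-clock duration of the run

iops = len(ops) / elapsed
throughput_mb_s = sum(size for size, _ in ops) / elapsed / 1e6
avg_latency_us = sum(lat for _, lat in ops) / len(ops) * 1e6

print(iops, throughput_mb_s, round(avg_latency_us, 1))  # 5000.0 20.48 100.0
```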
Performance measurements must be tied to the
workload or use case for the SSD. In some cases, block
sizes are small, in others they are large. Workloads also
differ by access patterns like random or sequential and
read/write mix. A read or write operation is sequential
when its starting storage location, or Logical Block
Address (LBA), follows directly after the previous
operation. Each new IO begins where the last one ended.
Random operations are just the opposite, where the
LBA is not contiguous to the ending LBA of the previous
operation. SSD controllers maintain a mapping table to
align LBAs to Flash Physical Block Addresses (PBA). The
algorithms employed by different vendors vary and have
a big impact on both performance and endurance.
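The sequential-versus-random distinction above can be sketched as a check on consecutive Logical Block Addresses (a simplified model; the LBA values and block counts are illustrative):

```python
def classify_ops(ops: list[tuple[int, int]]) -> list[str]:
    """Label each (start_lba, block_count) operation: sequential if it
    begins exactly where the previous operation ended, otherwise random."""
    labels, next_lba = [], None
    for start, count in ops:
        labels.append("sequential" if start == next_lba else "random")
        next_lba = start + count
    return labels

# The second op starts at LBA 8, right where the first ended; the third jumps.
print(classify_ops([(0, 8), (8, 8), (100, 8)]))  # ['random', 'sequential', 'random']
```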
The mix of read and write operations also impacts SSD performance. SSDs are very good at reads, since there are few steps the controller must take. Writes, on the other hand, are slower. This is because a single
NAND memory location cannot be overwritten in a single
IO operation (unlike HDDs that can overwrite a single
LBA). The number of write steps depends on how full
the device is and whether the controller must first erase
the target cell (or potentially relocate data with a read/
modify/write operation). Overall, SSDs can deliver very
high IOPS in small random read access patterns and high
throughput with large block sequential patterns.
In the real world, however, most workloads are a mixed read/write blend.
It is fairly rare to see published specs on mixed read/
write performance. It is even more uncommon to see
mixed workload measures as thread count or user count
increases. But this type of measurement is critical as it
illustrates the robustness of controller design and how
much work a given SSD can perform, allowing users
to accurately plan the number of SSDs (and potentially
servers) needed. Highlighted in Figure 3 below is an example of IOPS for different vendors' NVMe devices in a 70/30 read/write mix as workloads increase.
As Figure 3 shows, most SSDs perform well under light workloads, but as the load increases, only a few can scale in a linear fashion. In fact, the difference between NVMe Comp-A and NVMe Comp-C is 2x.
User experience, CPU utilization, number of CPU cores
and ultimately the number of software licenses and
servers required for an application are all driven by a
combination of IOPS, throughput and latency. Of these
three metrics, latency has the greatest impact. As the unit of measure for response time, latency viewed over increasing workload demand should be a driving factor in selecting an SSD. Most vendors will publish an
average latency metric on their data sheets. However, just
like the 100% 4K random read metric, this number must
be put into a workload context. Figure 4 on the next page
illustrates how several vendors stack up on a more realistic
scenario measuring throughput of a 4K random 70/30
mixed read/ write workload.
Figure 3. Sustained 4KB Random Mixed 70/30 by Number
of Threads for NVMe-compliant PCIe SSDs (100% Capacity
with Full Preconditioning)
IOPS and Throughput
Nearly all SSD data sheets will specify IOPS performance
using 100% read or 100% write at 4K block sizes, as well
as throughput specifications on 100% sequential reads
and writes with a 128KB block size. These numbers make
SSDs look extremely fast, and vendors tune their controllers
to optimize the results. However, an application that is 100% read is quite unusual; a more relevant metric reflects the mixed read/write workloads seen in practice.
Quality of Service
In mission critical environments, the consistency of SSD
performance is paramount. But controller tasks like garbage
collection or wear-leveling operating concurrently with IO traffic
can severely impact data delivery. A few vendors are starting to
publish QoS measurements. Both Macro QoS (consistency of IOPS
and latency at a specific queue depth and time interval) as well
as Micro QoS (cumulative summary of command completions
in a specific time interval based on a given workload) should be
reviewed. Examples of each type of report are shown in Figure 5 below.
Figure 4. Average Latency Comparisons Between NVMe SSD
Vendors (1.6TB PCIe, 4KB Random 70/30 at Given Level of IOPS)
Micro QoS Example Report
Cumulative CCT: 4KiB Random 70R/30W, QD = 32
Percentile Summary: 99% = 2.8ms; 99.99% = 7.5ms; 100% = 70.46ms
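A percentile summary like the one above can be computed from raw command-completion times, sketched here with hypothetical latency samples:

```python
def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: the value below which pct% of samples fall."""
    ordered = sorted(samples_ms)
    rank = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[rank]

# Hypothetical completion times: mostly fast, with a tail of slow outliers.
samples = [0.1] * 9_899 + [2.8] * 100 + [70.46]
print(percentile(samples, 99.0))   # 2.8
print(percentile(samples, 100.0))  # 70.46
```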
Monitoring and Management
Deploying SSDs is relatively easy. But as more SSDs are installed,
having tools that can monitor health, performance and utilization
from a centralized platform will save time and reduce stress.
Monitoring tools that will provide the most value will have Active
Directory/LDAP integration, automated email alerts and endurance
reporting, as well as the ability to perform device-specific and/or
enterprise-wide functions like format, sanitize, resize and firmware updates.
Conclusion
According to IDC, 11.5 million Enterprise SSDs were shipped in
calendar year 2015 which represents 7.1 Exabytes of capacity
with a 32% unit volume increase compared to 2014. Clearly the
technology has moved from niche to mainstream as NAND
fabrication and the products themselves have matured. While there
is no expected date when SSD cost/GB pricing will match HDDs,
the cost/IOP metric is rapidly being embraced for performance
and latency-sensitive applications.
There are many different enterprise SSD options that span the
gamut of price, performance, endurance and form factor. The
contents of this paper should serve as a useful primer to help guide
users in making the best decisions for SSD deployment. During the process, it is also valuable to consult additional materials, such as benchmarking resources, application-specific implementation guides and peer case studies, to stay abreast of the latest developments in SSD technology.
Figure 5. Example Reports of Macro QoS and Micro QoS
Macro QoS Example Report
1 Second Time Series: 4KiB Random 70R/30W, QD = 32
CCT Histogram: 4KiB Random 70R/30W, QD = 32
IOPS StDev / Avg IOPS = 1.9%
About Silicon Mechanics
Silicon Mechanics, Inc. is an open technology solutions integrator of Server, High-Performance Computing, Software-Defined Storage, Cloud and Virtualization solutions. For over 15 years, the Silicon Mechanics "Expert Included" advantage has enabled organizations to deploy purpose-built, high-value compute and storage solutions across multiple industries.
Offering customers deep technical experience, along with our defined methodology, we partner with our customers
to architect, build, deploy and support flexible solutions from a wide network of technology partners. Founded in
2001 and recognized as one of the fastest growing companies in the Seattle metropolitan technology corridor,
Silicon Mechanics is empowering innovative organizations as they transform the world through open technology.
To learn more visit www.siliconmechanics.com.
Customize Your Server with HGST Enterprise SSDs at siliconmechanics.com/hgst-ssd
© 2016 HGST, Inc., 3403 Yerba Buena Road, San Jose, CA 95135 USA. Produced in the United States 5/16. All rights reserved.
References in this publication to HGST’s products, programs or services do not imply that HGST intends to make these available in all countries in which it operates. Information is true as of the date of publication and is subject to change and does not
constitute a warranty. Individual performance may vary. Users are responsible for evaluating their own requirements.