Empirical Evaluation of NAND Flash Memory Performance.

Peter Desnoyers
Northeastern University
360 Huntington Ave.
Boston, MA 02115
pjd@ccs.neu.edu
ABSTRACT
Reports of NAND flash device testing in the literature have for the most part been limited to examination of circuit-level parameters on raw flash devices or prototypes, and system-level parameters on entire storage subsystems. However, there has been little examination of system-level parameters of raw devices, such as mean latency and endurance values. We report the results of such tests on a variety of devices. Read, program, and erase latency were found to align closely with manufacturers’ specified “typical” values in almost all cases. Program/erase endurance, however, was found to exceed specified minimum values, often by as much as two orders of magnitude. In addition, significant performance changes were found to occur with wear; these changes provide mechanisms that may be used to track this wear, and they have significant implications for system performance over the lifespan of a device. Finally, random write patterns which incur performance penalties on current flash-based memory systems were found to incur no overhead on the devices themselves.
1. INTRODUCTION
Fixed magnetic disk has been the predominant medium for
secondary storage for over three decades. In the last five
years, however, solid state storage in the form of NAND
flash memory has come into increasing use, becoming the
first competitor to magnetic disk storage to gain significant
commercial acceptance.
With the increasing use of flash-based secondary storage, a detailed understanding of the behavior that affects operating system design and performance becomes important. However, while disk behavior has been extensively studied, there appear to be few sources for the information needed to predict the performance and reliability of flash-based storage systems. Detailed studies of low-level electrical characteristics are available [5, 6, 9], as are performance studies of complete storage assemblies (e.g. SSDs) containing flash devices and controllers [1, 8]. However, to the best of our knowledge, there is no experimental study to date of actual flash devices giving measured values for read, write, and erase speed, power consumption, or write/erase longevity.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
HotStorage ’09, Big Sky, Montana
This paper reports measurements of these parameters for
a range of raw flash devices. We focus on devices themselves,
rather than flash-based systems such as USB drives or SSDs,
in order to understand the capabilities and limitations of
the underlying technology rather than that of any particular
implementation.
Of the results found, the most unexpected were these:
• High write/erase endurance. Although NAND flash memory degrades with repeated write/erase cycles, measured
lifetime varies greatly, and is often as much as two orders
of magnitude higher than manufacturer specifications.
• Wear-dependent performance changes. On all devices
tested, repeated write/erase cycling of a single block decreases write time and increases erase time—by as much
as a factor of three or more—as that block wears out,
changing overall system performance as well as providing
a predictor of individual block failure.
• Random write speed. Although flash-based storage systems such as SSDs may have poor random write performance [1], the chips themselves perform as well on random writes as on sequential ones.
In the remainder of this paper we first present an overview
of flash memory technology from a system perspective in
Section 2, followed by experimental results (Section 3) and
conclusions (Section 4).
2. BACKGROUND
NAND flash is a form of electrically erasable programmable read-only memory based on a particularly space-efficient basic cell, optimized for mass storage applications. Unlike most memory technologies, NAND flash is organized in pages, typically of 2K or 4K bytes, which are read and written as a unit. Unlike block-oriented disk drives, however, pages must be erased in units of erase blocks comprising multiple pages (typically 32 to 128) before being re-written.
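The erase-before-write constraint can be made concrete with a small in-memory model. This is a hypothetical sketch, not a device driver: it assumes, as is conventional for NAND, that erased cells read as 1 (0xFF bytes) and that programming can only clear bits, so any write needing a 0-to-1 transition must wait for a whole-block erase. `EraseBlock` and the geometry constants are illustrative names chosen here.

```python
# Hypothetical model of NAND erase-before-write semantics (illustrative only).
# Assumption: erased bits read as 1 (0xFF), programming only clears bits (1 -> 0),
# and only a whole-block erase can set bits back to 1.
PAGE_SIZE = 2048        # data area of a typical page (2K or 4K in the text)
PAGES_PER_BLOCK = 64    # illustrative; the text cites 32 to 128 pages per block


class EraseBlock:
    def __init__(self, pages: int = PAGES_PER_BLOCK, page_size: int = PAGE_SIZE):
        self.page_size = page_size
        self.pages = [bytearray(b"\xff" * page_size) for _ in range(pages)]
        self.erase_count = 0

    def erase(self) -> None:
        # Erasure operates on the whole block and returns every bit to 1.
        for page in self.pages:
            page[:] = b"\xff" * self.page_size
        self.erase_count += 1

    def program(self, index: int, data: bytes) -> None:
        # Pages are written as a unit, and a write may only clear bits.
        if len(data) != self.page_size:
            raise ValueError("pages are read and written as a unit")
        page = self.pages[index]
        if any(new & ~old for old, new in zip(page, data)):
            raise ValueError("page must be erased before it can be re-written")
        page[:] = data

    def read(self, index: int) -> bytes:
        return bytes(self.pages[index])


blk = EraseBlock(pages=4, page_size=8)
blk.program(0, bytes(8))           # all zeroes: allowed on an erased page
try:
    blk.program(0, b"\xff" * 8)    # needs 0 -> 1 transitions: rejected
except ValueError as err:
    print("rejected:", err)
blk.erase()                        # whole-block erase makes the page writable again
```

The check `new & ~old` is nonzero exactly when the new data would need some bit to go from 0 back to 1, which only an erase can do.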
2.1 Technical Overview
To inform our discussion we present an overview of the
circuit and electrical aspects of flash technology which are
relevant to system software performance; a deeper discussion
of these and other issues may be found in the survey by
Sanvido et al. [10].
[Figure 1 diagram: (a) NOR flash and (b) NAND flash cell arrays, showing bit line (in), bit line (out), and word lines 0 to 3]
Figure 1: Flash circuit structure. NAND flash is distinguished by the series connection of cells along the bit line, while NOR flash (and other memory technologies) arrange cells in parallel between two bit lines.
The basic cell in a NAND flash is a MOSFET with a floating (i.e. oxide-isolated) gate. Charge is tunnelled onto this gate during write operations, and removed (via the same tunnelling mechanism) during erasure. This stored charge causes changes in V_T, the threshold or turn-on voltage of the cell transistor, which may then be sensed by the read circuitry. NAND flash is distinguished from other flash technologies (e.g. NOR flash, E²PROM) by the tunnelling mechanism (Fowler-Nordheim or FN tunnelling) used for both programming and erasure, and by the series cell organization shown in Figure 1(b).

Many of the more problematic characteristics of NAND flash are due to this organization, which eliminates much of the decoding overhead found in other memory technologies. In particular, in NAND flash the only way to access an individual cell for either reading or writing is through the other cells in its bit line. This adds significant noise to the read process, and also requires care during writing to ensure that adjacent cells in the string are not disturbed. During erasure, in contrast, all cells on the same bit string are erased.

In order to ensure precise programming and erasure in the face of process, temperature, and other variations, an internal state machine repeatedly programs a page (or erases a block) and reads it back until the operation has succeeded. Earlier generations of NAND flash (and high-performance modern devices) use what is termed Single-Level Cell (SLC) technology, storing a single bit on each cell. High-capacity Multi-Level Cell (MLC) devices use more than two levels for each cell, storing 2 to as many as 4 [12] bits each.

With few exceptions, today’s flash devices correspond to the block diagram in Figure 2. Cells are arranged in pages, typically containing 2K or 4K bytes plus a spare area of 64 to 256 bytes for system overhead. Between 16 and 128 pages make up an erase block (or block for short), and blocks are in turn grouped into a flash plane. Devices may contain independent flash planes, typically storing odd and even blocks, allowing simultaneous operations for higher performance. Finally, a static RAM buffer holds data before writing or after reading, and data is transferred to and from this buffer via an 8- or 16-bit wide bus.

[Figure 2 diagram: a flash package containing independent flash planes (typ. 1 or 2); each NAND flash array is divided into erase blocks made up of pages (data page plus spare area), with row and column decoders, control and address logic, a data register, and an 8- or 16-bit I/O bus]
Figure 2: Typical flash device architecture. Read and write are both performed in two steps, consisting of the transfer of data over the external bus to or from the data register, and the internal transfer between the data register and the flash array.

This architecture evolved to meet storage demands for digital photography and MP3 players, with modest performance requirements and strict cost constraints. This is reflected in current flash devices, whose low-cost interfaces are limited to a peak bandwidth of 40 MB/s. More recently, the market for high-performance SSDs has generated demand for higher transfer rates, resulting in efforts such as ONFI 2.1 [3] to standardize 100 MB/s to 200 MB/s DDR interfaces.

In this study we are interested in the performance of basic operations, i.e. writing from the internal buffer to the flash plane, reading from the flash plane to the buffer, or erasing a block. These represent the fundamental performance limits of any particular NAND flash design, while I/O interfaces with sufficient performance are readily available (e.g. those used in DRAM).

2.2 Related Work
Prior experimental studies of flash memory performance
and endurance may be classified as circuit-oriented and
system-oriented. Circuit-level studies have examined the effect of program/erase stress on internal electrical characteristics, often using custom-fabricated devices to remove the
internal control logic and allow e.g. measurements of the
effects of single program or erase steps. A representative
study is by Lee et al. at Samsung [6], examining both program/erase cycling and hot storage effects across a range of
process technologies. Similar studies include those by Park
et al. [9] and Yang et al. [13], both also at Samsung.
System-level studies have instead examined characteristics
of entire flash-based storage systems, such as USB drives and
SSDs. The most recent of these presents uFLIP [1], a benchmark for such storage systems, with measurements of a wide
range of devices; this work quantifies the degraded performance observed for random writes in many such devices.
Additional work in this area includes [2] and [8]. There has
been a small amount of empirical testing of raw flash devices
in the wireless sensor network community [7], but this work
has focused primarily on energy usage and has not addressed
performance or endurance.
3. EXPERIMENTAL RESULTS
3.1 Methodology
In order to test a wide range of devices, flash chips were acquired both through traditional distributors and by purchasing and disassembling mass-market devices. A programmable flash controller was constructed using software control of general-purpose I/O pins on a microcontroller to implement the flash interface protocol for 8-bit devices; this test setup may be seen in Figure 3. Flash devices tested ranged from early 128Mbit (16MB) SLC devices to recent 16Gbit and 32Gbit MLC chips. A complete list of devices tested may be seen in Table 1. Unless otherwise specified, all tests were performed at 25°C.

Figure 3: Flash device test apparatus. The test system is based on a NetBurner 5270 controller and a TSOP48 programming socket.

Table 1: Devices tested
Device            Mfr      Size      Cell   Nominal endurance (cycles)
NAND128W3A2BN     ST       128Mbit   SLC    10⁵
HY27US0812JA      Hynix    512Mbit   SLC    10⁵
MT29F2G08AAD      Micron   2Gbit     SLC    10⁵
MT29F4G08AAC      Micron   4Gbit     SLC    10⁵
NAND08GW3B2C      ST       8Gbit     SLC    10⁵
MT29F8G08MAAWC    Micron   8Gbit     MLC    10⁴
29F16G08CANC1     Intel    16Gbit    SLC    10⁵
MT29F32G08QAA     Micron   32Gbit    MLC    10⁴

[Figure 4 plot: V_T (program) and V_T (erase) in volts versus program/erase cycles (log scale)]
Figure 4: Typical V_T degradation with program/erase cycling. Data is abstracted from [6], [9], and [13].

[Figure 5 plot: program/erase endurance (cycles, log scale) of individual blocks for each flash device: 128mb, 512mb, 2Gb, 4Gb, 8Gb (ST), 8Gb (Micron), 16Gb, 32Gb]
Figure 5: Write/erase endurance by device. Measured lifetimes of individual blocks are plotted. Nominal endurance of the devices tested is 10⁵ cycles for all devices except the 8Gb Micron and 32Gb devices, which are rated for 10⁴ cycles.

3.2 Endurance
Limited write endurance is a key characteristic of flash memory, and of floating-gate devices in general, that is not present in competing memory and storage technologies.
As blocks are repeatedly erased and programmed, the oxide
layer isolating the gate degrades, as described in more detail in [5]. This in turn causes a change in the response of
the cell to a fixed programming or erase step, as shown in
Figure 4. In practice this degradation is compensated for by
adaptive programming and erase algorithms internal to the
device, which use multiple program/read or erase/read steps
to achieve the desired state. If a cell has degraded too much,
however, the program or erase operation will terminate in
an error, after which the external system must consider the
block bad and remove it from use.
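The adaptive program/verify behavior described above can be sketched as a loop that applies a pulse, reads the cell back, and gives up after a fixed pulse budget. This sketch is hypothetical: the linear V_T step, the voltages, and the pulse limit are illustrative assumptions, not values from the devices tested or from [5]. It does show why latency is quantized (it is a whole number of pulses), and why, if wear shifts a cell's starting (erased) V_T upward, as assumed here for illustration, fewer program pulses are needed.

```python
from typing import Optional


def program_verify(vt_start: float, vt_target: float,
                   step: float, max_pulses: int) -> Optional[int]:
    """Hypothetical incremental program/verify loop for one cell.

    Each iteration applies one program pulse, modeled as a fixed upward
    shift of V_T, then verifies against the target. Returns the number of
    pulses used, or None if the pulse budget runs out (program failure,
    after which the block would be treated as bad).
    """
    vt = vt_start
    for pulses in range(1, max_pulses + 1):
        vt += step              # program pulse
        if vt >= vt_target:     # verify (read back)
            return pulses
    return None


# Illustrative values only: a fresh cell vs. one whose erased V_T has drifted up.
print(program_verify(-3.0, 1.0, 0.5, 16))   # fresh cell: 8 pulses
print(program_verify(-2.0, 1.0, 0.5, 16))   # drifted cell: 6 pulses
print(program_verify(-3.0, 1.0, 0.5, 4))    # budget exhausted: None
```

An erase loop would have the same shape with a downward step and a lower target; under the same illustrative assumption, an upward drift of the programmed V_T would then mean more erase pulses.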
Program/erase endurance was tested by repeatedly programming a single page with all zeroes, and then erasing the containing block. Although rated device endurance ranges from 10⁴ to 10⁵ program/erase cycles, Figure 5 shows that measured endurance was higher, often by nearly two orders of magnitude, with a small number of outliers.
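The test procedure above amounts to a simple loop. In this sketch `FlashDevice`, `program_page`, and `erase_block` are hypothetical names for a driver layer that issues the commands and returns the pass/fail result from the device status register; they are not the original test system's interface. `SimulatedDevice` is a stand-in used only to exercise the loop.

```python
from typing import Protocol


class FlashDevice(Protocol):
    # Hypothetical driver interface: each call returns True if the device's
    # status register reported success for the operation.
    def program_page(self, block: int, page: int, data: bytes) -> bool: ...
    def erase_block(self, block: int) -> bool: ...


def endurance_test(dev: FlashDevice, block: int, page_size: int,
                   max_cycles: int = 10**8) -> int:
    """Program page 0 of `block` with all zeroes, erase the block, and repeat
    until either operation fails. Returns the number of completed cycles.
    `max_cycles` is an arbitrary safety bound."""
    zeros = bytes(page_size)    # all zeroes: the programmed state for SLC
    for cycle in range(max_cycles):
        if not dev.program_page(block, 0, zeros):
            return cycle
        if not dev.erase_block(block):
            return cycle
    return max_cycles


class SimulatedDevice:
    """Stand-in device whose erase fails after a fixed number of cycles."""
    def __init__(self, lifetime: int):
        self.lifetime = lifetime
        self.erases = 0

    def program_page(self, block: int, page: int, data: bytes) -> bool:
        return True

    def erase_block(self, block: int) -> bool:
        self.erases += 1
        return self.erases <= self.lifetime


print(endurance_test(SimulatedDevice(lifetime=5), block=0, page_size=2048))  # 5
```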
Operations were timed by measuring the period during
which the device indicated that it was busy after accepting a
command, thus eliminating any dependency on the speed at
which the test system was able to read or write data over the
bus. Timing traces were collected during endurance tests,
and a representative trace is shown in Figure 6. The V_T degradation shown in Figure 4 can be seen affecting the iterative programming and erase algorithms here, as program
times decrease and erase times increase over the lifetime of
a block. This effect was seen in all devices tested except for
those based on the oldest technology: for the 128Mbit part
erase times remained constant and program times decreased,
while both remained constant for the 512Mbit device.
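Timing an operation by its busy period, as described above, reduces to polling the device's ready/busy indication against a clock. This sketch assumes hypothetical callables `is_busy` (for example, a read of the R/B# pin) and `clock_ns`; the 15 ms default timeout mirrors the test-system timeout mentioned in the footnote to Section 3.3 and is otherwise arbitrary. The original apparatus ran on a microcontroller; Python is used here only for illustration.

```python
import time
from typing import Callable, Optional


def busy_time_us(is_busy: Callable[[], bool],
                 clock_ns: Callable[[], int] = time.perf_counter_ns,
                 timeout_us: int = 15_000) -> Optional[int]:
    """Measure how long the device stays busy after a command is accepted.

    Call immediately after issuing the command. Returns the busy period in
    whole microseconds, or None if it exceeds timeout_us.
    """
    start = clock_ns()
    while is_busy():
        if (clock_ns() - start) // 1000 > timeout_us:
            return None
    return (clock_ns() - start) // 1000
```

Because only the busy interval is measured, the result does not depend on how fast the host moves data over the bus, which is the property the methodology relies on.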
3.3 Performance
Read performance was tested under a number of scenarios, including random and sequential reads. Again, latency
was measured from the end of the read command until the
device indicated that data was ready to be transferred, thus
avoiding effects of varying transfer speed. No significant difference was found between random and sequential speeds,
nor was read performance seen to vary with program/erase
cycling, and so a single average is reported for each device.
Results may be seen in Figure 7, where measured speeds
are compared to speeds specified by the manufacturer when
available.
Specified read latency (across all environmental and circuit conditions) is typically 25µs for current-generation SLC
devices and 50µs for MLC ones, although early small-page
SLC devices are rated at 12µs. Measured speeds under test
conditions are seen to be somewhat better than specification,
but not by large margins except in the case of the smallest
device. We speculate that this anomaly may be due to the
device being produced in a newer process technology than
it was originally designed for.
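To put these read latencies in context, a complete page read costs the array-to-register latency plus the time to move the page across the bus. The sketch below is a back-of-envelope estimate under assumptions not stated in the original: only a 2 KB data area is transferred (no spare area), the bus runs at the 40 MB/s peak mentioned in Section 2 with 1 MB = 10⁶ bytes, and command/address cycles and cache-read pipelining are ignored. The function names are hypothetical.

```python
def page_read_time_us(t_read_us: float, page_bytes: int, bus_mb_per_s: float) -> float:
    """Estimate one page read: array-to-register latency plus bus transfer.

    Assumes 1 MB = 10**6 bytes, so bytes / (MB/s) comes out in microseconds,
    and ignores command/address cycles and cache-read overlap.
    """
    transfer_us = page_bytes / bus_mb_per_s
    return t_read_us + transfer_us


def read_throughput_mb_per_s(t_read_us: float, page_bytes: int, bus_mb_per_s: float) -> float:
    """Sustained throughput if page reads are issued back to back without overlap."""
    return page_bytes / page_read_time_us(t_read_us, page_bytes, bus_mb_per_s)


# 25 us SLC read latency, 2 KB page, 40 MB/s bus.
print(page_read_time_us(25.0, 2048, 40.0))         # about 76.2 us
print(read_throughput_mb_per_s(25.0, 2048, 40.0))  # about 26.9 MB/s
```

Under these assumptions the bus transfer (about 51 µs) takes longer than the 25 µs array read, so at this interface speed the bus rather than the array dominates; faster interfaces such as those standardized by ONFI 2.1 shrink that term.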
[Figure 6 plot: write time (µs) and erase time (ms) of a single block versus program/erase iterations]
Figure 6: Wear-related changes in latency. Data points are subsampled rather than averaged to illustrate the quantized latency values due to iterative internal algorithms.

[Figure 7 plot: measured and specified read latency (µs) for each flash device: 128mb, 512mb, 2Gb, 4Gb, 8Gb (ST), 8Gb (Micron), 16Gb, 32Gb]
Figure 7: Read latency by device. Measured values were unaffected by access pattern or block wear.

[Figure 8 plot: measured and specified write latency (µs) for each flash device]
Figure 8: Write latency by device. Values shown are typical (mean of first 10⁴ writes to a block), worst-case (mean of first 100 writes), and best-case (mean of last 100 writes before failure).

As described above, write and erase performance vary over the lifetime of a flash block, complicating the task of summarizing our measurements. The best write performance is obtained just before a block fails; however, we hope to rarely, if ever, operate in this region. The slowest write performance occurs on fresh pages, but writes may speed up significantly after the first few hundred writes, leading to a sizable difference between expected and worst-case performance.
To address this, we report three values for both write and erase: the worst-case latency, seen in the first writes and the last erases; the mean latency over the first 10,000 operations on a block; and the best-case latency, seen in the first erases and the last writes. Results are shown in Figures 8 and 9, again compared to manufacturer specifications when available.¹
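The three summary values can be computed from a per-block latency trace as sketched below. This assumes the trace is a list of per-operation latencies in order, from the first operation on a fresh block to the last before failure, and uses the window sizes given in the captions of Figures 8 and 9 (100 and 10⁴ operations); `summarize_latency` is a name chosen here, not part of the original tooling.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize_latency(trace: Sequence[float], kind: str,
                      edge: int = 100, typical: int = 10_000) -> Dict[str, float]:
    """Reduce one block's latency trace to worst/typical/best values.

    Write latency falls with wear, so the first `edge` writes give the worst
    case and the last `edge` the best case; erase latency rises with wear,
    so the roles are reversed. `typical` is the mean over the first
    `typical` operations in either case.
    """
    first, last = mean(trace[:edge]), mean(trace[-edge:])
    summary = {"typical": mean(trace[:typical])}
    if kind == "write":
        summary.update(worst=first, best=last)
    elif kind == "erase":
        summary.update(worst=last, best=first)
    else:
        raise ValueError("kind must be 'write' or 'erase'")
    return summary
```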
¹ Many test runs for the 4Gbit device showed anomalous write and erase delays, often exceeding the 15 ms timeout of the test system; these runs are not included in the calculated results. We are investigating whether these runs reflect true performance of the device or are due to a malfunction of the test system.

[Figure 9 plot: measured and specified erase latency (ms) for each flash device]
Figure 9: Erase latency by device. As with write latency, except that the first 100 erasures yield the best case, the first 10⁴ yield the typical value, and the last 100 yield the worst-case point.

Experiments were performed to examine the effect of random writes on performance. We note that true random writes are not possible on most flash devices, as the pages within an erase block must be written sequentially in order to avoid disturbing data on previously-written pages. Instead, a random sequence of erase blocks was chosen, and then the first page was written within each block in the sequence, followed by the second page in each, and so on. No detectable difference in write performance was seen compared to writing pages sequentially within a single block.
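The write order used in the random-write experiment can be generated as follows. The sketch assumes blocks are identified by integers and uses Python's `random.Random` with an optional seed for reproducibility; the function name is hypothetical. Pages within each block are still emitted in ascending order, which respects the sequential-programming constraint noted above.

```python
import random
from typing import Iterable, Iterator, Optional, Tuple


def interleaved_random_order(blocks: Iterable[int], pages_per_block: int,
                             seed: Optional[int] = None) -> Iterator[Tuple[int, int]]:
    """Yield (block, page) pairs: page 0 of each block in a random block
    order, then page 1 of each block in the same order, and so on."""
    order = list(blocks)
    random.Random(seed).shuffle(order)
    for page in range(pages_per_block):
        for block in order:
            yield block, page


for block, page in interleaved_random_order(range(4), pages_per_block=2, seed=7):
    print(block, page)
```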
3.4 Additional Testing
Further investigation was performed to determine whether
the surprisingly high endurance of the devices tested is typical, or is instead due to anomalies in the testing process.
In particular, we varied both program/erase behavior and
environmental conditions to determine their effects. Due to
the high variance of the measured endurance values, we have
not collected enough data to draw strong inferences, and so
report general trends instead of detailed results.
Usage patterns: The results reported above were measured by repeatedly programming the first page of a block
with all zeroes (the programmed state for SLC flash) and
then immediately erasing the entire block. Several devices
were tested by writing to all pages in a block before erasing it; endurance appeared to decrease with this pattern,
but by no more than a factor of two. Additional tests were
performed with varying data patterns, but no difference in
endurance was detected.
This result is not unexpected, as we surmise that one way in which erasure or programming fails is when a single cell fails to reach its target state within the maximum number of internal program or erase steps. Given some variation between cells, changing the state of a larger number of cells would be expected to raise the chance of failure as cells wear. (We note, however, that repeated erase cycles with no intervening writes show the same latency increase and similar endurance as erasures with a single intervening page write.)
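The reasoning above can be made slightly more concrete with a simple model, which is an assumption introduced here rather than a result of these experiments: if, on a given cycle, each of the n cells whose state must change independently fails to verify with probability p, then the operation as a whole fails with probability

```latex
P_{\mathrm{fail}}(n) = 1 - (1 - p)^{n} \approx n\,p \qquad \text{for } n\,p \ll 1
```

This grows with n, so an operation that changes more cells is more likely to fail at a given level of wear. Real cells are neither independent nor identically distributed, so the model indicates only the direction of the effect, not its size.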
Environmental conditions: The processes which result in flash failure are exacerbated by heat [13], although
internal temperature compensation is used to mitigate this
effect [4]. The 16Gbit device was tested at 80°C, and no
noticeable difference in endurance was seen. However, at
5°C endurance was seen to drop by a factor of about two.
Although not expected, this decrease of endurance at low
temperature has also been reported for NOR flash [11].
We note that one of the primary differences between our
tests and typical system usage is that cells are erased almost
immediately after being programmed. We are curious as
to whether endurance would be affected by the passage of
time between program and erase or vice versa; however, the
long durations required for such tests have precluded their
implementation to date.
4. CONCLUSIONS
Many of the results of these tests were expected: read,
program (with one exception) and erase times were for the
most part slightly lower than the “typical” values specified by
the manufacturers, no doubt reflecting a margin to account
for variations outside of our test conditions.
The high endurance values measured—often nearly 100
times higher than specified—were highly unexpected and
deserve more study. Further investigation is needed to determine whether such high endurance may be expected under typical system conditions, and whether any special care
must be taken to achieve such behavior. If real systems
are able to achieve average endurance levels of 10⁶ or 10⁷
write/erase cycles, then it would appear that many of the
concerns raised in the systems community have been misplaced, and that flash endurance may merely become another MTBF parameter, much like mechanical failure in disk
drives.
The variation in program and erase performance with wear, although obvious in hindsight, was also unexpected. This has clear applications in wear-leveling algorithms, as it supplies a measurement of a block’s remaining lifetime that, unlike explicit erase-count tracking, imposes no additional writes on the device. It also has implications for block management on flash devices. If the latency of erasures can be hidden, then repeatedly re-using blocks until they fail may yield improved write performance. However, if system performance is impacted by erase latency, then wear should be distributed as evenly as possible in order to avoid high erase latencies at the end of a block’s lifespan.
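As an illustration of using erase latency as a wear signal, the sketch below ranks free blocks by their most recently observed erase time and flags blocks whose erase time has grown well beyond the fresh value. It is a hypothetical policy, not one evaluated here: the latency dictionary, the function names, and the 3x threshold (loosely motivated by the factor-of-three changes noted in the introduction) are all assumptions. Note that the two oldest devices tested showed no erase-time increase, so this signal would not apply to them.

```python
import heapq
from typing import Dict, List


def pick_least_worn(erase_ms: Dict[int, float], k: int) -> List[int]:
    """Return the k blocks with the lowest last-observed erase latency,
    treating lower erase latency as a sign of less wear."""
    return heapq.nsmallest(k, erase_ms, key=erase_ms.get)


def near_end_of_life(erase_ms: Dict[int, float], fresh_ms: float,
                     ratio: float = 3.0) -> List[int]:
    """Flag blocks whose erase latency has grown to `ratio` times the
    fresh-block value; candidates for retirement or reduced use."""
    return sorted(b for b, t in erase_ms.items() if t >= ratio * fresh_ms)


latencies = {0: 2.0, 1: 5.5, 2: 1.5, 3: 3.0}      # block -> erase time (ms), illustrative
print(pick_least_worn(latencies, 2))              # [2, 0]
print(near_end_of_life(latencies, fresh_ms=1.5))  # [1]
```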
Additional experimentation is needed to explore the endurance behavior seen in these experiments. How sensitive
are these results to environmental and circuit conditions?
Do they hold up across a much wider sampling of devices?
And perhaps most importantly, how sensitive are they to
system behavior—i.e. usage patterns and wear leveling?
Work to date has focused on generating usage patterns which
avoid exceeding a fixed endurance threshold for any individual block; however, it appears that this endurance level may
be variable, and that it may be more profitable to look for
patterns which maximize that endurance, instead.
Our results to date raise more questions than they answer, and we believe that further answers will require closer
collaboration between the circuit and device community and
the systems community than may have been present to date.
Historically the device community has focused on worst-case
behavior, as is appropriate for e.g. memory buses. However,
as systems designers we are often concerned with average-case behavior instead. We believe a deeper understanding
on both sides, and focused experimentation, will help design
higher-performance flash-based systems in the future.
5. REFERENCES
[1] L. Bouganim, B. Jónsson, and P. Bonnet. uFLIP:
understanding flash IO patterns. In Int’l Conf. on
Innovative Data Systems Research (CIDR), Asilomar,
California, 2009.
[2] P. Huang, Y. Chang, T. Kuo, J. Hsieh, and M. Lin.
The Behavior Analysis of Flash-Memory Storage
Systems. In IEEE Symposium on Object Oriented
Real-Time Distributed Computing, pages 529–534.
IEEE Computer Society, 2008.
[3] Hynix Semiconductor, Intel Corporation, Micron
Technology Inc., Numonyx, Phison Electronics Corp.,
Sony Corp., and Spansion. Open NAND Flash
Interface Specification, rev. 2.1. Available from
www.onfi.org/specifications, Jan. 2009.
[4] K. Kimura and T. Kobayashi. Trends in high-density
flash memory technologies. In IEEE Conference on
Electron Devices and Solid-State Circuits, pages
45–50, 2003.
[5] J. Lee, J. Choi, D. Park, and K. Kim. Data retention
characteristics of sub-100 nm NAND flash memory
cells. IEEE Electron Device Letters, 24(12):748–750,
2003.
[6] J. Lee, J. Choi, D. Park, and K. Kim. Degradation of
tunnel oxide by FN current stress and its effects on
data retention characteristics of 90 nm NAND flash
memory cells. In IEEE Int’l Reliability Physics
Symposium, pages 497–501, 2003.
[7] G. Mathur, P. Desnoyers, D. Ganesan, and P. Shenoy.
Ultra-low power data storage for sensor networks. In
IPSN/SPOTS, April 2006.
[8] K. O’Brien, D. C. Salyers, A. D. Striegel, and
C. Poellabauer. Power and performance characteristics
of USB flash drives. In World of Wireless, Mobile and
Multimedia Networks (WoWMoM), pages 1–4, 2008.
[9] M. Park, E. Ahn, E. Cho, K. Kim, and W. Lee. The
effect of negative VTH of NAND flash memory cells on
data retention characteristics. IEEE Electron Device
Letters, 30(2):155–157, 2009.
[10] M. Sanvido, F. Chu, A. Kulkarni, and R. Selinger.
NAND flash memory and its role in storage
architectures. Proceedings of the IEEE,
96(11):1864–1874, 2008.
[11] R. Saripalli. Maximizing endurance of MSC1210 flash
memory. Application Report SBAA091, Texas Instruments,
2003.
[12] N. Shibata, H. Maejima, K. Isobe, K. Iwasa, et al. A
70 nm 16 Gb 16-Level-Cell NAND flash memory. IEEE
Journal of Solid-State Circuits, 43(4):929–937, 2008.
[13] H. Yang, H. Kim, S. Park, J. Kim, et al. Reliability
issues and models of sub-90nm NAND flash memory
cells. In Solid-State and Integrated Circuit Technology
(ICSICT), pages 760–762, 2006.