Subpage-based Flash Translation Layer for Solid State Drives
Mincheol Kang, Wonyoung Lee, and Soontae Kim
School of Computing
KAIST, Daejeon, Korea
{mincheolkang, wy lee, kims}@kaist.ac.kr
Abstract—Solid State Drives (SSDs), which consist of NAND flash chips, are widely used in current computer systems because of their higher read and write speeds compared with conventional Hard Disk Drives (HDDs). However, because NAND flash chips have limited Program and Erase (P/E) cycles, reducing writes to NAND flash chips can improve SSD lifetime. As vendors increase the page size, which is the minimum I/O unit of a NAND flash chip, the number of subpage requests, whose size is smaller than a page, also increases because the host and the NAND flash chip use different I/O units. These subpage write requests can cause endurance problems and internal fragmentation. In addition, subpage writes increase the number of Read Modify Write (RMW) operations because of the out-of-place-update property of NAND flash chips, and they increase write latency because the old data must be read before the update is written. In this paper, we propose a subpage-based Flash Translation Layer (FTL) that improves lifetime and performance by reducing writes to NAND flash chips and eliminating unnecessary RMW. We modify the write buffer in the SSD to merge subpages into full pages and add size information to the mapping table to detect unnecessary RMW. Our proposed scheme reduces the subpage write ratio to less than 2% and is shown to reduce writes to NAND flash chips by up to 23% and by 13% on average, and the write response time by up to 14% and by 8% on average.
I. INTRODUCTION
Solid State Drives (SSDs) are used as storage in a wide range of computer devices. Unlike a conventional Hard Disk Drive (HDD), an SSD has many benefits including high access speed, high durability, and low power consumption. Nowadays, the cost, which is the main disadvantage of SSDs compared with HDDs, is steadily being lowered because NAND flash density keeps increasing. Thus, SSDs are increasingly used as main storage instead of HDDs. However, the NAND flash chip, which is the main component of an SSD, has a limited number of Program and Erase (P/E) cycles, which determines the lifetime of the SSD. Also, data cannot be updated in the same physical location, a property called out-of-place update. Thus, reducing the number of writes to NAND flash chips can improve the lifetime of the SSD [32].
An SSD uses a page unit for read and write operations, which differs from the host system I/O unit called a sector. Because of the different I/O units and the limited P/E cycles, the SSD uses a Flash Translation Layer (FTL), which translates addresses using a mapping table to improve the lifetime of the SSD [17]. Furthermore, because write latency is higher than read latency in NAND flash chips, the SSD exploits internal DRAM as a write buffer that temporarily stores write requests [23]. Thus, the SSD can mitigate the disadvantages of NAND flash chips.
Recently, manufacturers have been maximizing the throughput and capacity of SSDs by increasing the page size [2], which is currently 8KB or 16KB. However, the sector unit, which was designed for HDDs, is smaller than the page unit. Thus, requests whose data size is smaller than the page unit can occur; these are called subpage requests. Subpage writes can cause endurance problems [20] and internal fragmentation within a page. Because NAND flash chips have the out-of-place-update property, a subpage write can produce a Read Modify Write (RMW) when an update to the same physical address occurs. This RMW delays the new write request because the old data must be read first and then written to a new physical location [3][5].
Several techniques have been proposed to address the subpage problem. [12] proposed a sector log to reduce subpage writes to NAND flash chips and elapsed times. It manages subpage write requests in a small part of the NAND flash chips using a sector mapping table and reduces the number of write requests by merging subpage write requests. However, when the limited NAND flash chips that store merged subpage writes become full, the merged data have to be evicted and written to the other part of the NAND flash chips, which stores full pages. This process increases read latency. A compression-supported FTL [25] has also been proposed. It uses a hardware compressor that can compress several subpage writes. Even though it can reduce the number of writes to NAND flash chips, it has encoding and decoding overhead and cannot solve internal fragmentation within a page.
In this paper, we propose a subpage-based Flash Translation Layer (FTL) that reduces subpage write requests and eliminates unnecessary RMW, improving lifetime and performance. We modify the write buffer so that it can merge subpage write requests into full pages, which reduces subpage writes to the NAND flash chips. We also add size information to the mapping table to eliminate unnecessary RMW. Our experimental results, obtained using a trace-based simulator, show that the subpage-based FTL can reduce NAND writes by up to 23% and by 13% on average and the subpage write ratio to less than 2%. Also, the write response time is reduced by up to 14% and by 8% on average.
The rest of this paper is organized as follows. The next
section explains the background and motivations behind this
work. Section 3 introduces and discusses related work. Our
subpage-based FTL is proposed in Section 4. The experimental
results are discussed in Section 5. Finally, the conclusions are given in Section 6.

Fig. 1: SSD architecture (host interface, SSD microprocessor, internal DRAM holding the write buffer data and the FTL mapping table, flash controller, multiple channels, and NAND flash chips composed of dies, planes, blocks, and pages sharing a multiplexed bus)
II. BACKGROUND AND MOTIVATION
A. Internal SSD
1) SSD Architecture: Figure 1 shows the overall SSD architecture. Host I/O requests are transferred to the SSD through the host interface, which is SATA or PCIe. The microprocessor is responsible for running the embedded software modules, namely the write buffer and the FTL. These software modules exploit the internal DRAM: it temporarily stores the write request data of the write buffer and the mapping table of the FTL, which is used for address translation. After address translation, the flash controller receives NAND flash commands with physical addresses generated by the FTL. The flash controller is connected to multiple channels, which can operate independently, and multiple NAND flash chips share the serial bus of a single channel. This structure improves parallelism [6]. A NAND flash chip consists of a set of dies, which can execute commands independently through interleaving because several dies are connected to a multiplexed bus [16]. Each die has several planes that share the word-line. A plane consists of thousands of blocks, which are the erase unit, and each block has several pages, which are the read and write unit. A page can be divided into sector units (512 bytes), which is the minimum host I/O unit [27][9][28]. However, a NAND flash chip cannot read or write an individual sector within a page: to read a particular sector in a page, the whole page must be read even though the other sectors are not needed.
2) Write Buffer and FTL: Because NAND flash memory has higher write latency than read latency, the SSD exploits DRAM as a write buffer. The host system notifies the SSD of new requests by providing information including the logical sector number (LSN), the size, and the command. Because the minimum I/O unit differs between the HDD interface and the SSD, the LSN has to be converted to a logical page number (LPN). If an I/O request exceeds the page size, it is split into memory requests based on the page size [14].
In addition, a subpage request keeps the information of which sector numbers it covers within a page. Even though an I/O request fits the page size, it can still be a subpage request because of a mis-aligned sector number [11]. When write requests arrive, the write buffer tries to add the memory requests to the buffer list, which is managed by LPN. If some of the memory requests are found in the buffer list, they are updated with the new data. If the buffer list is full, a victim memory request is chosen by a Least Recently Used (LRU) policy, which is widely used for write buffers [13].
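The LSN-to-LPN conversion and the subpage check described above reduce to simple arithmetic on sector counts. The following C sketch is illustrative only: the structure and function names are ours, not taken from the paper, and an 8KB page is assumed.

#include <stdint.h>
#include <stdbool.h>

#define SECTOR_SIZE      512u
#define PAGE_SIZE        8192u                      /* 8KB page assumed */
#define SECTORS_PER_PAGE (PAGE_SIZE / SECTOR_SIZE)  /* 16 sectors       */

/* Hypothetical request descriptor: start LSN and length in sectors. */
struct host_request {
    uint64_t lsn;
    uint32_t nsectors;
};

/* LPN that the first sector of the request maps to. */
static inline uint64_t lsn_to_lpn(uint64_t lsn)
{
    return lsn / SECTORS_PER_PAGE;
}

/* A request leaves a subpage write behind if it does not cover all
 * sectors of every page it touches, either because it is short or
 * because its start LSN is not aligned to a page boundary. */
static inline bool leaves_subpage(const struct host_request *req)
{
    uint32_t offset = (uint32_t)(req->lsn % SECTORS_PER_PAGE);
    return (offset != 0) || (req->nsectors % SECTORS_PER_PAGE != 0);
}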
The FTL helps the SSD improve its lifetime. Because the host system and the NAND flash chip use different I/O units, address translation is required. After translating the LSN to an LPN, the FTL translates the LPN to a Physical Page Number (PPN) using a mapping table, which keeps the physical location in the NAND flash chips corresponding to each LPN. This mapping table is stored in the internal DRAM. Earlier mapping tables used block-level mapping [19] or hybrid schemes [17] because of the limited internal DRAM size, but those schemes have the overhead of data copies and block erases. Nowadays, because the internal DRAM size has increased [29], page-level mapping can be used. Because of the out-of-place-update property, when an update request arrives for a physical location where data have already been written, the FTL has to allocate a free PPN for the new data and record that PPN in the mapping table. The previous page with the old data changes from the valid to the invalid state. Invalid pages are later erased by Garbage Collection (GC), which selects a victim block according to its policy, copies the valid pages to a free block, and erases the victim block. In addition, because similar access patterns can consume the limited P/E cycles of particular blocks, a wear-leveling technique is used to erase blocks evenly. Thus, the FTL prolongs the lifetime of the SSD.
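As a rough illustration of the page-level out-of-place update described above, the sketch below shows the mapping-table update on a write: allocate a free PPN, program it, invalidate the old PPN, and redirect the LPN. The structure and helper names are hypothetical and only mirror the steps in the text.

#include <stdint.h>

#define INVALID_PPN UINT32_MAX

/* Hypothetical page-level FTL state kept in internal DRAM. */
struct ftl {
    uint32_t *l2p;        /* LPN -> PPN mapping table                 */
    uint8_t  *page_valid; /* per-PPN valid flag, cleared on invalidation */
};

/* Assumed helpers provided elsewhere by the FTL / flash controller. */
uint32_t allocate_free_ppn(struct ftl *ftl);
void     nand_program_page(uint32_t ppn, const void *data);

/* Out-of-place update: never overwrite the old PPN in place. */
static void ftl_write_page(struct ftl *ftl, uint32_t lpn, const void *data)
{
    uint32_t old_ppn = ftl->l2p[lpn];
    uint32_t new_ppn = allocate_free_ppn(ftl);

    nand_program_page(new_ppn, data);     /* program a new physical page */
    if (old_ppn != INVALID_PPN)
        ftl->page_valid[old_ppn] = 0;     /* old copy becomes a GC victim */
    ftl->l2p[lpn] = new_ppn;              /* redirect LPN to the new PPN  */
}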
B. Motivations
1) Subpage Write: As described in Section 2.1, subpage requests are generated because the host I/O unit differs from the page unit and because of mis-aligned sector addresses. The subpage write ratio is gradually increasing because vendors enlarge the page size to improve bandwidth; recent commercial SSDs use 8KB and 16KB page sizes [24][22]. A subpage write can cause the program (write) disturbance problem, which affects the cells adjacent to a programmed cell because of the voltage difference between the programmed cell and its neighbors [20]. It also increases the number of RMW operations [30] and causes internal fragmentation within a page.
Even though the write buffer can reduce the number of subpage writes by absorbing updates to buffered requests, it cannot cover all subpage writes. Figure 2 shows the subpage write ratio with a 64 MB write buffer. As the page size grows, the subpage write ratio also increases: for 8KB and 16KB page sizes, the subpage write ratio reaches up to 45% and 61%, and 26% and 41% on average, respectively.
Fig. 2: Subpage write ratio with a 64 MB write buffer for 4KB, 8KB, 16KB, and 32KB page sizes

Fig. 3: RMW and unnecessary RMW operations: (a) a normal RMW reads old PPN 100, modifies it with the new data, and writes the result to PPN 200; (b) an unnecessary RMW, in which the new data fully cover the old data

2) Unnecessary RMW Operations: Because the NAND flash chip does not allow in-place updates, the FTL has to handle an update operation with an RMW, which reads the old data before writing the new request. This increases write latency. As the subpage write ratio increases, the number of RMW operations can also increase, because a subpage write that updates old data causes an RMW. As described in Section 2.1, because a page can be divided into sector units, we can describe a subpage write by the sectors it occupies within a page. If the sectors of a new subpage write cover all the old sectors already written in the physical page, it is not necessary to read the old data; we call this case an unnecessary RMW.
Figure 3 shows a normal RMW and an unnecessary RMW. In this example the page size is 2KB, which contains four sectors (512 bytes each). When a new subpage update for PPN 100 arrives, we have to read the old data in PPN 100, merge it with the new data so that the size becomes 2KB, and write the result to a free physical location, PPN 200. However, in the unnecessary RMW case, the new data for PPN 100 completely cover the old data of PPN 100, so it is not necessary to read the old data. We examined diverse applications to see what portion of all subpage writes are unnecessary RMW operations; Figure 4 shows this result. Across the applications, the unnecessary RMW portion is almost 20% on average for the different page sizes, and it is largely unaffected by the page size. In the EXC4 application, the unnecessary RMW portion reaches 42% in the worst case.

Fig. 4: Unnecessary RMW operations ratio for 4KB, 8KB, 16KB, and 32KB page sizes
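Detecting an unnecessary RMW amounts to a coverage check between the sectors written by the incoming subpage request and the sectors already stored in the target physical page. A minimal C sketch, assuming the same one-bit-per-512-byte-sector bitmap that the paper later uses for its page state (the function name is ours):

#include <stdint.h>
#include <stdbool.h>

/* One bit per 512-byte sector within a page (bit i = sector i).
 * With a 2KB page there are 4 sectors, so only the low 4 bits are used. */
typedef uint32_t sector_mask_t;

/* The RMW read of the old page can be skipped when every sector already
 * stored in the physical page is also rewritten by the new data. */
static inline bool is_unnecessary_rmw(sector_mask_t new_sectors,
                                      sector_mask_t old_sectors)
{
    return (new_sectors & old_sectors) == old_sectors;
}

/* Example with a 2KB page (4 sectors):
 * is_unnecessary_rmw(0x7, 0x3) == true  -> old sectors 0-1 fully covered
 * is_unnecessary_rmw(0x1, 0x3) == false -> normal RMW, read the old page */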
III. RELATED WORK
Many schemes have been proposed to increase lifetime and performance by reducing NAND writes and RMW overhead.
Previous studies can be separated into two approaches: software [30][18][12] and hardware [25][21]. In the software approach, [30] proposed a new buffer replacement algorithm that exploits chip-level parallelism in the SSD to reduce RMW overhead. The modified replacement policy looks up whether a chip is busy, chooses victim requests accordingly, and allocates new write requests to non-busy chips. [18] proposed a new buffer mechanism called partial page buffering to reduce the number of RMW operations. It keeps only subpage writes in the write buffer so that updates can form full pages, while full page writes bypass the write buffer and are written directly to the NAND flash chips. Thus, it reduces the number of RMW operations by reducing subpage writes. [12] proposed the sector log, which manages subpage writes to reduce NAND writes. It manages the merged requests by adding a sector mapping table, which stores the different sectors of a page in a small part of the NAND flash chips, and it merges non-contiguous sectors with different LPNs. Once the sector mapping table is full, the different LPNs in the sector log mapping table are merged back. Thus, the sector log can reduce subpage writes and elapsed times.
In the hardware approach, modern SSDs exploit data compression to reduce NAND flash writes and increase the lifetime of the SSD [10]. [25] proposed zFTL to reduce the number of NAND flash writes by compressing data. zFTL uses a page buffer in the flash controller whose size equals the page size and adds a hardware compression module to the controller; the compressed data are managed by adding information to the mapping table that indicates their location within a page. In addition, detailed information about the compressed data is kept by inserting a page state into the physical page, because compressed data almost always create internal fragmentation within a page. When several writes are issued, regardless of whether they are subpage or full page writes, zFTL tries to compress those requests into one physical page; conversely, a read request that points to compressed data must be decompressed. The reduction in NAND writes grows with the compression rate. However, zFTL cannot solve the internal fragmentation within a page because the compressed data do not fit the sector unit. In addition, it has to predict whether data are compressible and, because it does not consider the data size of write requests, incurs compression overhead for almost all write requests as well as decompression overhead when compressed data are read, which can increase the response time.
Fig. 5: Comparison of the sector log, partial page buffering, and our subpage-based FTL: (a) sector log with a sector mapping table and a dedicated sector data region, (b) partial page buffering with a one-list LRU write buffer, (c) our subpage-based FTL with size-aware LRU lists and a page state mapping table
[21] proposed a multiple-page-size flash memory to address the subpage write problem. The key idea of this scheme is to change the cell array in a die into an asymmetric cell array that has a small page size and a large page size. These different page sizes also have different write latencies: the small-page area has lower write latency than the large-page area. By writing subpage write requests to the small-page area, the scheme reduces write latency. However, the implementation is difficult for vendors because the peripheral circuitry of the NAND flash chip has to be modified. Moreover, because of the limited capacity of the small-page area, the scheme incurs migration overhead to the large-page region.
Among prior studies, the sector log and partial page buffering (PPB) schemes address the subpage write problem. The sector log and PPB modify the FTL and the write buffer, respectively, whereas our proposed scheme modifies both. Figure 5 gives an overview and comparison of the sector log, partial page buffering, and our proposed scheme to show the differences among them; we assume a 2KB page size and four NAND flash chips in this figure. The sector log uses a sector mapping table to manage subpage writes. The subpage writes are stored in the sector data region, a small part of the NAND flash chips in the SSD, while full page writes use the page mapping table and are stored in the remaining NAND flash chips, called the page data region. When subpage writes are issued, the sector log merges them into a full page in the page buffer. Because the sector log can merge non-contiguous sectors belonging to different pages, it can reduce the number of NAND writes. However, because the sector log uses only a small, limited part of the NAND flash chips for the sector data region, it has to flush that region. The flush operation evicts merged physical pages, assembles the sectors belonging to the same LPN, and writes them to the page data region; the assembly process reads the different physical pages over which those sectors are scattered in the sector data region. Figure 5a shows this process: to generate LPN 0, the sector log reads PPN 0 and PPN 1 in the sector data region. Because subpage writes are split into sector units and written to the sector data region, even subpage write requests with the same LPN can be stored in different physical pages. This increases the read response time and is called the read overhead of the sector log. In this example, before a flush occurs, a read request for LPN 0 forces the sector log to read two physical pages, PPN 0 and PPN 1.
PPB modifies the write buffer. Subpage writes are buffered in the write buffer, whereas full page writes are directly written to the NAND flash chips. In the write buffer, the subpage writes can be updated and can grow into full page writes. When the write buffer is full, PPB searches for full page writes with an LRU policy; if there are no full page writes in the buffer, a subpage write is selected as the victim request. Figure 5b illustrates the PPB scheme. By buffering subpage writes, PPB reduces the subpage writes that would generate RMW operations in the NAND flash chips, and thus it reduces the number of subpage writes and the write response time by reducing the number of RMW operations. However, because PPB does not buffer full page writes, it generates invalid pages when full page writes are updated. Moreover, if the write buffer does not contain full page writes, subpage writes are written to the NAND flash chips.
Figure 5c shows our proposed subpage-based FTL, which exploits the write buffer. The write buffer attempts a full merge or a sub merge, whose merged size is a full page or a subpage, respectively. To represent merged page requests, we add page state information, which indicates the sector data within a page, to the mapping table. The page state can also detect unnecessary RMW because it reveals the data stored in a physical page at sector granularity. In comparison with the sector log, our scheme does not use dedicated NAND flash chips for merged pages: a merged page can be written to any NAND flash chip and is managed by the page state mapping table, so there is no flush overhead as in the sector log. Moreover, because our scheme attempts a full or sub merge in the write buffer by searching subpage writes,
the data of a subpage write are written entirely and are not split. Thus, our scheme does not have the read overhead of merged pages, unlike the sector log. In comparison with PPB, our scheme achieves a larger reduction in NAND writes and write latency by merging subpage writes and eliminating unnecessary RMW. Our scheme reduces the number of NAND writes, and in addition, because we eliminate unnecessary RMW, it decreases the write response time and the number of NAND reads. Even though the mapping table size increases, the increase is acceptable given current internal DRAM sizes.

Fig. 6: Overall architecture of the subpage-based FTL (write buffer with a one-list LRU and size-aware LRU lists, page state mapping table, and NAND flash chips)

Fig. 7: Merging control flow (full merge, sub merge, virtual pair, under pair, and the RMW checks on victim and pair nodes)

IV. SUBPAGE-BASED FTL
A. Merge Subpage
We propose the subpage-based FTL to reduce the number of subpage writes and eliminate unnecessary RMW. The overall architecture is illustrated in Figure 6. The proposed FTL reduces the number of subpage write operations by merging two subpage writes with different LPNs in the write buffer. We add size-aware LRU lists alongside the conventional one-list LRU in the write buffer. When new write requests are issued, they are enqueued into the one-list LRU and into the size-aware list corresponding to their size. Note that we only add the data structure for the size-aware LRU lists; the data of a node is stored only once, and the size-aware lists can be applied on top of a conventional buffer policy such as LRU. We reduce the search time by using a red-black tree to look up write buffer data, as in [9]. Each write buffer node therefore carries two pointers: one for the one-list LRU and one for the size-aware lists. These pointers are used when a node is updated in the write buffer, because the data size of an updated node can change and the node then has to be moved to the matching size-aware list. The victim is chosen by LRU, but if the victim node is a subpage,
it attempts to merge this subpage with a different subpage to generate a full page.

Fig. 8: Merging example and three types of RMW which can change the size: (a) merging example with the size-aware LRU lists, (b) three types of RMW on the victim and pair nodes
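To make the two-pointer node organization concrete, the sketch below shows one possible layout of a write buffer node that is linked into both the one-list LRU and a size-aware LRU list. All names and field choices here are illustrative assumptions, not the paper's actual implementation.

#include <stdint.h>

#define SECTORS_PER_PAGE 16   /* assuming an 8KB page and 512B sectors */

/* Minimal doubly linked list link. */
struct list_link {
    struct list_link *prev, *next;
};

/* One buffered write request. The node is linked into the global
 * one-list LRU and into the size-aware LRU list that matches its
 * current size; when an update changes the size, the node is moved
 * to another size-aware list while its data stay in place. */
struct wb_node {
    uint64_t         lpn;          /* logical page number             */
    uint32_t         sector_mask;  /* which sectors hold data         */
    uint32_t         nsectors;     /* number of bits set in the mask  */
    struct list_link lru_link;     /* one-list LRU                    */
    struct list_link size_link;    /* size-aware LRU list             */
    uint8_t          data[SECTORS_PER_PAGE * 512];
};

/* Size-aware lists indexed by request size in sectors (1 .. full page),
 * e.g. 0.5KB, 1KB, 1.5KB, 2KB for the small page of Fig. 8. */
struct size_lists {
    struct list_link head[SECTORS_PER_PAGE];
};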
Figure 7 shows the control flow of our merging algorithm, which attempts two types of merge. The first is a full merge, whose merged size is a full page; if the full merge fails, it tries a sub merge, whose merged size is a subpage. Our merging algorithm searches a pair list to create a full page. Before searching the pair list, we check whether the pair list contains subpage writes that are large enough to merge. If not, we check whether the victim node is an RMW, because an RMW can change the data size by reading old data. If the RMW produces a full page, the victim is directly written to the NAND flash chips. Even when the RMW retains the old data, in some cases it cannot produce a full page; in this case we look for a virtual pair, that is, a pair matching the increased victim size after the RMW. After that, we also check whether the node in the pair list is an RMW.
Fig. 9: Read and write examples with the page state mapping table: (a) read, (b) write
If a node in the pair list is an RMW, we move to the next node. If we fail to merge a full page, the merging algorithm attempts a sub merge, whose merged size is a subpage, to still reduce the number of NAND writes. The sub merge moves from the pair list to the under pair list, whose data size is smaller than the pair, and keeps searching buffer nodes in the under pair list until the conditions are satisfied. When a merge succeeds, because the sector locations within a page do not line up between the victim node and the pair node, we shift the victim sectors to the left and the pair sectors to the right. The original sector state of each node is recorded in the mapping table.
Figure 8 shows a merging example and the three types of RMW related to the victim and the pair; we assume a 2KB page size. In the write buffer, the victim is LPN 0, whose size is 0.5KB. If the victim is an RMW that changes its size as shown in Figure 8b (1), the victim node is directly written to the NAND flash chips because it becomes a full page. Figure 8b (2) shows the case where the victim with an RMW cannot become a full page; here we look for a virtual pair, LPN 123, because the size of the victim after the RMW is 1KB. If the victim is not an RMW, the first pair node is LPN 1, whose size is 1.5KB, because the sum of the victim and pair sizes forms a full page. However, as shown in Figure 8b (3), the pair node itself can be an RMW; in that case we move to the next pair, LPN 2. If we fail to find any pair node that produces a full page, we try a sub merge by moving to the under pair list, whose size is 1KB.
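The victim selection and merge decision of Figures 7 and 8 can be summarized in a short C sketch. This is only a sketch under our reading of the text: the helper functions (size-aware pair lookup, RMW size check, merged write) are hypothetical names, and the search-window limit of the MergeWindow policy is omitted.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct wb_node;                          /* buffered write, as sketched above */

/* Assumed helpers over the write buffer; names are ours, not the paper's. */
struct wb_node *lru_victim(void);                         /* one-list LRU tail */
bool  is_full_page(const struct wb_node *n);
bool  is_rmw(const struct wb_node *n);                     /* overwrites old data */
uint32_t merged_sectors_after_rmw(const struct wb_node *n);
uint32_t node_sectors(const struct wb_node *n);
struct wb_node *pair_list(uint32_t needed_sectors);        /* size-aware lookup  */
struct wb_node *next_in_list(struct wb_node *p);
struct wb_node *under_pair_list(uint32_t needed_sectors);  /* smaller candidates */
void  write_full_page(struct wb_node *v);
void  write_merged(struct wb_node *victim, struct wb_node *pair);
void  write_subpage(struct wb_node *v);                    /* merge failed       */

#define SECTORS_PER_PAGE 4                                 /* 2KB page as in Fig. 8 */

static void evict_victim(void)
{
    struct wb_node *v = lru_victim();

    if (is_full_page(v)) {                       /* full pages need no merging */
        write_full_page(v);
        return;
    }

    /* An RMW victim may grow once the old data are read back in. */
    uint32_t vsec = is_rmw(v) ? merged_sectors_after_rmw(v) : node_sectors(v);
    if (vsec == SECTORS_PER_PAGE) {              /* Fig. 8b (1): becomes full  */
        write_full_page(v);
        return;
    }

    /* Full merge: look for a (virtual) pair that completes the page,
     * skipping pair nodes that are themselves RMWs (Fig. 8b (2)-(3)). */
    for (struct wb_node *p = pair_list(SECTORS_PER_PAGE - vsec);
         p != NULL; p = next_in_list(p)) {
        if (!is_rmw(p)) {
            write_merged(v, p);
            return;
        }
    }

    /* Sub merge: settle for a smaller partner from the under pair list. */
    for (struct wb_node *p = under_pair_list(SECTORS_PER_PAGE - vsec);
         p != NULL; p = next_in_list(p)) {
        if (!is_rmw(p)) {
            write_merged(v, p);
            return;
        }
    }

    write_subpage(v);                            /* merge failed               */
}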
B. Page State Mapping Table
To represent merged write requests in the NAND flash chips, we implement a page state mapping table. We add size information that shows the state of the stored data within a physical page at sector granularity, and we represent this page state as a bitmap. One bit corresponds to one sector unit (512 bytes) within a page: a bit set to 1 means that the sector holds data, and 0 means that the sector is empty. If the page size is 8KB, the page state is 16 bits, and it grows with the page size. Because this page state information exposes partially stored data within a page, it allows unnecessary RMW to be detected: we can see exactly which sectors are stored in a physical page. When an update occurs in a merged page, we have to invalidate the old data. Because the remaining half of the merged data is still available, we also add a valid state, which expresses the valid portions of a merged physical page; because we merge two subpage writes, the valid state requires two bits, one per merged subpage. Garbage collection changes PPNs by using the physical page information when it erases blocks [31], so each PPN is linked to its LPN; we add the merged LPNs to the physical page information, which lets garbage collection directly access the linked LPNs and update their PPNs. Finally, to mark that a PPN holds a merged page, we add a one-bit merge flag. It indicates whether the PPN is merged and therefore whether decoding, a data copy performed when a read occurs, is required.
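One compact way to picture the per-LPN and per-PPN metadata described above is the following C sketch. The field widths (a 16-bit sector bitmap for an 8KB page, a two-bit valid state, a one-bit merge flag, and two back-pointing LPNs) follow the text, while the struct names, packing, and bit assignment of the valid state are our own illustrative choices.

#include <stdint.h>

/* Per-LPN entry of the page state mapping table (8KB page assumed). */
struct lpn_entry {
    uint32_t ppn;              /* physical page holding this LPN's data   */
    uint16_t page_state;       /* 1 bit per 512B sector this LPN occupies */
    uint8_t  merge_flag;       /* 1 if the PPN stores two merged subpages */
};

/* Per-PPN physical page information used by GC and by merged-page reads. */
struct ppn_info {
    uint32_t lpn[2];           /* the (up to two) LPNs merged into this PPN   */
    uint8_t  valid;            /* 2 bits: which merged subpage is still valid */
};

/* Invalidate one merged subpage when its LPN is rewritten elsewhere;
 * the physical page stays valid as long as the other subpage survives
 * (e.g. valid state 11 -> 01 in the write example of Fig. 9b). */
static inline void invalidate_merged_lpn(struct ppn_info *p, uint32_t lpn)
{
    if (p->lpn[0] == lpn)
        p->valid &= (uint8_t)~0x1u;
    else if (p->lpn[1] == lpn)
        p->valid &= (uint8_t)~0x2u;
}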
Figure 9 shows the read and write processes with the page state mapping table; we assume a 2KB page size, which is divided into four sector units. A read request directly accesses the mapping table using its LPN to find the PPN, as shown in Figure 9a: LPN 2 points to PPN 1 in the mapping table, but PPN 1 is a merged page. Because we cannot partially read data within a page, we have to read the whole page even though useless data are included, and the merged data from PPN 1 are moved to a temporary buffer in DRAM. Because the physical page information stores the LPNs (1 and 2) merged into PPN 1, we can identify which sectors within PPN 1 belong to LPN 2. The page state also records the logical sectors of the requested LPN, which differ from the positions actually occupied in the physical page: the data of LPN 2 occupy sectors 1 to 3 within PPN 1, so we copy those sectors to the read buffer at the positions given by the page state, which is sectors 0 to 2 (1110) in the LPN 2 entry. This copy is the decoding step performed whenever a merged page is read. Figure 9b shows the write process for a merged page. Mirroring the decoding of the read path, the two selected subpage writes are shifted left and right, respectively, stored in a temporary buffer, and then written to a physical page; this is the encoding step of a merged write. The merged write is directed to a free physical page, PPN 0. The entries of LPN 0 and LPN 1 change their PPN to 0, set the merge flag to 1, and update the page state according to the requested LPNs. However, because LPN 1 was already stored in PPN 1, we also update the physical page information of PPN 1 by changing its valid state from 11 to 01. PPN 1 still has valid sectors because LPN 2 remains available, so the LPN 2 entry in the mapping table does not change.
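The decoding copy on the read path can be expressed as a short loop over two pieces of information: the page state bitmap of the requested LPN and the physical position of that LPN's sectors inside the merged page. This is a sketch under our reading of Figure 9; the function name, arguments, and the assumption that a merged LPN's sectors are stored contiguously are ours.

#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE      512u
#define SECTORS_PER_PAGE 4u        /* 2KB page as in the Fig. 9 example */

/* Copy the sectors belonging to one merged LPN out of a whole-page read.
 *
 *   page_buf   - the full physical page read from NAND (decoding input)
 *   out_buf    - read buffer for the requested logical page
 *   page_state - bitmap of logical sectors this LPN holds; bit i = sector i,
 *                so the figure's "1110" for sectors 0-2 is mask 0x7 here
 *   phys_start - first physical sector of this LPN inside the merged page
 *                (1 when the data sit in physical sectors 1..3)
 */
static void decode_merged_read(const uint8_t *page_buf, uint8_t *out_buf,
                               uint16_t page_state, uint32_t phys_start)
{
    uint32_t phys = phys_start;

    for (uint32_t logical = 0; logical < SECTORS_PER_PAGE; logical++) {
        if (page_state & (1u << logical)) {
            memcpy(out_buf + logical * SECTOR_SIZE,
                   page_buf + phys * SECTOR_SIZE,
                   SECTOR_SIZE);
            phys++;             /* merged data are stored contiguously */
        }
    }
}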
V. EXPERIMENT
We evaluated our proposed scheme using a trace-based simulator built on the cycle-accurate NANDFlashSim [15]. We also implemented the FTL above multiple NANDFlashSim instances using OpenSSD [31]. The baseline is a page-level mapping policy with an LRU write buffer, and the scheduling scheme is FIFO, a basic SSD scheduler. The simulation configuration is shown in Table I. We used a NAND flash chip configuration with Multi Level Cell (MLC) flash, which has variable latency [28]. The basic configuration is an 8KB page size and a 64 MB write buffer.
TABLE I: System configuration

Chip configuration
  Page size: 4KB ∼ 32KB
  Pages per block: 128
  Blocks per plane: 2048
  Planes per die: 2
  Dies per chip: 2
  Read latency: 250us (average)
  Program latency: 1.3ms (average)
  Erase latency: 1.5ms (average)

SSD configuration
  Chips per channel: 8
  Channels: 8
  Write buffer size: 64 MB

TABLE III: Overhead latency
  Searching one node: 1297 cycles
  Encoding / decoding one sector: 482 cycles
TABLE II: The characteristics of traces

Trace    Read I/Os    Write I/Os    Avg. read size (B)    Avg. write size (B)
BS1        238,329       207,212                12,187                 15,305
BS2        123,155       314,188                15,991                 28,542
CFS1       116,811        39,300                 9,307                 14,181
CFS2       115,523        40,511                 8,562                 13,870
DTR1       795,251       234,275                26,876                 20,734
DTR2     1,167,838       159,079                15,100                 11,127
EXC1        97,623       330,123                15,959                 20,921
EXC2     6,712,172       438,967                 8,553                 18,724
EXC3       501,197       489,467                12,392                 20,125
EXC4       419,229       577,053                11,414                 17,939
EXC5       267,468       667,767                16,165                 12,135
RA1         22,184        93,743                10,846                  7,677
RA2         17,956       118,580                15,023                  8,019
In our experiments, we employed Microsoft production server and enterprise traces, which have different characteristics [1], as shown in Table II. Because the write request size determines the number of subpage writes, we used traces with different write request sizes, and we also used traces with different read/write request ratios. Our traces consist of a build server (BS), a storage file server (CFS), development tools release (DTR), an exchange server (EXC), and a RADIUS authentication server (RA).
To measure overhead, we used gem5 [4], a cycle-accurate simulator integrated with DRAMSim2 [26]. We ran the buffer node search and the memory copies for encoding and decoding on the gem5 simulator, as in [7]. The simulated processor and DRAM are an ARM processor [8] running at 400 MHz and LPDDR2, respectively; these configurations are similar to a commercial SSD [29]. The overhead latencies are shown in Table III. Searching one node means comparing the victim node and a pair node by looking up the mapping table to check the merging conditions. Encoding or decoding one sector means comparing the requested sector state with the page state in the mapping table and copying one sector.
The performance of our proposed scheme can be affected by the merging policy. To measure the effect of different merging policies, we evaluated the following buffer schemes:
• Baseline - the basic LRU buffer policy.
• MergeFull - full merge only, searching all buffer nodes.
• MergeWindow - full and sub merge with window sizes of 100 and 200, respectively.
• MergeOptimal - full and sub merge, searching all buffer nodes.

A. NAND Writes
Figure 10 shows the number of NAND writes for the different page sizes, broken down into full page and subpage writes; from top to bottom the plots correspond to 4KB, 8KB, 16KB, and 32KB page sizes. With a 4KB page size, the subpage write ratio is low, which means that most incoming write requests generate full page writes. The chance of merging is therefore low for the 4KB page size, because our merging schemes try to merge two subpage writes only when the victim is a subpage write. Even though the NAND writes are reduced only slightly in some traces (BS, RA, and DTR), our schemes still reduce NAND writes by almost 6% on average with a 4KB page size. With an 8KB page size, the subpage write ratio is higher than with 4KB, so most traces benefit from our schemes, and most subpage writes, which can cause internal fragmentation and endurance problems, are reduced to less than 2%. Our schemes reduce NAND writes by up to 23% and by 13% on average. The difference between the MergeFull and MergeWindow schemes becomes clear at the 16KB page size: because MergeFull only searches pair nodes for a full merge, it fails to merge when the victim cannot find a satisfying pair node. Thus, even though the subpage write ratio is higher than with an 8KB page size, most traces cannot remove as many subpage writes as in the 8KB case. If we instead use the sub merge schemes, MergeWindow and MergeOptimal, we can merge two subpage writes by searching more nodes; the number of NAND writes is then reduced by up to 30% and by 19% on average at the 16KB page size. The 32KB results show a more distinct difference between MergeWindow and MergeOptimal than the 16KB results, because the search count required for merging increases further. Even though MergeOptimal reduces NAND writes more than MergeWindow in some traces, the average difference between the two schemes is small, and the search count has to be limited, as in MergeWindow, because of its overhead. At the 32KB page size, the number of NAND writes is reduced by up to 36% and by 24% on average.
Fig. 10: Normalized number of NAND writes and its breakdown (full page and subpage) with different page sizes. B, F, W, and O indicate Baseline, MergeFull, MergeWindow, and MergeOptimal.

Fig. 11: Normalized write response time and its breakdown into NAND operation, RMW (in the write case), and overhead. B, F, W, and O indicate Baseline, MergeFull, MergeWindow, and MergeOptimal.

B. Response Time

Figure 11 shows the write response time and its breakdown into NAND write, RMW, and overhead for different page sizes. The overhead includes searching the write buffer and the memory copies needed to generate a full or sub page when the merge of two subpage writes succeeds. Because the 8KB and 16KB page size results are similar to the 4KB and 32KB results, respectively, we show the 8KB and 16KB results. Our proposed FTL exploits the page state mapping table, which keeps the sector data state within a page; because this page state information helps detect unnecessary RMW, the write response time is reduced. In addition, because we merge two subpage writes with different LPNs that may target the same NAND flash chip, we reduce the average number of NAND writes issued from the write buffer. The search count needed to successfully merge subpage writes is not high at the 4KB and 8KB page sizes; thus, the MergeWindow scheme achieves a reduction in write response time similar to the other schemes, except for the EXC2 trace. Our FTL reduces the write response time by up to 14% and by 8% on average. Because the search count needed for successful merging grows at the 16KB page size, the overhead of searching buffer nodes increases, as shown in the lower graph of Figure 11, and a difference appears between MergeWindow and the MergeFull and MergeOptimal schemes. Even though our FTL reduces the write response time by eliminating unnecessary RMW, the MergeFull and MergeOptimal schemes search all buffer nodes to merge two subpage writes, so their overall write response time is higher than the baseline in the DTR traces. Furthermore, even though these schemes search all buffer nodes to reduce NAND writes, the resulting reduction in NAND writes is slight compared with the MergeWindow scheme, as shown in Figure 10. Thus, the
MergeWindow scheme has better performance than MergeFull and MergeOptimal. The write response time is reduced by up to 16% and by 11% on average. Even though the merge schemes reduce the write response time on average, they can increase the response time of particular write requests when a merge fails, which we call the worst case. The response time of this case grows as we keep searching buffer nodes to merge subpage writes even though the probability of merging is low. Therefore, MergeWindow reduces the worst-case response time by more than half compared with the MergeFull and MergeOptimal schemes. Our proposed FTL can increase the read response time by decoding sectors within a page when a request accesses a merged page, so we also measured the read response time with the decoding overhead. The decoding overhead on read requests is slight because it only occurs when a request accesses a merged physical page: the read response time increases by up to 0.2% and by 0.1% on average. The page size does not affect the decoding overhead because it is unrelated to the subpage write ratio, so the results are similar across page sizes. Even in the worst case, the read response time increases by at most 5% and by 4% on average.
VI. CONCLUSIONS
Because of the limited P/E cycles of the NAND flash chips that are the main components of an SSD, reducing NAND writes improves SSD lifetime. Nowadays, vendors increase the page size, the minimum I/O unit of a NAND flash chip, to increase throughput. However, because the host system uses a sector unit that is smaller than the page unit, the number of subpage write requests increases. Furthermore, because NAND flash chips do not allow in-place updates, an RMW, which increases the write response time, is necessary when an update occurs through a subpage write. Subpage writes thus cause internal fragmentation and endurance problems such as program disturbance, and they increase the number of RMW operations.
In this paper, we proposed the subpage-based FTL to improve lifetime and performance. The subpage-based FTL reduces subpage writes by merging two subpage writes in the write buffer and eliminates unnecessary RMW by using a page state mapping table that keeps the stored sector information within a page. According to our experiments, the subpage-based FTL reduces the subpage write ratio to less than 2% and the number of NAND writes by up to 23% and by 13% on average. Also, the subpage-based FTL reduces the write response time by up to 14% and by 8% on average.
REFERENCES
[1] SNIA IOTTA trace repository. http://iotta.snia.org/.
[2] M. Abraham. Nand flash architecture and specification trends. Flash
Memory Summit, 2012.
[3] N. Agrawal, V. Prabhakaran, T. Wobber, J. D. Davis, M. S. Manasse,
and R. Panigrahy. Design tradeoffs for ssd performance. In USENIX
Annual Technical Conference, pages 57–70, 2008.
[4] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu,
J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, et al. The gem5
simulator. ACM SIGARCH Computer Architecture News, 39(2):1–7,
2011.
[5] A. Birrell, M. Isard, C. Thacker, and T. Wobber. A design for high-performance flash disks. ACM SIGOPS Operating Systems Review, 41(2):88–93, 2007.
[6] F. Chen, R. Lee, and X. Zhang. Essential roles of exploiting internal
parallelism of flash memory based solid state drives in high-speed data
processing. In High Performance Computer Architecture (HPCA), 2011
IEEE 17th International Symposium on, pages 266–277. IEEE, 2011.
[7] F. Chen, T. Luo, and X. Zhang. Caftl: A content-aware flash translation
layer enhancing the lifespan of flash memory based solid state drives.
In FAST, volume 11, 2011.
[8] F. A. Endo, D. Couroussé, and H.-P. Charles. Micro-architectural
simulation of embedded core heterogeneity with gem5 and mcpat. In
Proceedings of the 2015 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, page 7. ACM, 2015.
[9] Y. Hu, H. Jiang, D. Feng, L. Tian, H. Luo, and S. Zhang. Performance
impact and interplay of ssd parallelism through advanced commands,
allocation strategy and data granularity. In Proceedings of the international conference on Supercomputing, pages 96–107. ACM, 2011.
[10] Intel. Data compression in the intel solid-state drive 520 series technical brief. http://www.intel.com/content/www/us/en/solid-state-drives/
solid-state-drives-ssd.html.
[11] Intel. Partition alignment of intel ssds for achieving maximum performance and endurance. http://www.intel.ph/content/dam/www/public/us/
en/documents/technology-briefs/ssd-partition-alignment-tech-brief.pdf.
[12] S. Jin, J. Kim, J. Kim, J. Huh, and S. Maeng. Sector log: fine-grained
storage management for solid state drives. In Proceedings of the 2011
ACM Symposium on Applied Computing, pages 360–367. ACM, 2011.
[13] H. Jo, J. U. Kang, S. Y. Park, J. S. Kim, and J. Lee. Fab: flash-aware buffer management policy for portable media players. Consumer Electronics, IEEE Transactions on, 52(2):485–493, 2006.
[14] M. Jung and M. T. Kandemir. Sprinkler: Maximizing resource utilization
in many-chip solid state disks. In High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on, pages
524–535. IEEE, 2014.
[15] M. Jung, E. H. Wilson III, D. Donofrio, J. Shalf, and M. T. Kandemir.
Nandflashsim: Intrinsic latency variation aware nand flash memory
system modeling and simulation at microarchitecture level. In Mass
Storage Systems and Technologies (MSST), 2012 IEEE 28th Symposium
on, pages 1–12. IEEE, 2012.
[16] M. Jung, E. H. Wilson III, and M. Kandemir. Physically addressed
queueing (paq): Improving parallelism in solid state disks. In Computer
Architecture (ISCA), 2012 39th Annual International Symposium on,
pages 404–415. IEEE, 2012.
[17] J.-U. Kang, H. Jo, J.-S. Kim, and J. Lee. A superblock-based flash
translation layer for nand flash memory. In Proceedings of the 6th ACM
& IEEE International conference on Embedded software, pages 161–
170. ACM, 2006.
[18] D. Kim and S. Kang. Partial page buffering for consumer devices with
flash storage. In Consumer Electronics Berlin (ICCE-Berlin), 2013.
ICCEBerlin 2013. IEEE Third International Conference on, pages 177–
180. IEEE, 2013.
[19] J. Kim, J. M. Kim, S. H. Noh, S. L. Min, and Y. Cho. A space-efficient
flash translation layer for compactflash systems. Consumer Electronics,
IEEE Transactions on, 48(2):366–375, 2002.
[20] J.-H. Kim, S.-H. Kim, and J.-S. Kim. Subpage programming for
extending the lifetime of nand flash memory. In Proceedings of the
2015 Design, Automation & Test in Europe Conference & Exhibition,
pages 555–560. EDA Consortium, 2015.
[21] J.-Y. Kim, S.-H. Park, H. Seo, K.-W. Song, S. Yoon, and E.-Y. Chung.
Nand flash memory with multiple page sizes for high-performance
storage devices.
[22] Micron. M500 2.5-inch sata nand flash ssd datasheet. http://www.crucial.
com/usa/en/storage-ssd-m500.
[23] Micron. An overview of ssd write caching. http://www.micron.com.
[24] Micron. P420m 2.5-inch pcie nand flash ssd datasheet. https://www.
micron.com/products/datasheets/.
[25] Y. Park and J.-S. Kim. zftl: power-efficient data compression support for
nand flash-based consumer electronics devices. Consumer Electronics,
IEEE Transactions on, 57(3):1148–1156, 2011.
[26] P. Rosenfeld, E. Cooper-Balis, and B. Jacob. Dramsim2: A cycle
accurate memory system simulator. Computer Architecture Letters,
10(1):16–19, 2011.
[27] Samsung.
1g x 8 bit / 2g x 8 bit / 4g x 8 bit nand flash
memory k9xxg08uxa datasheet. http://www.samsung.com/Products/
Semiconductor.
[28] Samsung. 32gb a-die mlc nand flash datasheet k9gbg08u0a-m. http:
//www.samsung.com/uk/business/insights/datasheet/.
[29] Samsung. Samsung ssd 850 pro datasheet. http://www.samsung.com/
global/business/semiconductor/minisite/SSD/global/html/ssd850pro/
specifications.html.
[30] J. Seol, H. Shim, J. Kim, and S. Maeng. A buffer replacement algorithm
exploiting multi-chip parallelism in solid state disks. In Proceedings
of the 2009 international conference on Compilers, architecture, and
synthesis for embedded systems, pages 137–146. ACM, 2009.
[31] Y. Song, S. Jung, S. Lee, and J. Kim. Cosmos openssd: A pcie-based
open source ssd platform. Flash Memory Summit, 2014.
[32] G. Wu and X. He. Delta-ftl: improving ssd lifetime via exploiting
content locality. In Proceedings of the 7th ACM european conference
on Computer Systems, pages 253–266. ACM, 2012.