Subpage-based Flash Translation Layer For Solid State Drives

Mincheol Kang, Wonyoung Lee, and Soontae Kim
School of Computing, KAIST, Daejeon, Korea
{mincheolkang, wy lee, kims}@kaist.ac.kr

Abstract—Solid State Drives (SSDs), which consist of NAND flash chips, are widely used in current computer systems because of their higher read and write speeds compared with conventional Hard Disk Drives (HDDs). However, because NAND flash chips have limited Program/Erase (P/E) cycles, reducing writes to NAND flash chips improves SSD lifetime. As vendors increase the page size, which is the minimum I/O unit of a NAND flash chip, the number of subpage requests, whose size is smaller than a page, also increases because the host and the NAND flash chip use different I/O units. These subpage writes cause endurance problems and internal fragmentation. In addition, subpage writes increase the number of Read-Modify-Write (RMW) operations because NAND flash chips only support out-of-place updates, and they increase write latency because the old data must be read before the new data are written. In this paper, we propose a subpage-based Flash Translation Layer (FTL) that improves lifetime and performance by reducing writes to NAND flash chips and eliminating unnecessary RMW operations. We modify the write buffer in the SSD to merge subpages into full pages and add size information to the mapping table to detect unnecessary RMW. Our proposed scheme reduces the subpage write ratio to less than 2%, reduces writes to NAND flash chips by up to 23% and by 13% on average, and reduces write response time by up to 14% and by 8% on average.

I. INTRODUCTION

Solid State Drives (SSDs) are used as storage in a wide range of computer systems. Compared with a conventional Hard Disk Drive (HDD), an SSD has many benefits, including higher access speed, higher durability, and lower power consumption. The cost, which is the main disadvantage of the SSD compared with the HDD, is steadily decreasing because NAND flash density keeps increasing. Thus, SSDs are replacing HDDs as main storage. However, the NAND flash chip, the main component of an SSD, has a limited number of Program/Erase (P/E) cycles, which determines the lifetime of the SSD. It also cannot update data in the same physical location, a property called out-of-place update. Therefore, reducing the number of writes to NAND flash chips improves the lifetime of the SSD [32].

An SSD reads and writes in page units, which differ from the sector unit used for host I/O. Because of the different I/O units and the limited P/E cycles, the SSD uses a Flash Translation Layer (FTL), which translates addresses through a mapping table to improve the lifetime of the SSD [17]. Furthermore, because write latency is higher than read latency in NAND flash chips, the SSD uses internal DRAM as a write buffer that temporarily stores write requests [23]. In this way, the SSD mitigates the disadvantages of NAND flash chips.

Recently, manufacturers have been maximizing the throughput and capacity of SSDs by increasing the page size [2], which is currently 8KB or 16KB. However, the sector unit, which was designed for HDDs, is smaller than the page unit. Thus, requests whose data size is smaller than the page unit can occur; these are called subpage requests. Subpage writes cause endurance problems [20] and internal fragmentation within a page. Because NAND flash chips support only out-of-place updates, a subpage write produces a Read-Modify-Write (RMW) when an update operation to an already written page occurs.
This RMW delays the new write request because the old data must be read before the new data are written to a new physical location [3][5].

Several techniques have been proposed to address the subpage problem. [12] proposed a sector log to reduce subpage writes to NAND flash chips and elapsed time. It manages subpage write requests in a small part of the NAND flash chips using a sector mapping table and reduces the number of write requests by merging subpage writes. However, when the limited NAND flash chips that store merged subpage writes become full, the merged data must be evicted and written to the other NAND flash chips, which store full pages; this process increases read latency. A compression-supported FTL [25] has also been proposed, which uses a hardware compressor that can compress several subpage writes. Although it reduces the number of writes to NAND flash chips, it has encoding and decoding overhead and cannot solve internal fragmentation within a page.

In this paper, we propose a subpage-based Flash Translation Layer (FTL) that reduces subpage write requests and eliminates unnecessary RMW operations to improve lifetime and performance. We modify the write buffer to merge subpage writes into full pages, which reduces subpage writes to NAND flash chips. We also add size information to the mapping table to eliminate unnecessary RMW. Our experimental results, obtained with a trace-based simulator, show that the subpage-based FTL reduces NAND writes by up to 23% and by 13% on average and reduces the subpage write ratio to less than 2%. The write response time is reduced by up to 14% and by 8% on average.

The rest of this paper is organized as follows. The next section explains the background and motivations behind this work. Section 3 introduces and discusses related work. Our subpage-based FTL is proposed in Section 4. The experimental results are discussed in Section 5. Finally, the conclusions are given in Section 6.

Fig. 1: SSD Architecture

II. BACKGROUND AND MOTIVATION

A. Internal SSD

1) SSD Architecture: Figure 1 shows the overall SSD architecture. Host I/O requests are transferred to the SSD through the host interface, which is SATA or PCIe. The microprocessor is responsible for running the embedded software modules, namely the write buffer and the FTL. These software modules use the internal DRAM, which holds the write buffer data and the mapping table of the FTL used for address translation. After address translation, the flash controller receives NAND flash commands with physical addresses generated by the FTL. The flash controller is connected to multiple channels, which can operate independently, and multiple NAND flash chips share the serial bus of a single channel. This structure improves parallelism [6]. A NAND flash chip consists of a set of dies that can execute commands independently through interleaving because several dies are connected to a multiplexed bus [16]. Each die has several planes, which share the word line. A plane consists of thousands of blocks, which are the erase unit, and each block contains several pages, which are the read and write unit. A page can be divided into sector units (512 bytes), which are the minimum host I/O unit [27][9][28]. However, a NAND flash chip cannot read or write individual sectors within a page: to read a particular sector, the whole page must be read even though the other sectors are not needed.
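To make the hierarchy above concrete, the following sketch decomposes a flat physical page number into its channel, chip, die, plane, block, and page components. It is only an illustration: the field order and the geometry constants (borrowed from the configuration later shown in Table I) are our own assumptions, not the layout of any particular SSD.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed geometry (see Table I); a real SSD may order the fields differently. */
#define CHANNELS          8u
#define CHIPS_PER_CHANNEL 8u
#define DIES_PER_CHIP     2u
#define PLANES_PER_DIE    2u
#define BLOCKS_PER_PLANE  2048u
#define PAGES_PER_BLOCK   128u

struct phys_addr {
    uint32_t channel, chip, die, plane, block, page;
};

/* Split a flat physical page number (PPN) into the hierarchy components,
 * striping consecutive pages across channels first so that sequential
 * requests can exploit channel-level parallelism. */
static struct phys_addr decompose_ppn(uint64_t ppn)
{
    struct phys_addr a;
    a.channel = ppn % CHANNELS;           ppn /= CHANNELS;
    a.chip    = ppn % CHIPS_PER_CHANNEL;  ppn /= CHIPS_PER_CHANNEL;
    a.die     = ppn % DIES_PER_CHIP;      ppn /= DIES_PER_CHIP;
    a.plane   = ppn % PLANES_PER_DIE;     ppn /= PLANES_PER_DIE;
    a.page    = ppn % PAGES_PER_BLOCK;    ppn /= PAGES_PER_BLOCK;
    a.block   = (uint32_t)(ppn % BLOCKS_PER_PLANE);
    return a;
}

int main(void)
{
    struct phys_addr a = decompose_ppn(123456);
    printf("channel %u, chip %u, die %u, plane %u, block %u, page %u\n",
           a.channel, a.chip, a.die, a.plane, a.block, a.page);
    return 0;
}
```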
2) Write Buffer and FTL: Because NAND flash memory has higher write latency than read latency, the SSD uses DRAM as the write buffer. The host system notifies the SSD of new requests by providing information including the logical sector number (LSN), size, and command. Because the minimum I/O unit differs between HDD and SSD, the LSN has to be converted to a logical page number (LPN). If an I/O request exceeds the page size, it is split into memory requests based on the page size [14]. In addition, a subpage request keeps the sector numbers it covers within a page. Even if the I/O request size fits a page, the request can be a subpage because of a mis-aligned sector number [11]. When write requests arrive, the write buffer tries to add the memory requests to the buffer list, which is managed by LPN. If some of the memory requests are already found in the buffer list, they are updated with the new data. If the buffer list is full, the buffer chooses a victim memory request using a Least Recently Used (LRU) policy, which is widely used for write buffers [13].

The FTL helps the SSD improve its lifetime. Because the I/O unit differs between the host system and the NAND flash chip, address translation is required. After translating the LSN to an LPN, the FTL translates the LPN to a Physical Page Number (PPN) using the mapping table, which keeps the physical location on the NAND flash chips corresponding to each LPN. This mapping table is stored in internal DRAM. Earlier mapping tables used block-level mapping [19] or hybrid schemes [17] because of the limited internal DRAM size, but those schemes have data-copy and block-erase overheads. Nowadays, because the internal DRAM size has increased [29], page-level mapping can be used. Because of the out-of-place update property, if an update request arrives for a logical page whose data have already been written, the FTL must allocate a free PPN for the new data and record that PPN in the mapping table. The previous page holding the old data changes from the valid to the invalid state. These invalid pages are later erased by Garbage Collection (GC), which selects a victim block according to its policy, copies the valid pages to a free block, and erases the victim block. In addition, because similar access patterns can consume the limited P/E cycles of particular blocks, a wear-leveling technique is used to erase blocks evenly. Thus, the FTL prolongs the lifetime of the SSD.
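The page-level mapping and the out-of-place update described above can be summarized in a few lines of code. The sketch below is illustrative only: the mapping table is a flat in-DRAM array indexed by LPN, the free-page allocator, invalidation, and NAND accessors are hypothetical helpers, and garbage collection and wear leveling are omitted.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE       512u
#define PAGE_SIZE         8192u                      /* 8KB page */
#define SECTORS_PER_PAGE  (PAGE_SIZE / SECTOR_SIZE)
#define NUM_LPNS          (1u << 20)                 /* illustrative capacity */
#define INVALID_PPN       UINT32_MAX

static uint32_t l2p[NUM_LPNS];                       /* page-level LPN -> PPN table in DRAM */

/* Hypothetical helpers assumed to exist elsewhere in the FTL. */
extern uint32_t alloc_free_ppn(void);                          /* pick a free physical page  */
extern void     mark_invalid(uint32_t ppn);                    /* old copy becomes GC victim */
extern void     nand_read_page(uint32_t ppn, void *buf);
extern void     nand_write_page(uint32_t ppn, const void *buf);

/* Host logical sector number -> logical page number. */
static inline uint32_t lsn_to_lpn(uint64_t lsn)
{
    return (uint32_t)(lsn / SECTORS_PER_PAGE);
}

void ftl_init(void)
{
    for (uint32_t i = 0; i < NUM_LPNS; i++)
        l2p[i] = INVALID_PPN;                        /* nothing mapped yet */
}

/* Out-of-place update: a mapped page is never overwritten in place; the data
 * go to a free PPN and the old PPN is invalidated for garbage collection.
 * A write covering fewer sectors than a page triggers a Read-Modify-Write. */
void ftl_write(uint32_t lpn, const void *data, uint32_t first_sector, uint32_t nsectors)
{
    uint8_t page[PAGE_SIZE];
    uint32_t old_ppn = l2p[lpn];

    memset(page, 0, sizeof(page));
    if (nsectors < SECTORS_PER_PAGE && old_ppn != INVALID_PPN)
        nand_read_page(old_ppn, page);               /* RMW: fetch the old data first */

    memcpy(page + first_sector * SECTOR_SIZE, data, nsectors * SECTOR_SIZE);

    uint32_t new_ppn = alloc_free_ppn();
    nand_write_page(new_ppn, page);
    if (old_ppn != INVALID_PPN)
        mark_invalid(old_ppn);
    l2p[lpn] = new_ppn;
}
```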
B. Motivations

1) Subpage Write: As explained in Section 2.1, subpage requests are generated because of the different host I/O unit and mis-aligned sector addresses. The subpage write ratio is gradually increasing because vendors enlarge the page size to improve bandwidth; recent commercial SSDs use 8KB and 16KB pages [24][22]. A subpage write can cause a program (write) disturbance problem, which affects the cells adjacent to the programmed cell because of the voltage difference between the programmed cell and its neighbors [20]. It also increases the number of RMW operations [30] and causes internal fragmentation within a page. Even though the write buffer can reduce the number of subpage writes by absorbing updates, it cannot cover all subpage writes. Figure 2 shows the subpage write ratio with a 64MB write buffer. As the page size grows, the subpage write ratio also increases: for 8KB and 16KB pages, the subpage write ratio is up to 45% and 61%, and 26% and 41% on average, respectively.

Fig. 2: Subpage Write Ratio

2) Unnecessary RMW operations: Because the NAND flash chip does not allow in-place updates, the FTL handles an update operation with an RMW, which must read the old data before writing the new request; this increases write latency. As the subpage write ratio increases, the number of RMW operations also increases, because a subpage write that updates old data causes an RMW. As explained in Section 2.1, because a page can be divided into sector units, a subpage write can be described by the sectors it covers within a page. If the sectors of a new subpage write cover all the old sectors already written in the physical page, it is not necessary to read the old data; we call this an unnecessary RMW. Figure 3 shows a normal RMW and an unnecessary RMW. In this example the page size is 2KB, which holds four sectors of 512 bytes each. When a new subpage update to PPN 100 arrives, we have to read the old data in PPN 100, merge it with the new data into a 2KB page, and write the result to a free physical location, PPN 200. In the unnecessary RMW case, however, the new data completely cover the old data of PPN 100, so reading the old data is not necessary.

Fig. 3: RMW and Unnecessary RMW operations

We examined diverse applications to see the fraction of unnecessary RMW operations among all subpage writes. Figure 4 shows the result. Across the different page sizes, the applications have an unnecessary RMW portion of almost 20% on average, and the page size hardly affects the number of unnecessary RMW operations. In the EXC4 application, the unnecessary RMW portion reaches 42%.

Fig. 4: Unnecessary RMW operations ratio

III. RELATED WORK

Many schemes have been proposed to increase lifetime and performance by reducing NAND writes and RMW overhead. Previous studies can be separated into two approaches: software [30][18][12] and hardware [25][21].

In the software approach, [30] proposed a new buffer replacement algorithm that exploits chip-level parallelism in the SSD to reduce RMW overhead. The modified replacement policy checks whether a chip is busy, chooses victim requests accordingly, and allocates new write requests to non-busy chips. [18] proposed a buffering mechanism called partial page buffering to reduce the number of RMW operations. It keeps only subpage writes in the write buffer so that they can grow into full pages through updates; full-page writes bypass the write buffer and are written directly to NAND flash chips. Thus, it reduces the number of RMW operations by reducing subpage writes. [12] proposed a sector log that manages subpage writes to reduce NAND writes.
The sector log manages the merged requests by adding a sector mapping table, which stores different sectors within a page in a small part of the NAND flash chips, and it merges non-contiguous sectors with different LPNs. Once the sector mapping table is full, the different LPNs in the sector log mapping table are merged. Thus, it reduces subpage writes and elapsed time.

In the hardware approach, modern SSDs exploit data compression to reduce NAND flash writes and thereby increase SSD lifetime [10]. [25] proposed zFTL to reduce the number of NAND flash writes by compressing data. zFTL uses a page buffer in the flash controller whose size equals the page size and adds a hardware compression module to the controller; the compressed data are managed by adding information to the mapping table that indicates the location of the compressed data within a page. In addition, detailed information about the compressed data is kept by inserting the page state into the physical page, because compressed data almost always generate internal fragmentation within a page. When several writes are issued, regardless of whether they are subpage or full-page writes, zFTL tries to compress those requests into one physical page; conversely, a read request that points to compressed data must be decompressed. zFTL reduces NAND writes as the compression ratio increases. However, it cannot solve internal fragmentation within a page because the compressed data do not fit the sector unit. In addition, it must predict whether data are compressible, pays compression overhead for almost all write requests because it does not consider the data size of the write requests, and pays decompression overhead when the data are accessed, which can increase the response time.

[21] proposed a multiple-page-size flash memory to address the subpage write problem. The key idea of this scheme is to change the cell array in a die into an asymmetric cell array that has a small page size and a large page size. The two page sizes also have different write latencies: the small-page area has lower write latency than the large-page area. By writing subpage write requests to the small-page area, write latency can be reduced.
However, the implementation is difficult for vendors because the peripheral part of the NAND flash chip has to be modified. Moreover, because the capacity of the small-page area is limited, data must eventually be migrated to the large-page region, which incurs overhead.

Fig. 5: Comparison of sector log, partial page buffering, and our subpage-based FTL

Among prior studies, the sector log and partial page buffering (PPB) schemes address the subpage write problem. The sector log modifies the FTL and PPB modifies the write buffer, whereas our proposed scheme modifies both. Figure 5 shows an overview and comparison of the sector log, partial page buffering, and our proposed scheme to highlight the differences among them. In this figure we assume that the page size is 2KB and the number of NAND flash chips is four. The sector log uses a sector mapping table to manage subpage writes: subpage writes are stored in the sector data region, a small part of the NAND flash chips in the SSD, while full-page writes use the page mapping table and are stored in the other NAND flash chips, called the page data region. When subpage writes are issued, the sector log merges them into a full page in the page buffer. Because the sector log can merge non-contiguous sectors from different pages, it reduces the number of NAND writes. However, because the sector log uses only a limited, small part of the NAND flash chips for the sector data region, it has to flush this region: the flush operation evicts merged physical pages, assembles the sectors that belong to the same LPN, and writes them to the page data region. The assembly step reads the different physical pages over which the sectors are scattered in the sector data region; Figure 5a shows this process, where generating LPN 0 requires reading PPN 0 and PPN 1 in the sector data region. Moreover, because subpage writes are split into sector units before being written to the sector data region, even subpage writes with the same LPN can end up in different physical pages. This increases the read response time, which we call the read overhead of the sector log: in this example, before a flush occurs, a read request for LPN 0 forces the sector log to read two physical pages, PPN 0 and PPN 1.

The PPB scheme modifies the write buffer. Subpage writes are buffered in the write buffer, whereas full-page writes are written directly to the NAND flash chips. In the write buffer, subpage writes can be updated and can grow into full-page writes. When the write buffer is full, PPB searches for full-page writes using an LRU policy; if there is no full-page write in the buffer, a subpage write is selected as the victim. Figure 5b illustrates the PPB scheme. By buffering subpage writes, PPB reduces the subpage writes that would otherwise generate RMW operations in the NAND flash chips.
Thus, PPB reduces the number of subpage writes and the write response time by reducing the number of RMW operations. However, because PPB does not buffer full-page writes, it generates invalid pages whenever a full-page write is updated. Moreover, if the write buffer does not contain any full-page writes, subpage writes are written to the NAND flash chips.

Figure 5c shows our proposed subpage-based FTL. Our scheme exploits the write buffer, which attempts a full merge or a sub merge, whose merged sizes are a full page and a subpage, respectively. To represent merged page requests, we add page state information, which indicates the sector data within a page, to the mapping table. The page state can also detect unnecessary RMW because it reveals the data stored in a physical page at sector granularity. Compared with the sector log, our scheme does not reserve particular NAND flash chips for merged pages: a merged page can be written to any NAND flash chip and is managed through the page state mapping table, so there is no flush overhead. Moreover, because our scheme attempts a full or sub merge in the write buffer by searching subpage writes, a merged page is never split; thus, unlike the sector log, our scheme has no read overhead for merged pages. Compared with PPB, our scheme achieves a larger reduction of writes and latency by merging subpage writes and eliminating unnecessary RMW: it reduces the number of NAND writes, and because unnecessary RMW is eliminated, it also decreases the write response time and the number of NAND reads. Even though the mapping table size increases, the increase is acceptable given current DRAM sizes.
Fig. 6: Overall Architecture of Subpage-based FTL

IV. SUBPAGE-BASED FTL

A. Merge Subpage

We propose the subpage-based FTL to reduce the number of subpage writes and to eliminate unnecessary RMW. The overall architecture is illustrated in Figure 6. The proposed FTL reduces the number of subpage write operations by merging two subpage writes with different LPNs in the write buffer. We add size-aware LRU lists alongside the single conventional LRU list in the write buffer. When new write requests are issued, they are enqueued both in the single list and in the size-aware list corresponding to the request size. Note that we only add the data structure for the size-aware LRU lists; the data of a node are stored only once, and the size-aware lists can be combined with a conventional buffer policy such as LRU. We reduce the search time by maintaining a red-black tree for looking up write buffer data, as in [9]. The data structure of a write buffer node therefore has two list pointers: one for the single LRU list and one for the size-aware lists. These pointers are used when a node is updated in the write buffer: because an update can change the data size of a node, the node must be moved to the size-aware list that matches its new size.
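The following sketch illustrates one possible layout of such a write buffer node and its two sets of links. The field names, the list implementation, and the number of size classes are our own assumptions for illustration, not the authors' implementation.

```c
#include <stdint.h>

#define SECTOR_SIZE       512u
#define PAGE_SIZE         8192u
#define SECTORS_PER_PAGE  (PAGE_SIZE / SECTOR_SIZE)   /* 16 size classes */

/* Doubly linked list hook; a node carries one hook per list it belongs to. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *h)  { h->prev = h->next = h; }
static void list_del(struct list_node *n)   { n->prev->next = n->next; n->next->prev = n->prev; }
static void list_push_front(struct list_node *h, struct list_node *n)
{
    n->next = h->next; n->prev = h;
    h->next->prev = n; h->next = n;
}

/* A buffered write request.  Each node is linked twice: once in the single
 * conventional LRU list and once in the size-aware LRU list for its size. */
struct wb_node {
    uint32_t         lpn;            /* logical page number                 */
    uint16_t         sector_bitmap;  /* which sectors of the page are held  */
    uint16_t         nsectors;       /* current data size in sectors        */
    uint8_t          data[PAGE_SIZE];
    struct list_node lru_link;       /* hook into the single LRU list       */
    struct list_node size_link;      /* hook into one size-aware LRU list   */
};

struct write_buffer {
    struct list_node lru;                          /* conventional LRU list    */
    struct list_node by_size[SECTORS_PER_PAGE];    /* one list per size class  */
};

static void wb_init(struct write_buffer *wb)
{
    list_init(&wb->lru);
    for (unsigned i = 0; i < SECTORS_PER_PAGE; i++)
        list_init(&wb->by_size[i]);
}

/* When an update changes how much data a node holds, move it to the
 * size-aware list matching the new size and refresh its recency. */
static void wb_resize_node(struct write_buffer *wb, struct wb_node *n, uint16_t new_nsectors)
{
    list_del(&n->size_link);
    n->nsectors = new_nsectors;
    list_push_front(&wb->by_size[new_nsectors - 1], &n->size_link);

    list_del(&n->lru_link);
    list_push_front(&wb->lru, &n->lru_link);
}
```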
The victim is chosen by LRU; if the victim node is a subpage, the FTL attempts to merge this subpage with a different subpage to generate a full page.

Fig. 7: Merging Control Flow

Figure 7 shows the control flow of our merging algorithm, which attempts two types of merge. The first type is a full merge, whose merged size is a full page. If the full merge fails, a sub merge is attempted, whose merged size is still a subpage. The merging algorithm searches a pair list to create a full page. Before searching the pair list, we check whether the pair list contains enough subpage writes to merge. If it does not, we check whether the victim node requires an RMW, because an RMW can change its data size by reading the old data: if the RMW turns the victim into a full page, it is written directly to the NAND flash chips, but in some cases the victim cannot become a full page even with the old data merged in. In that case, we look for a virtual pair, which is the pair that matches the increased victim size after the RMW. We then also check whether the candidate node in the pair list requires an RMW; if it does, we move to the next node. If the full merge fails, the merging algorithm attempts a sub merge, whose merged size is a subpage, to still reduce the number of NAND writes: the sub merge moves to the under pair, whose data size is smaller than the pair, and keeps searching buffer nodes in the under pair until the conditions are satisfied. When a merge succeeds, because the sector positions within a page do not line up between the victim node and the pair node, we shift the victim sectors to the left and the pair sectors to the right; the original sector positions of each node are recorded in the mapping table.

Fig. 8: Merging example and three types of RMW

Figure 8 shows a merging example and the three types of RMW related to the victim and pair. We assume that the page size is 2KB. In the write buffer, the victim is LPN 0, whose size is 0.5KB. If the victim requires an RMW and the RMW changes its size as shown in Figure 8b (1), the victim node is written directly to the NAND flash chips because it becomes a full page. Figure 8b (2) shows the case where the victim with an RMW cannot become a full page; here we find the virtual pair, LPN 123, because the victim size after the RMW is 1KB. If the victim does not require an RMW, the first pair node is LPN 1, whose size is 1.5KB, because the sum of the victim and pair sizes forms a full page. However, as in Figure 8b (3), the pair node itself can require an RMW; in that case we move to the next pair, LPN 2. If we fail to generate a full page with all pair nodes, we attempt a sub merge by moving to the under pair, whose size is 1KB.
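The sketch below restates this control flow in code. It is a simplified reading of Figure 7 under our own assumptions: the pair lookup helpers, the RMW size estimation, and the exact search order are placeholders, and all failure paths collapse into writing the victim as a plain subpage.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTORS_PER_PAGE 16u                /* 8KB page, 512B sectors */

struct wb_node;                             /* buffered write, as sketched above */

/* Hypothetical helpers assumed to exist elsewhere in the write buffer / FTL. */
extern bool     node_is_rmw(const struct wb_node *n);      /* older data exist on flash   */
extern uint32_t size_after_rmw(const struct wb_node *n);    /* sectors held after the RMW  */
extern uint32_t node_size(const struct wb_node *n);         /* sectors currently held      */
extern struct wb_node *pair_list_first(uint32_t sectors);   /* head of a size-aware list   */
extern struct wb_node *pair_list_next(struct wb_node *n);
extern void     write_full_page(struct wb_node *a, struct wb_node *b);  /* b may be NULL */
extern void     write_subpage(struct wb_node *a, struct wb_node *b);    /* b may be NULL */

/* Evict one subpage victim: try a full merge first, then a sub merge,
 * loosely following the control flow of Figure 7. */
void evict_subpage_victim(struct wb_node *victim)
{
    uint32_t vsize = node_size(victim);

    /* An RMW can enlarge the victim: a full page goes straight to flash,
     * otherwise the enlarged size selects a "virtual pair" list instead. */
    if (node_is_rmw(victim)) {
        vsize = size_after_rmw(victim);
        if (vsize == SECTORS_PER_PAGE) {
            write_full_page(victim, NULL);
            return;
        }
    }

    /* Full merge: look for a pair whose size completes a full page,
     * skipping candidates that themselves still need an RMW. */
    for (struct wb_node *p = pair_list_first(SECTORS_PER_PAGE - vsize);
         p != NULL; p = pair_list_next(p)) {
        if (!node_is_rmw(p)) {
            write_full_page(victim, p);
            return;
        }
    }

    /* Sub merge: fall back to the "under pair" lists (smaller than the pair)
     * so that two subpages still leave the buffer as a single NAND write. */
    for (uint32_t s = SECTORS_PER_PAGE - vsize - 1; s >= 1; s--) {
        for (struct wb_node *p = pair_list_first(s); p != NULL; p = pair_list_next(p)) {
            if (!node_is_rmw(p)) {
                write_subpage(victim, p);
                return;
            }
        }
    }

    /* Merge failed: write the victim to flash as a plain subpage. */
    write_subpage(victim, NULL);
}
```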
B. Page State Mapping Table

To represent merged write requests on the NAND flash chips, we implement a page state mapping table. We add size information that shows the state of the data stored within a physical page at sector granularity, and we represent the page state as a bitmap: one bit corresponds to one sector unit (512 bytes) within a page. A bit set to 1 means that the sector holds data, and 0 means that the sector is empty. If the page size is 8KB, the page state consists of 16 bits, and it grows with the page size. Because the page state describes partially stored data within a page, it also detects unnecessary RMW: we can see exactly which sectors are stored in a physical page. When an update occurs in a merged page, we have to invalidate the old data, but the other half of the merged data is still available, so we add a valid state, which expresses the validity of each half of a merged physical page. Because we merge two subpage writes, the valid state requires two bits, one per subpage. Garbage collection changes PPNs by using the physical page information when it erases blocks [31], so each PPN must be linked to its LPNs; we therefore add the merged LPNs to the physical page information, which helps the garbage collection process directly access the linked LPNs and change their PPNs. To indicate that a PPN is a merged page, we add a merge flag of one bit, which tells whether the PPN is merged and therefore whether decoding, a data copy performed on reads, is required.

Fig. 9: Read and write example in page state mapping table

Figure 9 shows the read and write processes with the page state mapping table. We assume that the page size is 2KB, which is divided into four sector units. A read request directly accesses the mapping table using its LPN to find the PPN, as shown in Figure 9a: LPN 2 points to PPN 1, which is a merged page. Because data within a page cannot be read partially, the whole page must be read even though it includes data that are not needed. The merged data of PPN 1 are moved to a temporary buffer in DRAM. Because the physical page information records the LPNs (1 and 2) merged in PPN 1, we can identify which sectors of PPN 1 belong to LPN 2. The page state also records the sectors of the requested LPN, which can differ from where the data are actually stored in the physical page. In this example, the read request requires sectors 1 to 3 of LPN 2, while the page state (1110) in the LPN 2 entry shows that these data are stored in sectors 0 to 2 of PPN 1, so we copy those sectors to the read buffer according to the page state. This copy is the decoding performed when a merged page is read. Figure 9b shows the write process for a merged page. As in decoding, the two selected subpage writes are shifted left and right, respectively, stored in a temporary buffer, and then written to a physical page; this is the encoding of a merged write. The merged write is directed to a free physical page, PPN 0, and the entries of LPN 0 and LPN 1 change their PPN to 0, set the merge flag to 1, and update the page state according to the requested sectors. However, because LPN 1 was already stored in PPN 1, we also update the physical page information of PPN 1 by changing its valid state from 11 to 01. PPN 1 still holds valid sectors because LPN 2 remains available, so the LPN 2 entry in the mapping table does not change.
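A compact way to see how the page state works is as plain bitmap arithmetic. The sketch below shows a possible mapping table entry and the two checks described above: whether a new subpage write fully covers the sectors already stored for an LPN (an unnecessary RMW), and how one LPN's sectors are copied out of a merged page on a read (decoding). The entry layout, the packed_left flag, and the field names are our own assumptions, not the authors' exact structures.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define SECTOR_SIZE       512u
#define PAGE_SIZE         8192u
#define SECTORS_PER_PAGE  (PAGE_SIZE / SECTOR_SIZE)   /* 16 -> a 16-bit page state */

/* One mapping table entry per LPN (field layout assumed for illustration). */
struct map_entry {
    uint32_t ppn;
    uint16_t page_state;     /* bit i set: logical sector i of this LPN is stored  */
    uint8_t  merged;         /* merge flag: the PPN holds two merged subpages      */
    uint8_t  packed_left;    /* this LPN's sectors were shifted to the left half   */
};

/* The RMW is unnecessary when the new write covers every sector that is
 * already stored for this LPN, so the old data never have to be read. */
static bool rmw_unnecessary(const struct map_entry *e, uint16_t new_sectors)
{
    return (e->page_state & (uint16_t)~new_sectors) == 0;
}

/* Decoding: copy one LPN's sectors out of a merged physical page.  The
 * sectors were packed to the left (victim) or right (pair) when the page was
 * encoded, so they are copied back to their original logical positions. */
static void decode_merged(const struct map_entry *e,
                          const uint8_t *phys_page, uint8_t *out_page)
{
    unsigned held = 0;
    for (unsigned i = 0; i < SECTORS_PER_PAGE; i++)
        held += (e->page_state >> i) & 1u;                   /* sectors this LPN owns */

    unsigned src = e->packed_left ? 0 : SECTORS_PER_PAGE - held;
    for (unsigned dst = 0; dst < SECTORS_PER_PAGE; dst++) {
        if (e->page_state & (1u << dst)) {
            memcpy(out_page + dst * SECTOR_SIZE,
                   phys_page + src * SECTOR_SIZE, SECTOR_SIZE);
            src++;
        }
    }
}
```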
V. EXPERIMENT

We evaluated our proposed scheme using a trace-based, cycle-accurate simulator, NANDFlashSim [15], and implemented the FTL above multiple NANDFlashSim instances using OpenSSD [31]. The baseline uses page-level mapping and an LRU write buffer, and the scheduling scheme is FIFO, a basic SSD scheduler. The simulation configuration is shown in Table I. We used a Multi Level Cell (MLC) NAND flash chip configuration, which has variable latency [28]. The basic configuration is an 8KB page size and a 64MB write buffer.

TABLE I: System configuration
Chip configuration
  Page size: 4KB ∼ 32KB
  Pages per block: 128
  Blocks per plane: 2048
  Planes per die: 2
  Dies per chip: 2
  Read latency: 250us (average)
  Program latency: 1.3ms (average)
  Erase latency: 1.5ms (average)
SSD configuration
  Chips per channel: 8
  Channels: 8
  Write buffer size: 64MB

In our experiments, we employed Microsoft production server and enterprise traces with different characteristics [1], summarized in Table II. Because the write request size determines the number of subpage writes, we used traces with different write request sizes as well as different read/write request ratios. Our traces consist of a build server (BS), a storage file server (CFS), development tools release (DTR), an exchange server (EXC), and a RADIUS authentication server (RA).

TABLE II: The characteristics of traces (number of I/Os and average request size in bytes)
Trace  Reads      Writes    Avg. read size  Avg. write size
BS1    238,329    207,212   12,187          15,305
BS2    123,155    314,188   15,991          28,542
CFS1   116,811    39,300    9,307           14,181
CFS2   115,523    40,511    8,562           13,870
DTR1   795,251    234,275   26,876          20,734
DTR2   1,167,838  159,079   15,100          11,127
EXC1   97,623     330,123   15,959          20,921
EXC2   6,712,172  438,967   8,553           18,724
EXC3   501,197    489,467   12,392          20,125
EXC4   419,229    577,053   11,414          17,939
EXC5   267,468    667,767   16,165          12,135
RA1    22,184     93,743    10,846          7,677
RA2    17,956     118,580   15,023          8,019

To measure overhead, we used gem5 [4], a cycle-accurate simulator integrated with DRAMSim2 [26]. We ran the write buffer node search and the memory copies for encoding and decoding on the gem5 simulator, as in [7]. The simulated processor is an ARM core [8] at 400 MHz with LPDDR2 DRAM, a configuration similar to a commercial SSD [29]. The resulting overhead latencies are shown in Table III. Searching one node means comparing the victim node and a pair node by looking up the mapping table to check the merging conditions; encoding or decoding one sector means comparing the requested sector state with the page state in the mapping table and copying one sector.

TABLE III: Overhead latency
  Searching one node: 1297 cycles
  Encoding/decoding one sector: 482 cycles

The performance of our proposed scheme can be affected by the merging policy. To measure the effect of different merging policies, we evaluated the following buffer schemes:
• Baseline - basic LRU buffer policy.
• MergeFull - full merge only, searching all buffer nodes.
• MergeWindow - full and sub merge with window sizes of 100 and 200 nodes, respectively.
• MergeOptimal - full and sub merge, searching all buffer nodes.

A. NAND Writes

Figure 10 shows the number of NAND writes for the different page sizes, broken down into full-page and subpage writes.
The plots from top to bottom correspond to the 4KB, 8KB, 16KB, and 32KB page sizes, respectively. For the 4KB page size, the subpage write ratio is low, which means most incoming write requests generate full-page writes. The chance of merging is also low because our merging schemes only try to merge two subpage writes when the victim is a subpage write. Although the NAND writes are reduced only slightly by our schemes in some traces (BS, RA, and DTR), our schemes still reduce NAND writes by almost 6% on average at the 4KB page size. At the 8KB page size, because the subpage write ratio is higher than at 4KB, most traces benefit from our schemes: they remove most of the subpage writes, which cause internal fragmentation and endurance problems, leaving a subpage write ratio of less than 2%, and they reduce NAND writes by up to 23% and by 13% on average. The difference between the MergeFull and MergeWindow schemes is clearly visible at the 16KB page size. Because the MergeFull scheme only searches pair nodes for full merging, it fails to merge whenever the victim cannot find a satisfying pair node; thus, even though the subpage write ratio is higher than at 8KB, most traces cannot remove as many subpage writes as at 8KB. However, with the sub merge schemes, MergeWindow and MergeOptimal, two subpage writes can be merged by searching more nodes, and the number of NAND writes is reduced by up to 30% and by 19% on average at the 16KB page size. The 32KB page size shows a more distinct difference between MergeWindow and MergeOptimal than the 16KB page size because the search count required for merging increases further. Although MergeOptimal reduces NAND writes more than MergeWindow in some traces, the difference between the two schemes is small on average; moreover, the search count has to be limited because of its overhead, which is what the MergeWindow scheme does. At the 32KB page size, the number of NAND writes is reduced by up to 36% and by 24% on average.

B. Response Time

Figure 11 shows the write response time and its breakdown into NAND write, RMW, and overhead for the different page sizes.
The overhead includes searching the write buffer and the memory copies needed to generate a full page or a subpage when the merge of two subpage writes succeeds.

Fig. 10: Normalized number of NAND writes and its breakdown (full page and subpage) with different page sizes. B, F, W, O indicate Baseline, MergeFull, MergeWindow, and MergeOptimal.

Fig. 11: Normalized write response time and its breakdown into NAND operation, RMW (in the write case), and overhead. B, F, W, O indicate Baseline, MergeFull, MergeWindow, and MergeOptimal.
Because the 8KB and 16KB page size results are similar to those of 4KB and 32KB, respectively, we show only the 8KB and 16KB results. Our proposed FTL exploits the page state mapping table, which keeps the sector data state within a page; because this page state information helps detect unnecessary RMW, the write response time is reduced. In addition, because we merge in the write buffer two subpage writes with different LPNs that may map to the same NAND flash chip, we reduce the average number of NAND writes. At the 4KB and 8KB page sizes, the search count needed for a successful full merge is not high; thus, MergeWindow achieves a reduction in write response time similar to the other schemes, except for the EXC2 trace. Our FTL reduces the write response time by up to 14% and by 8% on average. At the 16KB page size, the search count needed for a successful merge grows, so the overhead of searching buffer nodes increases, as shown in the lower part of Figure 11, and a difference appears between MergeWindow and the MergeFull and MergeOptimal schemes. Even though our FTL reduces write response time by eliminating unnecessary RMW, the MergeFull and MergeOptimal schemes search all buffer nodes to merge two subpage writes, so their overall write response time becomes higher than the baseline in the DTR traces. Furthermore, even though these schemes search all buffer nodes to reduce NAND writes, the additional reduction over the MergeWindow scheme is slight, as shown in Figure 10. Thus, the MergeWindow scheme performs better than MergeFull and MergeOptimal. The write response time is reduced by up to 16% and by 11% on average. Even though the merge schemes reduce the write response time on average, they can increase the response time of particular write requests when a merge fails, which we call the worst case; the worst-case response time grows as we keep searching buffer nodes even when the probability of merging is low. Therefore, MergeWindow reduces the worst-case response time by more than half compared with the MergeFull and MergeOptimal schemes. Our proposed FTL can increase the read response time because sectors within a page must be decoded when a request accesses a merged page, so we also measured the read response time with the decoding overhead.
However, the decoding overhead for read requests is slight because it only occurs when a request accesses a merged physical page: the read response time increases by up to 0.2% and by 0.1% on average. The page size does not affect the decoding overhead because it is unrelated to the subpage write ratio, so the read response times are similar across page sizes. Even in the worst case, the read response time increases by at most 5% and by 4% on average.

VI. CONCLUSIONS

Because of the limited P/E cycles of NAND flash chips, the main component of the SSD, reducing NAND writes improves the lifetime of the SSD. Nowadays, vendors increase the page size, the minimum I/O unit of a NAND flash chip, to increase throughput. However, because the current host system uses a sector unit that is smaller than the page unit, the number of subpage write requests increases. Furthermore, because NAND flash chips do not allow in-place updates, an RMW, which increases the write response time, is necessary when an update operation occurs for a subpage write. Subpage writes cause internal fragmentation and endurance problems such as program disturbance, and they increase the number of RMW operations. In this paper, we propose the subpage-based FTL to improve lifetime and performance. The subpage-based FTL reduces subpage writes by merging two subpage writes in the write buffer and eliminates unnecessary RMW by using a page state mapping table that keeps the stored sector information within a page. According to our experiments, the subpage-based FTL reduces the subpage write ratio to less than 2% and the number of NAND writes by up to 23% and by 13% on average. It also reduces the write response time by up to 14% and by 8% on average.

REFERENCES

[1] SNIA IOTTA repository. http://iotta.snia.org/.
[2] M. Abraham. Nand flash architecture and specification trends. Flash Memory Summit, 2012.
[3] N. Agrawal, V. Prabhakaran, T. Wobber, J. D. Davis, M. S. Manasse, and R. Panigrahy. Design tradeoffs for ssd performance. In USENIX Annual Technical Conference, pages 57–70, 2008.
[4] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, et al. The gem5 simulator. ACM SIGARCH Computer Architecture News, 39(2):1–7, 2011.
[5] A. Birrell, M. Isard, C. Thacker, and T. Wobber. A design for high-performance flash disks. ACM SIGOPS Operating Systems Review, 41(2):88–93, 2007.
[6] F. Chen, R. Lee, and X. Zhang. Essential roles of exploiting internal parallelism of flash memory based solid state drives in high-speed data processing. In High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on, pages 266–277. IEEE, 2011.
[7] F. Chen, T. Luo, and X. Zhang. Caftl: A content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives. In FAST, volume 11, 2011.
[8] F. A. Endo, D. Couroussé, and H.-P. Charles. Micro-architectural simulation of embedded core heterogeneity with gem5 and mcpat. In Proceedings of the 2015 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, page 7. ACM, 2015.
[9] Y. Hu, H. Jiang, D. Feng, L. Tian, H. Luo, and S. Zhang. Performance impact and interplay of ssd parallelism through advanced commands, allocation strategy and data granularity. In Proceedings of the international conference on Supercomputing, pages 96–107. ACM, 2011.
[10] Intel.
Data compression in the intel solid-state drive 520 series technical brief. http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-ssd.html.
[11] Intel. Partition alignment of intel ssds for achieving maximum performance and endurance. http://www.intel.ph/content/dam/www/public/us/en/documents/technology-briefs/ssd-partition-alignment-tech-brief.pdf.
[12] S. Jin, J. Kim, J. Kim, J. Huh, and S. Maeng. Sector log: fine-grained storage management for solid state drives. In Proceedings of the 2011 ACM Symposium on Applied Computing, pages 360–367. ACM, 2011.
[13] H. Jo, J. U. Kang, S. Y. Park, J. S. Kim, and J. Lee. Fab: flash-aware buffer management policy for portable media players. Consumer Electronics, IEEE Transactions on, 52(2):485–493, 2006.
[14] M. Jung and M. T. Kandemir. Sprinkler: Maximizing resource utilization in many-chip solid state disks. In High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on, pages 524–535. IEEE, 2014.
[15] M. Jung, E. H. Wilson III, D. Donofrio, J. Shalf, and M. T. Kandemir. Nandflashsim: Intrinsic latency variation aware nand flash memory system modeling and simulation at microarchitecture level. In Mass Storage Systems and Technologies (MSST), 2012 IEEE 28th Symposium on, pages 1–12. IEEE, 2012.
[16] M. Jung, E. H. Wilson III, and M. Kandemir. Physically addressed queueing (paq): Improving parallelism in solid state disks. In Computer Architecture (ISCA), 2012 39th Annual International Symposium on, pages 404–415. IEEE, 2012.
[17] J.-U. Kang, H. Jo, J.-S. Kim, and J. Lee. A superblock-based flash translation layer for nand flash memory. In Proceedings of the 6th ACM & IEEE International conference on Embedded software, pages 161–170. ACM, 2006.
[18] D. Kim and S. Kang. Partial page buffering for consumer devices with flash storage. In Consumer Electronics - Berlin (ICCE-Berlin), 2013 IEEE Third International Conference on, pages 177–180. IEEE, 2013.
[19] J. Kim, J. M. Kim, S. H. Noh, S. L. Min, and Y. Cho. A space-efficient flash translation layer for compactflash systems. Consumer Electronics, IEEE Transactions on, 48(2):366–375, 2002.
[20] J.-H. Kim, S.-H. Kim, and J.-S. Kim. Subpage programming for extending the lifetime of nand flash memory. In Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, pages 555–560. EDA Consortium, 2015.
[21] J.-Y. Kim, S.-H. Park, H. Seo, K.-W. Song, S. Yoon, and E.-Y. Chung. Nand flash memory with multiple page sizes for high-performance storage devices.
[22] Micron. M500 2.5-inch sata nand flash ssd datasheet. http://www.crucial.com/usa/en/storage-ssd-m500.
[23] Micron. An overview of ssd write caching. http://www.micron.com.
[24] Micron. P420m 2.5-inch pcie nand flash ssd datasheet. https://www.micron.com/products/datasheets/.
[25] Y. Park and J.-S. Kim. zftl: power-efficient data compression support for nand flash-based consumer electronics devices. Consumer Electronics, IEEE Transactions on, 57(3):1148–1156, 2011.
[26] P. Rosenfeld, E. Cooper-Balis, and B. Jacob. Dramsim2: A cycle accurate memory system simulator. Computer Architecture Letters, 10(1):16–19, 2011.
[27] Samsung. 1g x 8 bit / 2g x 8 bit / 4g x 8 bit nand flash memory k9xxg08uxa datasheet. http://www.samsung.com/Products/Semiconductor.
[28] Samsung. 32gb a-die mlc nand flash datasheet k9gbg08u0a-m. http://www.samsung.com/uk/business/insights/datasheet/.
[29] Samsung. Samsung ssd 850 pro datasheet.
http://www.samsung.com/global/business/semiconductor/minisite/SSD/global/html/ssd850pro/specifications.html.
[30] J. Seol, H. Shim, J. Kim, and S. Maeng. A buffer replacement algorithm exploiting multi-chip parallelism in solid state disks. In Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, pages 137–146. ACM, 2009.
[31] Y. Song, S. Jung, S. Lee, and J. Kim. Cosmos openssd: A pcie-based open source ssd platform. Flash Memory Summit, 2014.
[32] G. Wu and X. He. Delta-ftl: improving ssd lifetime via exploiting content locality. In Proceedings of the 7th ACM european conference on Computer Systems, pages 253–266. ACM, 2012.