MTD | F8 | CELFJamboree30-UBIFS_update

Evaluation of UBI and UBIFS
TOSHIBA CORPORATION
Core Technology Center
Embedded System Technology Development Dept.
UWATOKO Katsuki
Oct 2, 2009
Copyright 2009, Toshiba Corporation.
Agenda
• Background
• Topic
1. Boot time
2. Flash space overhead
3. UBI scrubbing on preempt kernel
4. Sub-page write verify
5. Error handling
• Summary
• References
Background
• Large NAND flash memories are now common in embedded systems, so
flash file systems that scale to large NAND are necessary.
• UBI : Unsorted Block Images (Artem Bityutskiy)
– Mainlined in 2.6.22 in Jul 2007.
– Works on top of MTD partition.
– Logical Volume, Global Wear-leveling, Scrubbing, Journal, …
• UBIFS : Unsorted Block Image File System
(Artem Bityutskiy, Adrian Hunter)
– Mainlined in 2.6.27 in Oct 2008.
– Works on top of UBI volumes.
– Write-back, Compression, Journal, …
⇒ UBI/UBIFS is one of the best flash file systems, but one runs into some
issues when evaluating it. This presentation reports those issues.
Background
[Figure: Linux storage stack]
System Call I/F
VFS
– ext2 / FAT → Block Device → HDD
– JFFS2, YAFFS2, LogFS, UBIFS (on UBI) → MTD device, MTD API → Flash memory
Flash memory: NAND, NOR, DataFlash, AG-AND, OneNAND, ECC’d NOR
VFS: Virtual File System
MTD: Memory Technology Device
Background
• Software
– Linux kernel : Vanilla kernel 2.6.20.19 + Original patch for embedded systems
– UBIFS / UBI : Vanilla kernel 2.6.30
– MTD : Vanilla kernel 2.6.20.19
– MTD NAND driver : txx9ndfmc (Original patch for the product’s CPU)
• Hardware
– Board : Digital product development board
– CPU : MIPS 528 MHz (I$/D$: 64KB/64KB)
– RAM(kernel) : 256 MB (64 MB)
– NAND :
NAND chip | Bus | Total size (Data) | Total size (OOB) | Erasing block | Page | Sub-page
Hynix | 8 bit | 32 MB | 1 MB | 16 KB | 512 bytes | 256 bytes
Toshiba | 8 bit | 64 MB | 2 MB | 16 KB | 512 bytes | 256 bytes
Topic 1 : Boot time
The correlation between the time and the size
MTD 32MB partition, UBI 1 volume (Toshiba NAND)
Command | MTD 32 MB | MTD 16 MB | MTD 8 MB
insmod MTD NAND | 428 | 429 | 428
insmod ubi | 295 | 181 | 123
insmod ubifs | 61 | 62 | 62
mount | 30 | 30 | 32
(all times in ms)
[Figure: stacked bar chart of the times above (insmod MTD NAND, insmod ubi, insmod ubifs, mount) for MTD 32 MB, MTD 16 MB and MTD 8 MB; vertical axis: Time [ms], 0 to 900]
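To compare the three partition sizes at a glance, the per-command times in the table can be summed. The following is only a small sketch: the struct and function names are made up for illustration, and the numbers are copied from the measurements above.

```c
/* Per-command times in ms, copied from the measurement table. */
struct boot_times {
    int insmod_mtd_nand;
    int insmod_ubi;
    int insmod_ubifs;
    int mount;
};

const struct boot_times mtd_32mb = { 428, 295, 61, 30 };
const struct boot_times mtd_16mb = { 429, 181, 62, 30 };
const struct boot_times mtd_8mb  = { 428, 123, 62, 32 };

/* Time from loading the NAND driver until UBIFS is mounted. */
int boot_total_ms(const struct boot_times *t)
{
    return t->insmod_mtd_nand + t->insmod_ubi + t->insmod_ubifs + t->mount;
}
```

Calling `boot_total_ms(&mtd_32mb)` sums the 32 MB column. Only the `insmod ubi` term changes noticeably between the columns.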
Topic 1 : Boot time
The MTD NAND driver initialization is slow.
The time does not depend on the size of the MTD partition;
it depends on the size of the NAND chip.
• Cause
Building the BBT (Bad Block Table) is slow.
• Reads 1 OOB (16 bytes) per erase block
• The BBT of the whole chip (MTD0) is built during MTD NAND driver
initialization.
• Ideas for improvement
• Build the BBT per MTD partition.
• Use “BBT on Flash” - the BBT is kept in a separate internal volume.
• Use large page size NAND
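The scan cost described above can be sketched as a count of erase blocks on the whole chip. This is a rough model that assumes exactly one OOB read per erase block, as stated on the slide. The function names are hypothetical and are not part of the MTD API.

```c
#include <stdint.h>

/* Number of erase blocks visited when the BBT is built for the whole chip. */
uint32_t bbt_scan_blocks(uint64_t chip_bytes, uint32_t erase_block_bytes)
{
    return (uint32_t)(chip_bytes / erase_block_bytes);
}

/* OOB bytes read during the scan, assuming one OOB read per erase block. */
uint64_t bbt_scan_oob_bytes(uint64_t chip_bytes, uint32_t erase_block_bytes,
                            uint32_t oob_bytes)
{
    return (uint64_t)bbt_scan_blocks(chip_bytes, erase_block_bytes) * oob_bytes;
}
```

For the 64 MB Toshiba chip with 16 KB erase blocks this gives 4096 blocks regardless of the partition size, which fits the flat `insmod MTD NAND` time. A chip of the same capacity with 128 KB erase blocks (a hypothetical large page part) would need 512 reads.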
Topic 1 : Boot time
The UBI initialization is slow.
The time depends on the size of the MTD partition.
• Cause
Reading the EC (Erase Counter) and VID (Volume Identifier) headers is slow.
• Reads 1 page (512 bytes) per erase block (2048 PEB * 512 = 1 MB)
• The BBT of the whole chip (MTD0) is built during MTD NAND driver
initialization.
• http://www.linux-mtd.infradead.org/doc/ubi.html#L_scalability
“Scalability issues”
[Figure: stacked bar chart of boot time (insmod MTD NAND, insmod ubi, insmod ubifs, mount) for MTD 32 MB, MTD 16 MB and MTD 8 MB; vertical axis: Time [ms], 0 to 900]
• Ideas for improvement
• Store all EC/VID headers in a separate internal volume, like “BBT on Flash”
• http://www.linux-mtd.infradead.org/faq/ubi.html#L_attach_faster
“How do I speed up UBI initialization”
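The scaling above can be checked against the measured `insmod ubi` times (295, 181 and 123 ms for 2048, 1024 and 512 PEBs). The sketch below computes the amount of data read and the incremental cost per PEB between two measurements. The helper names are made up for illustration, and the linear model is an assumption, not something the measurements prove.

```c
#include <stdint.h>

/* Bytes read while attaching: one page per physical erase block. */
uint32_t ubi_attach_read_bytes(uint32_t pebs, uint32_t page_bytes)
{
    return pebs * page_bytes;
}

/* Incremental attach cost in microseconds per PEB between two measurements. */
int ubi_attach_us_per_peb(int ms_a, int pebs_a, int ms_b, int pebs_b)
{
    return (ms_a - ms_b) * 1000 / (pebs_a - pebs_b);
}
```

Both pairs of measurements give a little over 110 microseconds per PEB, which suggests the attach time grows roughly linearly with the number of erase blocks on top of a fixed cost.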
Topic 2 : Flash space overhead
• Problem
The “df” command reports too little free space.
The result of df on a 32 MB (32768 1k-blocks) MTD partition...
$ df
Filesystem 1k-blocks Used Available Use% Mounted on
ubi0_0 22960 20 21468 0% /mnt/omote
• Factors
“http://www.linux-mtd.infradead.org/faq/ubifs.html#L_df_report” describes the details.
“Why df reports too few free space?”
- compression
- write-back
→ The accurate size is unknown because the compression ratio depends on each file.
- space wastage at the end of logical eraseblock
→ discussed in more detail on the following slides
- garbage-collection
→ The accurate result of garbage-collection is not predictable.
Topic 2 : Flash space overhead
• The factor of UBI
NAND (MTD 32MB, UBI 1 volume)
– Each PEB starts with a UBI Header (512 bytes); the rest of the PEB is the LEB.
– PEBs that are not usable for data:
• for Internal Volume (2 PEB)
• for Wear-leveling (1 PEB)
• for Scrubbing etc. (1 PEB)
• for Bad Block (changeable by kernel config, default 1 % = 20 PEB)
– The remaining PEBs are usable.
Topic 2 : Flash space overhead
• The factor of UBIFS
NAND (MTD 32MB, UBI 1 volume)
– 24 EBs reserved by UBI
– 2024 EBs for UBIFS
• Superblock/Masterblock (3)
• Log/LPT/... (14)
• main area (2007)
– 1 EB (16KB) = UBI Hdr (512B) + 3 x Max node (4K + 48) + LEB overhead (3440)
Rough Calculation
2007 EBs x (16KB – 512 – 3440)
= 24366 (1K-blocks)
available size
= 24366 – Indexing overhead (=1.4MB)
= 23 MB (72%)
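The rough calculation above can be reproduced step by step. The sketch below is a simplified model, not UBIFS code: the constants come from the slide, the function names are hypothetical, and the 1 % bad block reserve is rounded down to match the 20 PEB figure.

```c
#include <stdint.h>

enum {
    PEB_SIZE      = 16 * 1024,               /* erase block size */
    UBI_HDR_SIZE  = 512,                     /* UBI header per PEB (slide value) */
    LEB_SIZE      = PEB_SIZE - UBI_HDR_SIZE, /* 15872 bytes */
    MAX_NODE_SIZE = 4096 + 48                /* "Max node (4K + 48)" */
};

/* PEBs kept by UBI: internal volume (2) + wear-leveling (1) +
 * scrubbing etc. (1) + 1 % bad block reserve, rounded down. */
uint32_t ubi_reserved_pebs(uint32_t total_pebs)
{
    return 2 + 1 + 1 + total_pebs / 100;
}

/* Tail of a LEB that cannot hold another maximum-size node. */
uint32_t leb_tail_overhead(void)
{
    return LEB_SIZE % MAX_NODE_SIZE;
}

/* Space in the main area in 1K-blocks, before indexing overhead. */
uint32_t main_area_kib(uint32_t main_lebs)
{
    uint32_t usable = LEB_SIZE - leb_tail_overhead();
    return (uint32_t)((uint64_t)main_lebs * usable / 1024);
}
```

With 2048 PEBs, `ubi_reserved_pebs` yields the 24 EBs reserved by UBI, `leb_tail_overhead` yields the 3440 byte LEB overhead, and `main_area_kib(2007)` yields the 24366 1K-blocks shown above.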
Topic 2 : Flash space overhead
• The actual total size of user data
Conditions:
– MTD partition size: 32MB, UBI volume number: 1
– LZO compression enabled
– Random data (the compression ratio is low)
– “df” command reports 22960 1k-blocks (21468 available) when empty.
File size | Writable data [MB] (%), Toshiba (MTD 32MB)
1MB | 23.5 (71.8)
256 KB | 23.5 (71.8)
64 KB | 23.4 (71.6)
16 KB | 23.2 (70.9)
4 KB | 22.1 (67.5)
1 KB | 18.9 (57.8)
512 bytes | 13.6 (41.7)
256 bytes | 8.6 (26.3)
[Figure: bar chart of the table above (Toshiba); vertical axis: File Size, horizontal axis: MTD partition region [%], 0 to 80]
Description | # of blocks
UBI (2048 EBs): For Internal-Vol, Wear-leveling, Scrubbing | 4
UBI (2048 EBs): For Bad Block | 20
UBIFS (2024 EBs): Super Block/Master Block | 3
UBIFS (2024 EBs): LOG/LPT.. | 14
UBIFS Main Area (2007 EBs): Empty Block | 24
UBIFS Main Area (2007 EBs): Index Block | 19
UBIFS Main Area (2007 EBs): Data Block | 1964
• Typical Data Block
LEB 893 lprops: free 3072, dirty 368 (used 12432), flags 0x0
Topic 2 : Flash space overhead
• Cause
Free space at the tail of PEB
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_spaceacc
“Flash space accounting issues” – “Wastage”
[Figure: Toshiba, writable data by File Size (1MB to 256 bytes); horizontal axis: MTD partition region [%], 0 to 80]
[Figure: Data is split into 4KB units; after compression the units are packed into 1 LEB (15.5KB) of 1 PEB on NAND, leaving wastage at the tail]
• Ideas for improvement
• Change the unit size (UBIFS_BLOCK_SIZE) of UBIFS to a smaller value (must be a
power of two)
→ influences the compression ratio and the performance
• Use large page size NAND
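To get a feel for the first idea, the tail wastage can be estimated for several unit sizes. The following is a worst-case sketch under stated assumptions: data is incompressible, every node is a full unit plus the 48 byte header from the slide, and the LEB is 15872 bytes (16 KB minus the 512 byte UBI header). The function names are hypothetical.

```c
#include <stdint.h>

enum { DATA_NODE_HDR = 48 }; /* header size taken from "Max node (4K + 48)" */

/* How many maximum-size data nodes fit into one LEB. */
uint32_t nodes_per_leb(uint32_t leb_bytes, uint32_t unit_bytes)
{
    return leb_bytes / (unit_bytes + DATA_NODE_HDR);
}

/* Bytes left at the tail of the LEB that cannot hold another such node. */
uint32_t leb_wastage(uint32_t leb_bytes, uint32_t unit_bytes)
{
    return leb_bytes % (unit_bytes + DATA_NODE_HDR);
}

/* User payload per LEB when it is filled with maximum-size nodes. */
uint32_t payload_per_leb(uint32_t leb_bytes, uint32_t unit_bytes)
{
    return nodes_per_leb(leb_bytes, unit_bytes) * unit_bytes;
}
```

In this model a 4 KB unit fits 3 nodes per LEB and wastes 3440 bytes, while a 2 KB unit fits 7 nodes and wastes 1200 bytes. The model ignores the effect on compression ratio and performance noted above, so it only illustrates the direction of the trade-off.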
Topic 3 : UBI scrubbing on preempt kernel
• UBI “Scrubbing” function
– When a bit-flip (ECC 1 bit error) is detected in a physical erase block, the
data of the block is moved to another physical erase block.
– It is beneficial for the “read disturb” issue.
– A worker thread executes “scrubbing” asynchronously.
[Figure: on NAND, the LEB is remapped from PEB 1 to PEB 2]
• Symptom
– With a preemptible kernel, the scrubbing could be canceled, because the
read thread holding the lock of the LEB could be preempted by the wear-leveling
thread.
[Figure: READ THREAD / Worker THREAD timeline: Scheduled Scrubbing → Error!! (LEB) → Cancel Scrubbing]
Topic 3 : UBI scrubbing on preempt kernel
• Patch
– The lock of the LEB is released before queuing a scrubbing.
– It is not necessary to hold the lock when queuing it.
[drivers/mtd/ubi/eba.c]
 int ubi_eba_read_leb(struct ubi_device *ubi, struct ubi_volume *vol, int lnum,
                      void *buf, int offset, int len, int check)
 {
         (omitted)
+        leb_read_unlock(ubi, vol_id, lnum);
         if (scrub)
                 err = ubi_wl_scrub_peb(ubi, pnum);
-        leb_read_unlock(ubi, vol_id, lnum);
         return err;
[Figure: READ THREAD / Worker THREAD timeline: Scheduled Scrubbing → OK!! → Done Scrubbing]
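The effect of the reordering can be illustrated with a toy, single-threaded model of the LEB lock. This is not UBI code. It assumes that the worker only tries to take the lock for writing and cancels when a reader still holds it, and it simulates the worst case where the worker runs immediately after the scrub is queued. All names are hypothetical.

```c
#include <stdbool.h>

/* Toy stand-in for the per-LEB lock: many readers or one writer. */
struct toy_leb_lock {
    int readers;
    bool writer;
};

void toy_read_lock(struct toy_leb_lock *l)   { l->readers++; }
void toy_read_unlock(struct toy_leb_lock *l) { l->readers--; }

bool toy_write_trylock(struct toy_leb_lock *l)
{
    if (l->readers > 0 || l->writer)
        return false;
    l->writer = true;
    return true;
}

void toy_write_unlock(struct toy_leb_lock *l) { l->writer = false; }

/* Worker side: returns false when the scrub has to be canceled. */
bool toy_worker_scrub(struct toy_leb_lock *l)
{
    if (!toy_write_trylock(l))
        return false;
    /* the LEB contents would be copied to another PEB here */
    toy_write_unlock(l);
    return true;
}

/* Reader side, with the worker preempting right after the scrub is queued. */
bool toy_read_then_scrub(struct toy_leb_lock *l, bool unlock_before_queue)
{
    bool done;

    toy_read_lock(l);
    if (unlock_before_queue) {
        toy_read_unlock(l);          /* patched order */
        done = toy_worker_scrub(l);
    } else {
        done = toy_worker_scrub(l);  /* original order: reader still holds the lock */
        toy_read_unlock(l);
    }
    return done;
}
```

With `unlock_before_queue` set to `false` the simulated scrub is canceled; with `true` it completes, which mirrors the two timelines on the slides.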
Topic 4 : Sub-page write verify
• MTD NAND “Verify NAND page writes” function
– is the write verify function per NAND physical page.
– is implemented in the MTD NAND driver.
• Symptom
– It does not support verification per sub-page:
the other sub-page in the same page is not handled properly.
– UBI uses a sub-page write when writing the VID header.
– http://www.linux-mtd.infradead.org/faq/ubi.html#L_subpage_verify_fail
“I get "ubi_io_write: error -5 while writing 512 bytes to PEB 5:512"”
If you have a 2048 bytes per NAND page device, and have
CONFIG_MTD_NAND_VERIFY_WRITE enabled in your kernel, you will need
to turn it off. The code does not currently (as of 2.6.26) perform verification
of sub-page writes correctly. As UBI is one of the few users of sub-page
writes, not much else seems to be affected by this bug.
※ This applies not only to “a 2048 bytes per NAND page device”.
(ex. This NAND. Page: 512 bytes, Sub-page: 256 bytes)
Topic 4 : Sub-page write verify
• Patch
A write-verify function that takes a column (offset) and a size is added to the
nand_chip structure of the MTD NAND driver.
[include/linux/mtd/nand.h]
 struct nand_chip {
         void __iomem    *IO_ADDR_R;
         void __iomem    *IO_ADDR_W;
         uint8_t         (*read_byte)(struct mtd_info *mtd);
         u16             (*read_word)(struct mtd_info *mtd);
         void            (*write_buf)(struct mtd_info *mtd, const uint8_t *buf, int len);
         void            (*read_buf)(struct mtd_info *mtd, uint8_t *buf, int len);
         int             (*verify_buf)(struct mtd_info *mtd, const uint8_t *buf, int len);
+#ifdef CONFIG_MTD_NAND_VERIFY_WRITE_SUBPAGE
+        int             (*verify_buf_column)(struct mtd_info *mtd, const uint8_t *buf, int len,
+                                             int column, int verify_size);
+#endif
         void            (*select_chip)(struct mtd_info *mtd, int chip);
         int             (*block_bad)(struct mtd_info *mtd, loff_t ofs, int getchip);
         int             (*block_markbad)(struct mtd_info *mtd, loff_t ofs);
Topic 4 : Sub-page write verify
[drivers/mtd/nand/nand_base.c]
 static int nand_do_write_ops(struct mtd_info *mtd, loff_t to,
                              struct mtd_oob_ops *ops)
 {
         (omitted)
                 /* Partial page write ? */
                 if (unlikely(column || writelen < (mtd->writesize - 1))) {
                         cached = 0;
                         bytes = min_t(int, bytes - column, (int) writelen);
                         chip->pagebuf = -1;
                         memset(chip->buffers->databuf, 0xff, mtd->writesize);
                         memcpy(&chip->buffers->databuf[column], buf, bytes);
                         wbuf = chip->buffers->databuf;
                 }
                 if (unlikely(oob))
                         oob = nand_fill_oob(chip, oob, ops);
                 ret = chip->write_page(mtd, chip, wbuf, page, cached,
                                        (ops->mode == MTD_OOB_RAW));
                 if (ret)
                         break;
+#if defined(CONFIG_MTD_NAND_VERIFY_WRITE_SUBPAGE) && !defined(CONFIG_MTD_NAND_VERIFY_WRITE)
+                /* Send command to read back the data */
+                chip->cmdfunc(mtd, NAND_CMD_READ0, 0, page);
+
+                if (chip->verify_buf_column(mtd, wbuf, mtd->writesize, column, bytes))
+                        return -EIO;
+#endif
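Why a column-aware compare helps can be shown with a small user-space model. It is a sketch under an assumption the slides do not spell out: that the whole-page verify fails because the write buffer is padded with 0xff, while the other sub-page of the same page already holds data. The functions below are hypothetical and are not the driver's `verify_buf` or `verify_buf_column` implementations.

```c
#include <stdint.h>
#include <string.h>

/* Whole-page compare: returns 0 on match, -1 on mismatch. */
int verify_page_model(const uint8_t *readback, const uint8_t *wbuf, int len)
{
    return memcmp(readback, wbuf, (size_t)len) ? -1 : 0;
}

/* Compare only the bytes [column, column + verify_size) that were written. */
int verify_column_model(const uint8_t *readback, const uint8_t *wbuf, int len,
                        int column, int verify_size)
{
    if (column < 0 || verify_size < 0 || column + verify_size > len)
        return -1;
    return memcmp(readback + column, wbuf + column, (size_t)verify_size) ? -1 : 0;
}

/* 512 byte page with 256 byte sub-pages: sub-page 0 already holds data,
 * then sub-page 1 is written with a 0xff-padded page buffer. */
int subpage_demo(int use_column_verify)
{
    enum { PAGE = 512, SUBPAGE = 256 };
    uint8_t flash[PAGE], wbuf[PAGE];

    memset(flash, 0xff, PAGE);
    memset(flash, 0xA5, SUBPAGE);          /* earlier write to sub-page 0 */

    memset(wbuf, 0xff, PAGE);              /* padding for the untouched part */
    memset(wbuf + SUBPAGE, 0x5A, SUBPAGE); /* new data for sub-page 1 */

    for (int i = 0; i < PAGE; i++)         /* programming can only clear bits */
        flash[i] &= wbuf[i];

    if (use_column_verify)
        return verify_column_model(flash, wbuf, PAGE, SUBPAGE, SUBPAGE);
    return verify_page_model(flash, wbuf, PAGE);
}
```

In this model `subpage_demo(0)` reports a mismatch because sub-page 0 no longer reads back as 0xff, while `subpage_demo(1)` passes because only the freshly written sub-page is compared.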
Topic 5 : UBIFS/UBI error handling
• UBIFS becomes read-only when an error occurs.
– to protect data from any further corruption
– http://www.linux-mtd.infradead.org/faq/ubifs.html#L_sudden_ro
“UBIFS suddenly became read-only – what is this?”
• Problem
– Errors in NAND accesses (ex. read/write/erase..) are possible.
– There may be no way to recover from the R/O state after the product has shipped.
• Current Status
– UBI: do not switch to R/O mode on read errors
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux2.6.git;a=commitdiff;h=b86a2c56e512f46d140a4bcb4e35e8a7d4a99a4b
Summary
• Boot Time
– The initialization time of the MTD NAND driver and UBI depends on the size.
• Optimizing the BBT building in the MTD driver and the per-erase-block reading
of UBI information is effective for boot time.
ex. “BBT on Flash”, “Large Page Size NAND”..
• Flash Space Overhead
– The unit size (UBIFS_BLOCK_SIZE) is large for small page NAND (16KB erase block).
• Use large page size NAND
• Change the unit size to a smaller value
• We are preparing to submit the patches in this presentation to the community.
References
– MTD, JFFS2, UBIFS, UBI
http://www.linux-mtd.infradead.org/
– YAFFS2
http://www.yaffs.net/
– LogFS
http://www.logfs.com/logfs/
– CE Linux Forum presentations
• Yutaka Araki, “Flash File system, current development status”
http://www.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree20?action=AttachFile&do=view&target=celf_flashfs.pdf
• Katsuki Uwatoko, “The comparison of Flash File system performance”
http://www.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree19?action=AttachFile&do=view&target=celf_flash2.pdf
• Keijiro Yano, “JFFS2 / YAFFS”
http://www.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree17?action=AttachFile&do=view&target=celf_flashfs.pdf
– ELC 2009
• Toru Homma, “Evaluation of Flash File Systems for Large NAND Flash Memory”
http://www.celinuxforum.org/CelfPubWiki/ELC2009Presentations?action=AttachFile&do=view&target=ELC2009-FlashFS-Toshiba.pdf