Storage: HDD, SSD and RAID
Johan Montelius
KTH
2017

Why?
Give me two reasons why we would like to have secondary storage?

Computer architecture
GPU
Gigabyte Z170 Gaming
PCIe x16/x4
PCIe x1
USB 3.1
USB 3.0
USB 2.0
SATA-III
SATA Express
M.2
gigabit Ethernet
DDR4 SDRAM

Computer architecture
[Figure: block diagram of the system. Components: CPU, SDRAM, Control Hub, BIOS, keyboard, audio, network. Links: memory bus up to 160 Gb/s, PCIe x16 up to 128 Gb/s, PCIe x4, USB up to 10 Gb/s, SATA up to 6 Gb/s, SAS up to 12 Gb/s. Access-time labels: < 1 ns, < 10 ns, 10 µs - 10 ms.]

System architecture
[Figure: layered I/O stack. Labels: application, I/O library, user space, POSIX API, kernel space, I/O, generic block interface, driver (several), device, HDD, SSD.]
70 percent of the code of an operating system is code for device drivers.

how to interact with a device
[Figure: a driver talking to a device through three registers: status, command, data.]
A register to read the status of the device.
A register to instruct the device to read or write.
A register that holds the data.
The I/O bus could be separate from the memory bus (or the same).
The driver will use either special I/O instructions or regular load/store instructions.

if you have the time
char read_from_device() {
    while (STATUS == BUSY) {}    // do nothing, just wait
    COMMAND = READ;
    while (STATUS == BUSY) {}    // do nothing, just wait
    return DATA;
}

asynchronous I/O and interrupts
int read_request(int pid, char *buffer) {
    while (STATUS == BUSY) {}    // wait until the device accepts a command
    COMMAND = READ;
    interrupt->pid = pid;        // remember which process is waiting ...
    interrupt->buffer = buffer;  // ... and where the data should be stored
    block_process(pid);
    scheduler();                 // run something else in the meantime
}

asynchronous I/O and interrupts
int interrupt_handler() {
    int pid = interrupt->pid;
    *(interrupt->buffer) = DATA;  // deliver the data to the waiting process
    ready_process(pid);           // and make it runnable again
}
This is very schematic, more complicated in real life.

process state
[Figure: process state diagram. States: start, ready, running, blocked, exit. Transitions: scheduled, timeout, I/O initiate, I/O completed, exit.]
The kernel is interrupt driven.

Direct Memory Access
Allow devices to read and write to buffers in physical memory.
int write_request(int pid, char *string, int size) {
    while (STATUS == BUSY) {}
    memcpy(buffer, string, size);  // destination first: copy the data into the DMA buffer
    COMMAND = WRITE;
    blocked->pid = pid;
    block_process(pid);
    scheduler();
}
DMA is often limited to lower memory addresses.

the device driver
Each physical device is controlled by a device driver that provides the abstraction of a character device or block device.
Block devices are used as the interface to disk drives that provide persistent storage.
Although all storage devices are presented using the same abstraction, they have very different characteristics.
To understand the challenges and options of the operating system, you should know the basics of how storage devices work.

Anatomy of a HDD
track/cylinder
sectors per track varies
sector size: 4 KiByte or 512 bytes
platters: 1 to 6
heads: one side or two sides
Only one head at a time is used (no parallel read).

Sector addressing
Historically, sectors were addressed by cylinder-head-sector (CHS); due to incompatible standards the limitation was:
cylinders: 1024 (10 bits)
heads: 16 (4 bits)
sectors per track: 63 (6 bits)
number of sectors: 1 Mi
largest disk assuming 512 Byte sectors: 512 MiByte
Today, sectors are addressed linearly 0..n, Logical Block Addressing (LBA):
28-bit or 48-bit address
up to 256 Ti sectors
largest disk assuming 4 KiByte sectors: 1 EiByte
> sudo hdparm /dev/sda
> dmesg | grep ata2

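The CHS and LBA limits above are plain products of the field sizes, which can be checked with a few lines of C. This is a minimal sketch; the function names are illustrative and not part of any driver API, and the CHS-to-LBA mapping assumes the usual convention that cylinders and heads count from 0 and sectors from 1.

```c
#include <assert.h>
#include <stdint.h>

/* Total number of sectors addressable with a CHS geometry. */
uint64_t chs_sector_count(uint64_t cylinders, uint64_t heads,
                          uint64_t sectors_per_track) {
    return cylinders * heads * sectors_per_track;
}

/* Map (cylinder, head, sector) to a linear block address.
 * Assumption: cylinders and heads count from 0, sectors from 1. */
uint64_t chs_to_lba(uint64_t c, uint64_t h, uint64_t s,
                    uint64_t heads, uint64_t sectors_per_track) {
    assert(s >= 1 && s <= sectors_per_track && h < heads);
    return (c * heads + h) * sectors_per_track + (s - 1);
}

/* Largest capacity in bytes for an LBA of the given width. */
uint64_t lba_max_bytes(unsigned address_bits, uint64_t sector_bytes) {
    assert(address_bits < 64);
    return (UINT64_C(1) << address_bits) * sector_bytes;
}
```

With 1024 cylinders, 16 heads and 63 sectors per track, chs_sector_count gives 1,032,192 sectors, about 504 MiByte with 512-byte sectors, which the slide rounds to 1 Mi sectors and 512 MiByte. A 48-bit LBA with 4 KiByte sectors gives 2^60 bytes.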
HDD - Hard Disk Drive
Seagate Desktop
total capacity: 2 TiByte
form factor: 3.5"
rotational speed: 7200 rpm
connection: SATA III
cache size: 64 MiByte
read throughput: 156 MByte/s
ST2000DM001, approx. price, October 2016: 900 SEK

Seagate Cheetah 15K
total capacity: 600 GiByte
form factor: 3.5"
rotational speed: 15000 rpm
connection: SAS-3
cache size: 16 MiByte
read throughput: 204 MByte/s
ST3300657SS, approx. price, October 2016: 2200 SEK

access time
seek time: time to move the arm to the right cylinder
rotation time: time to rotate the disk until the sector is under the head
read time: read one or more sectors
If a sector is 512 bytes, it takes 10 ms to find and read a sector, and we want to read 512 MiBytes then .....?

read/write performance
Time to find the first sector is less relevant.
If sectors that belong to the same file are close to each other we minimize movement of the arm.
Rotational speed should be high.
The density, i.e. how many sectors there are in each track, is important.
The communication with the drive should be fast.
Typical read and write performance is between 150 MiByte/s and 250 MiByte/s.

HDD - shoot out
Seagate Desktop
rotation speed: 7200 rpm
average seek time: < 10 ms
average time to read a sector: 14 ms
capacity: 2 TiByte
approx. price: 900 SEK
cost per capacity: 0.44 SEK/GiByte

Seagate Cheetah 15K
rotation speed: 15000 rpm
average seek time: < 4 ms
average time to read a sector: 6 ms
capacity: 600 GiByte
approx. price: 2200 SEK
cost per capacity: 3.70 SEK/GiByte

who's in control
Historically, the operating system was in complete control:
it knew the layout, cylinder-head-sector (CHS),
could order data in segments that were close to each other and,
would schedule disk operations to minimize arm movement.
Today, the drive can often make a better decision:
it knows, but might not reveal, the layout.
The operating system can help by grouping operations together, allowing the drive to decide in what order they should be done (Native Command Queuing).
There is a reason why MS-DOS is called MS-DOS.

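To put numbers on the access-time question (512-byte sectors, 10 ms per sector, 512 MiByte in total) and on the throughput figures, the two cases can be compared directly. This is a rough sketch that assumes every sector pays the full 10 ms in the random case and that the drive sustains a constant rate in the sequential case; the function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds needed when every sector costs a full seek + rotation + read. */
double random_read_seconds(uint64_t total_bytes, uint64_t sector_bytes,
                           double ms_per_sector) {
    assert(sector_bytes > 0);
    uint64_t sectors = total_bytes / sector_bytes;
    return (double)sectors * ms_per_sector / 1000.0;
}

/* Seconds needed when the data streams at a sustained throughput. */
double sequential_read_seconds(uint64_t total_bytes, double bytes_per_second) {
    assert(bytes_per_second > 0.0);
    return (double)total_bytes / bytes_per_second;
}
```

Sector by sector, 512 MiByte is 1 Mi sectors at 10 ms each, roughly 10,486 seconds or close to three hours. Read sequentially at 150 MiByte/s, the same data takes about 3.4 seconds, which is why keeping a file's sectors close together matters so much.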
SSD - Solid State Drive
Samsung 850 EVO
total capacity: 250 GiByte
form factor: 2.5"
connection: SATA III
cache size: 64 MiByte
random access: 30 µs
read throughput: 540 MiByte/s
MZ-75E250B/EU, approx. price, October 2016: 1000 SEK

SD cards - flash memory
SanDisk Ultra SDXC
form factor: SDXC
capacity: 64 GiByte
read performance: 80 MiByte/s
approx. price, October 2016: 300 SEK

price performance
Drive             Capacity      Price       SEK/GiByte
HDD Desktop       2 TiByte       900 SEK    0.44
HDD Performance   600 GiByte    2200 SEK    3.70
SSD Desktop       250 GiByte    1000 SEK    4.00

NAND - flash storage
memory bank
erase blocks ~256 KiByte
pages ~4 KiByte
You have constant time access to any page.
You can only write to (or program) an erased page.
You can only erase a block.

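The three NAND rules (read any page, program only an erased page, erase only a whole block) can be modelled in memory. This is a toy sketch, not how a real flash controller is written: the struct, the function names and the 64 pages per block are assumptions chosen to match the approximate sizes on the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_BYTES      4096  /* ~4 KiByte pages */
#define PAGES_PER_BLOCK 64    /* 64 * 4 KiByte = 256 KiByte erase block */

/* One erase block: page contents plus an "erased" flag per page. */
struct flash_block {
    unsigned char data[PAGES_PER_BLOCK][PAGE_BYTES];
    bool erased[PAGES_PER_BLOCK];
};

/* Erase works on the whole block only; erased flash reads as all 1 bits. */
void flash_erase_block(struct flash_block *b) {
    memset(b->data, 0xFF, sizeof b->data);
    for (int i = 0; i < PAGES_PER_BLOCK; i++)
        b->erased[i] = true;
}

/* Program one page; returns false unless the page is currently erased. */
bool flash_program_page(struct flash_block *b, int page,
                        const unsigned char *src) {
    assert(page >= 0 && page < PAGES_PER_BLOCK);
    if (!b->erased[page])
        return false;
    memcpy(b->data[page], src, PAGE_BYTES);
    b->erased[page] = false;
    return true;
}

/* Any page can be read directly by its index. */
const unsigned char *flash_read_page(const struct flash_block *b, int page) {
    assert(page >= 0 && page < PAGES_PER_BLOCK);
    return b->data[page];
}
```

Updating a single page in place is therefore not possible: a second flash_program_page on the same page fails until the whole block, including its other 63 pages, has been erased.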
Bus limitations
SATA-III - 6 Gb/s, most internal HDD and SSD today
SAS-3 - 12 Gb/s, enterprise RAID HDD
USB 3.1 - 10 Gb/s, everything
PCI Express 3.0 x16 - 128 Gb/s, what is it used for?
An SSD has a read throughput of 500 MiByte/s which is a .... b/s?

SSD on the PCIe bus
Intel SSD 750 Series
total capacity: 400 GiByte
connection: PCI Express 3.0 x4
read performance: 2200 MByte/s
write performance: 900 MByte/s
approx. price, October 2016: 4599 SEK

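Converting drive throughput to bits per second makes the bus limits easier to compare. This is a small sketch under stated assumptions: MiByte means 2^20 bytes, MByte means 10^6 bytes, and the link speeds are taken as raw line rates with no protocol or encoding overhead. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* MiByte/s (2^20 bytes) to bits per second. */
uint64_t mib_per_s_to_bits_per_s(uint64_t mib_per_s) {
    assert(mib_per_s <= UINT64_MAX / (1024ULL * 1024 * 8));
    return mib_per_s * 1024 * 1024 * 8;
}

/* MByte/s (10^6 bytes) to bits per second. */
uint64_t mb_per_s_to_bits_per_s(uint64_t mb_per_s) {
    assert(mb_per_s <= UINT64_MAX / (1000ULL * 1000 * 8));
    return mb_per_s * 1000 * 1000 * 8;
}

/* Nonzero if the payload rate does not exceed the raw link rate. */
int fits_on_link(uint64_t payload_bits_per_s, uint64_t link_bits_per_s) {
    return payload_bits_per_s <= link_bits_per_s;
}
```

Under these assumptions 500 MiByte/s is about 4.2 Gb/s, which fits within the 6 Gb/s of SATA-III, while 2200 MByte/s is 17.6 Gb/s and does not, so a drive that fast has to sit on a faster bus such as PCI Express.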
The M.2 connector
Intel 600P Series 512GB
total capacity: 512 GiByte
form factor: M-keyed
connection: M.2 - PCI Express 3.0 x4
read performance: 1775 MByte/s
write performance: 560 MByte/s
approx. price, October 2016: 1799 SEK

SSD on the memory bus
HP NVDIMM 8GB
regular DRAM backed up by Flash
total capacity: 8 GiByte
form factor: DDR4 SDIM
bus speed: 2133 MHz
approx. price, October 2016: ???

Next year?
Intel Optane - 3D XPoint NVDIMM
in the pipeline
total capacity: 512 GiByte

RAID
Redundant Array of Independent Disks
Increase capacity, performance and/or reliability
Multiple disks that can provide:
capacity: looks like a 20 TiByte disk but is actually ten 2 TiByte disks
performance: spread a file across ten drives, read and write in parallel
reliability: write the same file to several disks, if one crashes - not a problem

the abstraction layer
Alternatives:
The cabinet that holds the disks presents itself as one drive.
A device driver in the kernel knows that we have several disks but the kernel presents it as one disk to the application layer.
The application layer knows that we have several disks but provides an API to other applications that looks like a single drive.

RAID levels
RAID 0: stripe files across several drives.
RAID 1: keep a complete mirror copy of each file.
RAID 2-6: spread a file plus parity information across several drives.

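The parity idea can be illustrated with single XOR parity, the scheme used by levels such as RAID 4 and RAID 5 (RAID 2 and RAID 6 use other codes). This is a minimal sketch with hypothetical function names: each stripe stands for the data on one drive, and the parity is the byte-wise XOR of all stripes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* parity[i] = XOR of byte i across all data stripes. */
void raid_compute_parity(const unsigned char *const stripes[],
                         size_t n_stripes, size_t len,
                         unsigned char *parity) {
    memset(parity, 0, len);
    for (size_t d = 0; d < n_stripes; d++)
        for (size_t i = 0; i < len; i++)
            parity[i] ^= stripes[d][i];
}

/* Rebuild the stripe at index `lost` from the survivors and the parity.
 * stripes[lost] is never read, so it may point to stale data. */
void raid_rebuild(const unsigned char *const stripes[], size_t n_stripes,
                  size_t lost, const unsigned char *parity, size_t len,
                  unsigned char *out) {
    assert(lost < n_stripes);
    memcpy(out, parity, len);
    for (size_t d = 0; d < n_stripes; d++) {
        if (d == lost)
            continue;
        for (size_t i = 0; i < len; i++)
            out[i] ^= stripes[d][i];
    }
}
```

Because XOR is its own inverse, XOR-ing the parity with the surviving stripes yields the missing one. The cost is one extra drive's worth of space, and the scheme tolerates the loss of only one drive at a time.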
Summary
application layer, simple to understand
system calls: open, read, write, lseek ...
all devices have a generic API
device drivers that know what they are doing
now it’s a bit structured
I/O and memory buses, protocols such as SATA, SCSI, USB etc.
hardware - a complete mess
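From the application layer, the system calls listed above are all that is needed to address storage by block. The sketch below uses only the POSIX calls open, lseek, read, write and close; the helper names read_block and write_block are made up for the illustration, and error handling is reduced to returning -1.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read block number `block_no` (each `block_size` bytes) from `path`.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_block(const char *path, off_t block_no, void *buf,
                   size_t block_size) {
    assert(block_no >= 0 && block_size > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = -1;
    if (lseek(fd, block_no * (off_t)block_size, SEEK_SET) != (off_t)-1)
        n = read(fd, buf, block_size);
    close(fd);
    return n;
}

/* Write one block at position `block_no`, creating the file if needed.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_block(const char *path, off_t block_no, const void *buf,
                    size_t block_size) {
    assert(block_no >= 0 && block_size > 0);
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = -1;
    if (lseek(fd, block_no * (off_t)block_size, SEEK_SET) != (off_t)-1)
        n = write(fd, buf, block_size);
    close(fd);
    return n;
}
```

The same calls work whether `path` names a regular file or, given sufficient permissions, a block device, which is the point of the generic API.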