Chapter 8 Physical Storage System
8.1 Classification of Physical Storage Media
Storage media can be classified as:
- volatile storage(易失存储): loses its contents when power is switched off
- non-volatile storage(非易失存储): contents persist even when power is switched off. Includes secondary and tertiary storage, as well as battery-backed main memory.
Factors affecting the choice of storage media:
Speed with which data can be accessed
Cost per unit of data
Reliability
- data loss on power failure or system crash
- physical failure of the storage device
8.2 Storage Hierarchy(存储级别)
Primary storage: fastest media but volatile (cache, main memory).
Secondary storage: next level in the hierarchy; non-volatile, moderately fast access time. Also called on-line storage. E.g. flash memory(闪存), magnetic disks(磁盘).
Tertiary storage: lowest level in the hierarchy; non-volatile, slow access time. Also called off-line storage. E.g. optical storage(光盘), magnetic tape(磁带).
8.3 Magnetic Hard Disk Mechanism
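A disk has on the order of 100,000 tracks(磁道), and each track has on the order of a thousand sectors(扇区). The sector is the smallest unit of data transferred between the computer and the disk.
The arm assembly performs the seek: all read-write heads move together to the track that holds the data.
Data transfer begins only once the target sector has rotated under the read-write head. The tracks at the same position on all platters form a cylinder(柱面). Large files are best stored within a single cylinder, so that all of their blocks can be reached without moving the arm.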
Read-write head
Surface of platter divided into circular tracks(磁道)
Each track is divided into sectors(扇区)
To read/write a sector
- disk arm swings to position head on right track
- platter spins continually; data is read/written as sector passes under head
Cylinder(柱面) i consists of the i-th track of all the platters
Disk controller(磁盘控制器)– interfaces between the computer system and the disk drive hardware.
8.3.1 Performance Measures of Disks
Access time (访问时间) – the time it takes from when a read or write request is issued to when data transfer begins. Consists of:
(1) Seek time(寻道时间) – time it takes to reposition the arm over the correct track.
- Average seek time is ½ the worst-case seek time.
- 4 to 10 milliseconds on typical disks
(2) Rotational latency(旋转延迟) – time it takes for the sector to be accessed to appear under the head.
- Worst-case latency is one full rotation: 4 to 11 milliseconds on typical disks (15000 down to 5400 r.p.m.)
- Average latency is ½ of the worst-case latency, i.e. roughly 2 to 5.5 milliseconds
Data-transfer rate(数据传输率) – the rate at which data can be retrieved from or stored to the disk.
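The relation between rotational speed and latency can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative and not taken from the notes, and transfer time is ignored because access time ends when the transfer begins.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm: float) -> float:
    """On average the target sector is half a rotation away from the head."""
    return rotation_time_ms(rpm) / 2


def average_access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Access time = seek time + rotational latency."""
    return avg_seek_ms + average_rotational_latency_ms(rpm)


# Print full-rotation time and average latency for common spindle speeds.
for rpm in (5400, 7200, 15000):
    print(rpm, round(rotation_time_ms(rpm), 2),
          round(average_rotational_latency_ms(rpm), 2))
```

At 5400 r.p.m. a full rotation takes about 11 ms and at 15000 r.p.m. it takes 4 ms, which matches the 4 to 11 millisecond range quoted for typical disks.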
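Data is transferred between disk and memory in units of blocks. Even to access a single byte, the whole 4 KB block that contains it must be read into memory.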
Disk block is a logical unit for storage allocation and retrieval
- Smaller blocks: more transfers from disk
- Larger blocks: more space wasted due to partially filled blocks
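The block-size trade-off can be made concrete with a small calculation. The sketch below assumes a single hypothetical 10,000-byte file and three candidate block sizes; the helper names are illustrative only.

```python
def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of blocks (and hence block transfers) needed to hold the file."""
    return -(-file_size // block_size)  # integer ceiling division


def wasted_bytes(file_size: int, block_size: int) -> int:
    """Space left unused in the last, partially filled block."""
    return blocks_needed(file_size, block_size) * block_size - file_size


FILE_SIZE = 10_000  # hypothetical file size in bytes
for block_size in (512, 4096, 16384):
    print(block_size,
          blocks_needed(FILE_SIZE, block_size),
          wasted_bytes(FILE_SIZE, block_size))
```

Smaller blocks waste little space but need many transfers; larger blocks need few transfers but leave more of the last block empty.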
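Sequential access pattern(顺序访问模式): successive requests are for successive blocks, so a disk seek is needed only for the first block.
Random access pattern(随机访问模式): slow, because every request may need its own seek. Prefer sequential access wherever possible. For example, pending modifications can be recorded in a log and applied later, which replaces random writes with sequential ones.
I/O operations per second (IOPS,每秒I/O操作数): the number of random block reads that a disk can support per second.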
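A rough model shows why random access is so much slower than sequential access on a magnetic disk. This sketch assumes each random read pays one average seek plus one average rotational latency, while a sequential run pays that positioning cost only once. The function names and the example figures (6 ms seek, 4 ms latency, 0.02 ms to transfer a 4 KB block) are illustrative assumptions.

```python
def random_iops(avg_seek_ms: float, avg_latency_ms: float,
                transfer_ms: float = 0.0) -> float:
    """Random block reads per second when every read is positioned separately."""
    return 1000.0 / (avg_seek_ms + avg_latency_ms + transfer_ms)


def read_time_ms(num_blocks: int, avg_seek_ms: float, avg_latency_ms: float,
                 transfer_ms_per_block: float, sequential: bool) -> float:
    """Total time to read num_blocks under the simple model described above."""
    positioning = avg_seek_ms + avg_latency_ms
    if sequential:
        # Only the first block needs a seek and a rotational wait.
        return positioning + num_blocks * transfer_ms_per_block
    # Every block needs its own seek and rotational wait.
    return num_blocks * (positioning + transfer_ms_per_block)


print(random_iops(6, 4))
print(read_time_ms(1000, 6, 4, 0.02, sequential=True))
print(read_time_ms(1000, 6, 4, 0.02, sequential=False))
```

Under these assumptions the disk delivers about 100 random reads per second, and reading 1000 blocks takes roughly 30 ms sequentially versus about 10 seconds randomly.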
Mean time to failure (MTTF,平均故障时间) – the average time the disk is expected to run continuously without any failure.
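MTTF is a statistical figure for a population of disks rather than a promise about one disk. The sketch below assumes independent failures at a constant rate, which is a simplification, and uses a hypothetical helper name.

```python
HOURS_PER_YEAR = 24 * 365


def hours_between_failures(mttf_hours: float, num_disks: int) -> float:
    """Expected time between failures somewhere in a pool of identical disks."""
    return mttf_hours / num_disks


# A single disk with an MTTF of 1,200,000 hours, expressed in years.
print(round(1_200_000 / HOURS_PER_YEAR))

# A pool of 1000 such disks: expected hours between failures in the pool.
print(hours_between_failures(1_200_000, 1000))
```

With 1000 disks rated at 1,200,000 hours, one disk in the pool is expected to fail about every 1200 hours, i.e. roughly every 50 days.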
8.3.2 Optimization of Disk-Block Access
Buffering: in-memory buffer to cache disk blocks
Read-ahead(Prefetch): Read extra blocks from a track in anticipation that they will be requested soon
Disk-arm-scheduling algorithms re-order block requests so that disk arm movement is minimized
- elevator algorithm
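The elevator algorithm can be sketched as follows: the arm keeps moving in one direction, serving requests as it passes them, and reverses once no requests remain in that direction. This is a simplified model that assumes all requests are known up front; the function names and the request list are illustrative.

```python
def elevator_order(requests, head, moving_up=True):
    """Order in which pending track requests are served by one elevator sweep."""
    here = [t for t in requests if t == head]
    upper = sorted(t for t in requests if t > head)
    lower = sorted((t for t in requests if t < head), reverse=True)
    if moving_up:
        return here + upper + lower  # sweep up, then reverse and sweep down
    return here + lower + upper      # sweep down, then reverse and sweep up


def total_arm_movement(order, head):
    """Total number of tracks the arm crosses when serving requests in order."""
    distance = 0
    for track in order:
        distance += abs(track - head)
        head = track
    return distance


pending = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical request queue
start = 53
print(total_arm_movement(pending, start))                         # arrival order
print(total_arm_movement(elevator_order(pending, start), start))  # elevator order
```

For this queue the arm crosses 640 tracks when requests are served in arrival order and 299 tracks when they are reordered by the elevator sweep.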
File organization
- Allocate blocks of a file in as contiguous a manner as possible
- Allocation in units of extents(盘区)
- Files may get fragmented
Nonvolatile write buffers (非易失性写缓存) – speed up disk writes by writing blocks to a non-volatile RAM buffer immediately
- Non-volatile RAM: battery-backed RAM or flash memory. Even if power fails, the data is safe and will be written to disk when power returns
Log disk(日志磁盘) – a disk devoted to writing a sequential log of block updates
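Both the non-volatile write buffer and the log disk rely on the same idea: record the update somewhere fast and durable, acknowledge it at once, and apply it to its home location later. The toy model below uses an in-memory list as the stand-in for the buffer or log and a dictionary as the stand-in for the disk. The class name and the choice to flush in ascending block order (to limit arm movement) are assumptions made for illustration.

```python
class DeferredWriter:
    """Toy model of deferred writes through an append-only log."""

    def __init__(self, disk: dict):
        self.disk = disk  # block number -> data, stands in for the disk
        self.log = []     # append-only list of (block number, data)

    def write(self, block_no: int, data: bytes) -> None:
        # Appending is sequential, so the caller does not wait for a seek.
        self.log.append((block_no, data))

    def read(self, block_no: int):
        # The newest logged version wins over what is on disk.
        for logged_block, data in reversed(self.log):
            if logged_block == block_no:
                return data
        return self.disk.get(block_no)

    def flush(self) -> list:
        # Keep only the latest version of each block, then write the blocks
        # in ascending block order and return that order.
        latest = dict(self.log)
        order = sorted(latest)
        for block_no in order:
            self.disk[block_no] = latest[block_no]
        self.log.clear()
        return order


disk = {}
writer = DeferredWriter(disk)
writer.write(9, b"a")
writer.write(2, b"b")
writer.write(9, b"c")
print(writer.read(9), writer.flush(), disk)
```

A real implementation must also keep the log itself durable and replay it after a crash; the sketch omits that.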
8.4 Flash Storage
NAND flash - used widely for storage, cheaper than NOR flash
- requires page-at-a-time reads (page: 512 bytes to 4 KB); there is not much difference between sequential and random reads
- a page can be written only once; it must be erased before it can be rewritten
SSD(Solid State Disks) - Use standard block-oriented disk interfaces, but store data on multiple flash storage devices internally
Feature | Magnetic Disk | Solid State Disk |
---|---|---|
Retrieve a page | 5-10 milliseconds | 20-100 microseconds |
Random access | 50 to 200 IOPS | Reads: 10,000 IOPS; Writes: 40,000 IOPS |
Data transfer rate | 200 MB/s | 500 MB/s (SATA), 3 GB/s (NVMe) |
Power consumption | Higher | Lower |
Update mode | In place | Erase ➔ Rewrite |
Reliability | MTTF: 500,000 to 1,200,000 hours | Erase blocks: 100,000 to 1,000,000 erases |
Erase happens in units of erase block
Remapping of logical page addresses to physical page addresses avoids waiting for erase
Flash translation table tracks the mapping
- the mapping is also stored in a label field of each flash page
- remapping is carried out by the flash translation layer
Wear leveling(磨损均衡) – distributes erase operations evenly across physical blocks
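The interplay of remapping, erase blocks and wear leveling can be sketched with a toy flash translation layer. This is a simplified illustration, not how any particular SSD works: the class name is hypothetical, pages are written out of place, a block is erased only when all of its pages are stale, and wear leveling is reduced to "prefer free pages in the least-erased block". A real flash translation layer would also copy valid pages out of a block before erasing it and store the mapping in the page label fields.

```python
class ToyFTL:
    """Toy flash translation layer: logical pages are remapped on every write."""

    def __init__(self, num_blocks: int, pages_per_block: int):
        self.pages_per_block = pages_per_block
        self.erase_counts = [0] * num_blocks
        self.free = set(range(num_blocks * pages_per_block))  # erased pages
        self.stale = set()   # pages holding outdated data
        self.mapping = {}    # logical page -> physical page
        self.contents = {}   # physical page -> data

    def _block_of(self, page: int) -> int:
        return page // self.pages_per_block

    def _erase_stale_blocks(self) -> None:
        # Erase happens per block, and only if every page in it is stale.
        for block in range(len(self.erase_counts)):
            first = block * self.pages_per_block
            pages = set(range(first, first + self.pages_per_block))
            if pages <= self.stale:
                self.stale -= pages
                self.free |= pages
                self.erase_counts[block] += 1

    def write(self, logical: int, data) -> None:
        if not self.free:
            self._erase_stale_blocks()
        if not self.free:
            raise RuntimeError("no free pages left in this toy model")
        # Wear leveling: pick a free page in the block with the fewest erases.
        target = min(self.free,
                     key=lambda p: (self.erase_counts[self._block_of(p)], p))
        self.free.remove(target)
        old = self.mapping.get(logical)
        if old is not None:
            self.stale.add(old)  # the old copy is not erased yet, only retired
            del self.contents[old]
        self.mapping[logical] = target
        self.contents[target] = data

    def read(self, logical: int):
        return self.contents[self.mapping[logical]]


ftl = ToyFTL(num_blocks=2, pages_per_block=2)
for value in "abcde":
    ftl.write(0, value)  # rewrite the same logical page five times
print(ftl.read(0), ftl.mapping, ftl.erase_counts)
```

Rewriting one logical page never waits for an erase while free pages remain; the erase is deferred until a whole block has become stale.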
8.5 Storage Class Memory (NVM)
Feature | DRAM | NVM | SSD | HDD |
---|---|---|---|---|
Read Latency | 1x | 2-4x | 500x | 10^5 x |
Write Latency | 1x | 2-8x | 5000x | 10^5 x |
Persistence | No | Yes | Yes | Yes |
Byte-Addressable | Yes | Yes | No | No |
Endurance | Yes | No | No | Yes |