1 Computer Abstractions and Technology
Von Neumann architecture
- Computation is separated from storage
- Data and instructions are kept in the same memory
- Input and output mechanisms (I/O)
- Instruction set architecture
1.1 Computer Organization
Hardware
CPU (processor): the active part of the computer, which contains the datapath and control and which adds numbers, tests numbers, signals I/O devices to activate, and so on.
- Datapath: performs arithmetic operations
- Control: commands the datapath, memory, and I/O devices according to the instructions of the program
Memory: the storage area in which programs are kept and that contains the data needed by the running programs
- Main memory: volatile; used to hold programs while they are running (e.g. DRAM in computers)
- Secondary memory: nonvolatile; used to store programs and data between runs (e.g. flash in PMDs, magnetic disks)
Memory technologies grouped by volatility:
- Volatile: DRAM (Dynamic Random-Access Memory), SRAM (Static Random-Access Memory)
- Nonvolatile: solid-state memory (flash memory), magnetic disk (hard disk)
Software
1.2 Computer design: performance and idea
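Response time / execution time: the time it takes to complete a task
Throughput (bandwidth): the number of tasks completed per unit time
How are response time and throughput affected by
1. Replacing the processor with a faster version?
2. Adding more processors?
- Replacing the processor with a faster one improves both response time and throughput.
- Adding more processors does not shorten the execution time of a single task, but it does increase throughput.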
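Performance = 1 / Execution Time, so relative performance is \(\frac{\text{Performance}_X}{\text{Performance}_Y}=\frac{\text{Execution Time}_Y}{\text{Execution Time}_X}\)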
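Since performance is the reciprocal of execution time, comparing two machines comes down to a ratio of their times. The sketch below illustrates this; the function name `relative_performance` and the 10 s / 15 s timings are illustrative assumptions, not figures from these notes.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return how many times faster machine X is than machine Y.

    Performance is 1 / execution time, so
    performance_x / performance_y == time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Hypothetical timings: X needs 10 s and Y needs 15 s for the same program.
print(relative_performance(10.0, 15.0))  # 1.5
```

A result above 1 means X is the faster machine; a result below 1 means Y is.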
Elapsed time (wall-clock time)
- Total response time, including all aspects (processing, I/O, overhead, idle time)
- Determines system performance
CPU time
- Time spent processing a given job (discounts I/O time and other jobs' shares)
- Comprises user CPU time and system CPU time
- Different programs are affected differently by CPU and system performance
\(\text{Clock Rate}\) is the number of \(\text{Cycles}\) the CPU completes per unit time, so \(\text{CPU Time}=\frac{\text{Clock Cycles}}{\text{Clock Rate}}\)
CPU Time Example
Computer A: 2 GHz (\(=2\times 10^9\ \text{Hz}\)) clock, 10 s CPU time
Designing Computer B
- Aim for 6 s CPU time
- A faster clock is possible, but it requires 1.2 × as many clock cycles
How fast must Computer B's clock be?
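The question can be worked through with the relation CPU time = clock cycles / clock rate. The sketch below is one way to do the arithmetic; the function and parameter names are hypothetical, while the numbers are the ones given in the example.

```python
def required_clock_rate(rate_a_hz: float, time_a_s: float,
                        cycle_factor: float, target_time_s: float) -> float:
    """Clock rate B needs in order to hit target_time_s.

    cycles_a = time_a * rate_a        (cycles the program takes on A)
    cycles_b = cycle_factor * cycles_a
    rate_b   = cycles_b / target_time
    """
    cycles_a = time_a_s * rate_a_hz
    cycles_b = cycle_factor * cycles_a
    return cycles_b / target_time_s


# Values from the example: A runs at 2 GHz for 10 s; B needs 1.2x the cycles in 6 s.
rate_b = required_clock_rate(2e9, 10.0, 1.2, 6.0)
print(f"{rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```

Computer B therefore needs a clock twice as fast as A's to cut the CPU time from 10 s to 6 s.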
Instruction Count (IC) and CPI (Cycles per Instruction)
Instruction count for a program: determined by the program, the ISA, and the compiler
\(\text{Clock Cycles}=IC\times CPI\)
CPI example
- Computer A: cycle time = 250 ps, CPI = 2.0
- Computer B: cycle time = 500 ps, CPI = 1.2
- Same ISA
Which is faster, and by how much?
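Because both machines share an ISA, they execute the same number of instructions, so it is enough to compare the average time per instruction (CPI × cycle time). A minimal sketch follows, with a hypothetical helper name and the figures from the example.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time one instruction takes, in picoseconds."""
    return cpi * cycle_time_ps


# Figures from the example above.
a_ps = time_per_instruction_ps(2.0, 250.0)  # Computer A
b_ps = time_per_instruction_ps(1.2, 500.0)  # Computer B

faster = "A" if a_ps < b_ps else "B"
ratio = max(a_ps, b_ps) / min(a_ps, b_ps)
print(f"{faster} is faster by a factor of {ratio:.1f}")  # A is faster by a factor of 1.2
```

A spends 500 ps per instruction and B spends 600 ps, so the lower CPI of B does not make up for its slower clock.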
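CPI in more detail
- If different instruction classes take different numbers of cycles, each class \(i\) has its own \(CPI_i\)
- Weighted average CPI: \(CPI=\frac{\text{Clock Cycles}}{IC}=\sum_i\left(CPI_i\times\frac{IC_i}{IC}\right)\)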
- \(\text{Power} = \text{Capacitive load}\times \text{Voltage}^2\times \text{Frequency}\)
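The weighted average CPI mentioned in the list above can be sketched as total cycles divided by total instructions. The helper name and the instruction mix below are hypothetical and only illustrate the calculation.

```python
def weighted_average_cpi(classes):
    """Average CPI for a mix of instruction classes.

    classes: iterable of (cpi, instruction_count) pairs, one per class.
    """
    classes = list(classes)
    total_cycles = sum(cpi * count for cpi, count in classes)
    total_instructions = sum(count for _, count in classes)
    if total_instructions == 0:
        raise ValueError("instruction mix is empty")
    return total_cycles / total_instructions


# Hypothetical mix: three classes with CPI 1, 2, 3 and counts 2, 1, 2.
mix = [(1, 2), (2, 1), (3, 2)]
print(weighted_average_cpi(mix))  # 2.0
```

Each class contributes in proportion to how often it occurs, so a rarely used slow instruction moves the average far less than a frequently used one.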
Summary
$$ \text{CPU Time}=\frac{\text{Instructions}}{\text{Program}}\times\frac{\text{Clock cycles}}{\text{Instruction}}\times\frac{\text{Seconds}}{\text{Clock cycle}} $$
Performance depends on
- Algorithm: affects \(IC\), possibly \(CPI\)(average)
- Programming language: affects \(IC, CPI\)
- Compiler: affects \(IC, CPI\)
- Instruction set architecture: affects \(IC, CPI, T_c(\text{cycle time})\)
1.3 Evaluating computer performance
1.3.1 SPEC
SPEC: Standard Performance Evaluation Corporation
SPEC CPU Benchmark
Programs used to measure performance
- Elapsed time to execute a selection of programs; I/O is negligible, so the benchmark focuses on CPU performance
- Normalize relative to a reference machine
- Summarize as the geometric mean of the performance ratios
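The normalize-then-summarize steps above can be sketched as follows. The function names and all timings are made up for illustration; real SPEC results come from the published benchmark suite, not from numbers like these.

```python
import math


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Performance ratio of the measured machine against the reference machine."""
    return reference_time_s / measured_time_s


def geometric_mean(ratios):
    """n-th root of the product of n positive ratios, computed via logarithms."""
    ratios = list(ratios)
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical execution times (seconds) for three benchmark programs.
reference_times = [200.0, 400.0, 800.0]
measured_times = [100.0, 100.0, 100.0]

ratios = [spec_ratio(ref, t) for ref, t in zip(reference_times, measured_times)]
score = geometric_mean(ratios)
print(f"{score:.2f}")  # 4.00
```

A geometric mean is used so that the comparison between two machines does not depend on which machine was chosen as the reference.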
SPEC Power Benchmark
- Performance: ssj_ops/sec
- Power: watts (joules/sec)
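The two measurements above are commonly combined into one figure, overall ssj_ops per watt, by summing throughput over the measured load levels and dividing by the summed power. The sketch below assumes that summary form; the function name and the sample measurements are hypothetical.

```python
def overall_ssj_ops_per_watt(measurements):
    """Summed throughput divided by summed power across load levels.

    measurements: iterable of (ssj_ops, watts) pairs, one per load level.
    """
    measurements = list(measurements)
    total_ops = sum(ops for ops, _ in measurements)
    total_watts = sum(watts for _, watts in measurements)
    if total_watts <= 0:
        raise ValueError("total power must be positive")
    return total_ops / total_watts


# Made-up samples at 100%, 50%, and 0% (idle) load.
samples = [(800_000, 250.0), (400_000, 180.0), (0, 70.0)]
print(overall_ssj_ops_per_watt(samples))  # 2400.0
```

Including the idle level matters: a server that draws a lot of power while doing no work lowers the overall figure.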
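1.3.2 Some pitfalls
Amdahl’s Law
Speeding up only one part of a system yields a bounded improvement in overall performance.
Pitfall: improving an aspect of a computer and expecting a proportional improvement in overall performance
\(T_{\text{improved}}=\frac{T_{\text{affected}}}{\text{improvement factor}}+T_{\text{unaffected}}\)
Example: multiply accounts for 80 s of a 100 s run
How much improvement in multiply performance is needed to get 5× overall?
\(20=\frac{80}{n}+20 \Rightarrow\) can't be done, because it would require \(\frac{80}{n}=0\)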
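The example can be checked numerically using improved time = affected time / n + unaffected time. This is a minimal sketch with hypothetical function names; the 80 s / 20 s split is taken from the example, and the 2× target is an added illustration.

```python
def improved_time(affected_s: float, unaffected_s: float, factor: float) -> float:
    """Execution time after speeding up the affected part by `factor`."""
    return affected_s / factor + unaffected_s


def required_factor(affected_s: float, unaffected_s: float, target_s: float):
    """Improvement factor needed to reach target_s, or None if unreachable."""
    remaining = target_s - unaffected_s  # time budget left for the affected part
    if remaining <= 0:
        return None  # the unaffected part alone already uses the whole budget
    return affected_s / remaining


total_s = 100.0
# 5x overall means a 20 s target, but the unaffected part already takes 20 s.
print(required_factor(80.0, 20.0, total_s / 5))  # None
# 2x overall (an added illustration) is reachable.
print(f"{required_factor(80.0, 20.0, total_s / 2):.2f}")  # 2.67
```

Even an arbitrarily large speedup of multiply only approaches the 20 s floor, so the overall speedup stays below 5×.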
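MIPS as a Performance Metric
MIPS: Millions of Instructions Per Second
It rates a machine by the number of instructions it executes per second: \(\text{MIPS}=\frac{\text{Instruction count}}{\text{Execution time}\times 10^6}=\frac{\text{Clock rate}}{CPI\times 10^6}\)
Pitfall: using MIPS as a performance metric. It does not account for:
- Differences in ISAs between computers
- Differences in complexity between instructions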
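To see why the metric can mislead, take MIPS as instruction count divided by (execution time × 10^6) and compare two versions of the same program on one machine. The sketch below uses hypothetical names and made-up numbers; it shows the version with the higher MIPS rating finishing later.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count: float, execution_time_s: float) -> float:
    """Millions of instructions executed per second."""
    return instruction_count / (execution_time_s * 1e6)


clock_rate_hz = 4e9  # assumed 4 GHz machine

# Version 1: many simple instructions. Version 2: fewer, more complex ones.
time_1 = cpu_time_s(5e9, 1.0, clock_rate_hz)
time_2 = cpu_time_s(2e9, 2.0, clock_rate_hz)

print(mips(5e9, time_1), time_1)  # 4000.0 1.25
print(mips(2e9, time_2), time_2)  # 2000.0 1.0
```

Version 2 has half the MIPS rating yet finishes sooner, which is why execution time, not MIPS, is the reliable measure.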
1.4 Eight Great Ideas
- Design for Moore’s Law
Design for where the technology will be when the design is finished, rather than for where it is when the design starts.
Example
Adding electromagnetic aircraft catapults (which are electrically powered, as opposed to current steam-powered models), made possible by the increased power generation offered by new reactor technology
- Use Abstraction to Simplify Design
Lower-level details are hidden to offer a simpler model at higher levels
Example
Building self-driving cars whose control systems partially rely on existing sensor systems already installed in the base vehicle, such as lane departure systems and smart cruise control systems (the sensor systems are treated as an abstraction)
- Make the Common Case Fast
Making the common case fast tends to enhance performance more than optimizing the rare case
Example
Express elevators in buildings
- Performance via Parallelism
Get more performance by performing operations in parallel
Example
Increasing the gate area on a CMOS transistor to decrease its switching time
- Performance via Pipelining
- Performance via Prediction
Guess and start working rather than waiting until you know for sure
Example
Aircraft and marine navigation systems that incorporate wind information
- Hierarchy of Memories
The fastest, smallest, and most expensive memory per bit sits at the top of the hierarchy, and the slowest, largest, and cheapest per bit sits at the bottom
Example
Library reserve desk
- Dependability via Redundancy
Example
Suspension bridge cables