3 Arithmetic for Computer¶

文本统计：约 2376 个字

3.1 Numbers¶

Radix based systems are dominating : decimal, octal, binary

Representation :

ASCII - text character(External)
binary number(Internal)

Number types:

Integer numbers, unsigned
Signed numbers
Floating point numbers

Different grades of precision:

Singe precision (IEEE 574)

Double precision (IEEE 574)

Quadruple precision

3.2 Addition, subtraction and ALU¶

3.2.1 +,- and Overflow¶

加法：Adding bit by bit, carries → next digit

减法：可以直接减，也可以加上相应的补码。

溢出情况：

对于溢出的情况，我们可以：

Ignore
Reaction from the OS
Signaling to application (Python,...)

相应的process：

Hardware detection in the ALU
Save the instruction address (not PC) in special register EPC
Jump the specific routine in OS (将下一条地址改为OS处理异常情况的入口地址)
Correct & return to program
Return to program with error code
Abort program

3.2.2 ALU¶

1 Constructing ALU

两种构建ALU的方法：Modular design(模块化设计)，sharable logic with “select”

Step by step: 从一位的ALU开始逐渐扩展到想要的宽度。

大体可以分为两个模块： Logic AND and OR （逻辑模块）& Arithmetic（算数模块）

下图是一个全加器：

于是我们可以构建 1 bit ALU

有AND, OR, ADD功能

增加一个反码的组件，添加了减法功能

为添加一个比较的功能

我们先在每一位上添加一个less，然后在最高位添加一个Overflow的检查器

将最后一位的检测结果给到第一位的less上，从而输出比较的结果0/1.

进而可以完成ALU的功能，并再添加一个零检测的功能（将result连接到一个或门上即可）。

2 ALU symbol & Control:

Symbol:

Control: Function table

ALU Control Lines	Function
000	And
001	Or
010	Add
110	Sub
111	Set on Less than
100	Nor
101	srl
011	xor

3.3 Multiplication¶

3.3.1 Multiplier V1/V2/V3¶

根据手算乘法的方法，构建了一个简单的乘法器（以下所有数均为无符号数）

64 bits: multiplier
128 bits : multiplicand, product, ALU

根据乘数的末位（每个时钟一直在往右移）来判断被乘数是否要加上原先的结果：如果是1，那么将左移后的被乘数与保存的product相加；如果是0，那么保留原先的product。

取乘数位数的时候取右移的好处：电路更简单。

如何优化？

Product必须要有128位，因为最坏情况两个64位的数相乘会到128位
我们是否可以改进Multiplicand为64位？

我们可以将被乘数左移换为Product右移。

将被乘数与product的左边相加，然后Product为129位的原因：两个64位数相加可能为65位，但这个并不意味着溢出，因为我们是先相加再移位，所以这个65位的结果是没有问题的且是有效的。

Multiplier V2 example

然而注意到乘数和Product均是往右移，所以我们可以将他们合并

于是我们改进第二版乘法器，得到第三版：

Multiplier V3 example

如果是有些符号的，只要将符号位单拎出来先判断，然后剩余部分利用无符号乘法的方法即可。

3.3.2 Improved method: Booth’s Algorithm¶

优势：更快，并且可以直接计算有符号的数

本质：将很多1的组合换为1较少的组合。

先在乘数后面补一位0，然后开始判断bit[1]和bit[0]两位是什么然后对product进行相应的操作

bit[1][0]	含义/操作
10	Beginning, 对product减去被乘数
11	middle of ‘1’ , 对product进行右移
01	End，对product加上被乘数
00	middle of ‘0’ ，对product进行右移

例子

计算\(0010\times 1101=11111010\)

3.4 Division¶

根据手算除法的方法，构建了一个简单的除法器

At first, the divisor is in the left half of the divisor register（一开始被除数是在寄存器的左半边的）
Shift right the divisor register each step（每次除法Divisor往右移一位）

如果Reminder减去Divisor小于0，那么我们将Divisor加回去，并将Quotient推入0；如果Reminder减去Divisor大于等于0，那么将Quotient推入1。进行一次操作后右移Divisor。

Divisor V1 example

根据乘法器的优化方法，得到

Reminder的组成左边是原先的Reminder，右边是原先的Quotient

每次计算与V1相同，只是这次我们右移Reminder，然后将商的部分添加到最后的部分。由于最后一步仍将商的部分推入，余数部分多左移了一位，所以最后还要将余数部分单独向右移一位（观察下面例子的最后一步）。

Divisor V2 example

有符号的除法：余数跟被除数的符号相同，商可以根据被除数和除数符号的关系来定。

3.5 Floating point numbers¶

3.5.1 Representation¶

Binary notation : \(xxxxx⋅2^{yyyyy}\)

Standardized format IEEE 754

由于二进制的科学表示法整数位一定是1，所以这个省略了。

Exponent is biased:

00...000 smallest exponent

11...111 biggest exponent

Bias 127 for single precision
Bias 1023 for double precision

Summary：

\[ (−1)^{\text{sign}}⋅(1+\text{significand})⋅2^{\text{exponent−bias}} \]

一定要注意有bias这个偏移量！！！

将-0.75用IEEE表示为单精度与双精度

3.5.2 Range¶

Single-Precision Range¶

Exponents 00000000 and 11111111 reserved

Smallest value

Exponent: 00000001
Fraction: 000...000
\(±1.0×2^{−126}≈±1.2×10^{−38}\)

Largest value

Exponent: 11111110
Fraction: 111...111
\(±2.0×2^{127}≈±3.4×10^{38}\)

Double-Precision Range¶

Exponents 00000000 and 11111111 reserved

Smallest value

Exponent: 00000000001
Fraction: 000...000
\(±1.0×2^{−1022}≈±2.2×10^{−308}\)

Largest value

Exponent: 11111111110
Fraction: 111...111
\(±2.0×2^{1023}≈±1.8×10^{308}\)

3.5.3 Floating - Point Precision¶

Single: approx \(2^{-23}\)，相当于\(23\times \log_{10}2≈23×0.3≈6\)，十进制下大概是小数点后六位的精度

Double: approx \(2^{-52}\)，相当于\(52×\log_{10}2≈52×0.3≈16\)，十进制下大概是小数点后十六位的精度

所以我们写浮点数时一般使用双精度浮点数

3.5.4 Limitations¶

注意Limitations不仅有上界也有下界，太小的数精度要求太高

Overflow: The number is too big to be represented
Underflow: The number is too small to be represented

对于太小的数和0，我们可以使用非规格数来表示

3.5.5 Infinities and NaNs¶

无限大和无意义的数由之前保留下来的两个数来表示

Exponent = 111...1, Fraction = 000...0

±Infinity
Can be used in subsequent calculations, avoiding need for overflow check

Exponent = 111...1, Fraction ≠ 000...0

Not - a -Number (NaN)
Indicates illegal or undefined result, e.g. 0.0 / 0.0

关于3.5.4和3.5.5中非规格数和特殊数的内容详见IEEE754规范: 四, 非规格数, ±infinity, NaN - 知乎

3.5.6 Floating point addition¶

先对齐，然后再做加法，接着正规化，最后修约

对齐的时候，将小的数右移向大的数对齐，这样做是为了保证正确。如果将大的数向左移可能会把高位移掉，会引发数据的错误；而如果将小的数右移那么可能会将尾数移掉，只影响了精度，对数据的大小不会产生太大的影响。

整体算法流程如下图：

Normalize Significands
Add Significands
Normalize the sum
Over/underflow
Rounding
Normalization 保证最后修约完后仍符合科学计数的原则

Example 0.5+(-0.4375) in binary

下图是整体算法的硬件示意图

3.5.7 Floating point Multiplication and Division¶

乘法的算法流程如下所示：

Add exponents 将指数进行相加
Multiply the significands 将有效数相乘
Normalize 正规化
Over / underflow 检测是否上溢或者下溢
Rounding 修约
Sign 符号位确定

Example of floating point multiplication

假设有两个单精度浮点数，分别表示为：

A = 0 10000011 10100000000000000000000
B = 1 10000010 11000000000000000000000

首先，我们可以得到乘积的符号位为负，因为两个乘数的符号位不同。

接着，将两个乘数的指数位相加，并减去偏移量 127：

指数位 A = 10000011（131）
指数位 B = 10000010（130）
指数位结果 = 131 + 130 - 127 = 134（10000110）

然后，将两个乘数的尾数位相乘。注意添加隐含的 1 位：

1.10100000000000000000000 * 1.11000000000000000000000 = 11.11010000000000000000000000000000

由于乘积的尾数位长度为 48，而单精度浮点数的尾数位长度为 23，所以需要进行规格化和舍入操作：

规格化后的尾数位为：

1.11101000000000000000000

相应地，更新乘积的指数位为：

10000110 + 1 = 10000111

因此，乘积表示为：

P = 1 10000111 11101000000000000000000

除法操作与乘法类似

Subtraction of exponents
Division of the significands
Normalization
Rounding
Sign

3.5.8 Accurate Arithmetic¶

Guard: the first of two extra bits. 有效数字后面一位

Round: method to make the immediate floating point result fit the floating-point format.保留位后面一位

Sticky bit: A bit used in rounding in addition to guard and round that is set whenever there are nonzero bits to the right of the round bit. 检测Round后面有没有非零项，意义在于辨别\(0.50\dots00\)与\(0.50\dots01\)

Units in the last place (ulp): The number of bits in error in the least significant bits of the significant between the actual number and the number that can be represented.

相应解释：ulp(unit in the last place)是什么意思-CSDN博客

Rounding Modes

Always round up (to 正无穷)
Always round down (to 负无穷)
Truncate
Round to nearest even

Round to nearest even

Homework¶

3.32 Calculate \((3.984375 × 10^{−1} + 3.4375 × 10^{−1} ) + 1.771 × 10^3\) by hand, assuming each of the values is stored in the 16-bit half precision format described in Exercise 3.27 (and also described in the text). Assume 1 guard, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps, and write your answer in both the 16-bit floating point format and in decimal.

Answer

先将所有数都转换成二进制的数（由于这个计数法是省略1的，写成1+尾数的形式）然后对A和B进行相加（包含1）并规范化得到\(1.0111110000\times 2^{-1}\)然后与C相加，向指数较大的数靠拢，变成\(.0000000000 10111110000\)那么此时Guard位就是1,Round位就是0,由于后面非零，那么Sticky位就是1,此时与C相加得到的结果是\(1.1011101011(101)\)修约的时候由于是Round to nearest even超过100直接进位，得到\(1.1011101100\times2^{10}=1772\)

评论区

对你有帮助的话请给我个赞和 star =>

欢迎跟我探讨！！！