跳转至

2 Instructions: Language of the Machine

文本统计:约 4748 个字 • 199 行代码

The process of compiling

  • High-level programming language

用更自然的语言叙述,编译器将它翻译成汇编语言。与汇编语言相比,高级编程语言对于硬件来说更加独立。

  • Assembly language

Assembler 将它翻译为机器指令

  • Machine language

2.1 Introduction

Language of the machine

  • Instructions \(\rightarrow\) Word
  • Instruction set \(\rightarrow\) Vocabulary

我们选择的指令集是RISC-V

Instruction Set

CISC vs. RISC

CISC RISC
Emphasis on hardware(>200 instructions) Emphasis on software(<100 instructions)
Multiple instruction sizes and formats Instructions of same size with few formats
Less registers Uses more registers
More addressing modes Fewer addressing modes(Load/Store)
Extensive use of microprogramming Complexity in compiler
Instructions take a varying amount of cycle time Instructions take one cycle time
Pipelining is difficult Pipelining is easy

Instruction formats

一条指令由Operators与Operands构成

Stored-program concept

  • Instruction are represented as numbers.
  • Programs can be stored in memory to be read or written just like numbers.

2.2 Arithmetic Operations

Each line of this language can contain at most one instruction.

Add and subtract, three operands, like add a,b,c // a gets b+c

All arithmetic operations have this form, 固定了运算数的数量,对硬件的实现更加容易。

Design Principle 1: Simplicity favors regularity

Regularity makes implementation simpler
Simplicity enables higher performance at lower cost

Arithmetic Example

2.3 Operands

Arithmetic instructions use register operands

RISC-V has a \(32\times 64\) -bit register file

  • Use for frequently accessed data
  • 64-bit data is called a “double word”
  • 32-bit data is called a ”word”

\(32\times 64\)-\(bit\) :32个寄存器,每个寄存器是64位的

Design Principle 2: Smaller is faster

A very large number of registers may increase the clock cycle time simply because it takes electronic singnals longer when they must travel farther, 但是这话也并不绝对,31位寄存器就不见得比32位的快,我们需要在craving of programs for more registers 和the designer's desire to keep the clock fast 之间找到平衡。

2.1 RISC-V Registers

Register Operand Example

2.2 Memory Operands

处理器只能保留一小部分数据在寄存器中,大部分数据结构被存储在了内存当中。而Arithmetic operations occur on register in RISC-V instructions,所以RISC-V需要有相关指令来在内存和寄存器之间传输数据,这些指令被称为data transfer instruction。

  • Main memory used for composite data, such as Arrays, structures, dynamic data
  • Memory is byte addressed, each address identifies an 8-bit byte
  • RISC-V is Little Endian. It means least-significant byte at least address of a word
  • RISC-V does not require words to be aligned in memory

Memory Alignment

Endianness/byte order

Big end: leftmost.典型例子有PowerPC,01 02就是513(后面的位置更大)

Little end: Rightmost.典型例子有RISC-V,01 02就是258

Bi-endian 有MIPS,ARM,Alpha,SPARC

Memory Operand Example

C code: A[12]=h+A[8];

Compiled RISC-V code:

ld      x9, 64(x22)//ld 就是loaddoubleword 加载双字,64=8*8
add     x9,x21,x9
sd      x9,96(x22)
其中h在x21中,base address of A 存在x22中

  • offset:偏移量,the constant in a data transfer instruction

  • base register: the register addded to form address,在这里就是x22

  • ld: standing for load doubleword

  • sd: standing for store doubleword

Register vs. Memory

  1. Registers are faster to access than memory
  2. Operating on memory data requires loads and stores with more instructions to be executed
  3. Compiler must use registers for variables as much as possible, only spill to memory for less frequently used variables

2.3 Constant or Immediate Operands

Many time a program will use a constant in an operation

  • Incrementing index to point to next element of array
  • Add the constant 4 to register x9
  • Assuming AddrConstants4 is address pointer of constant 4
ld      x9, AddrConstant4(x3) // x9=constant 4
add     x22, x22, x9

x3+ AddrConstants4是常数4在内存上的地址

另一个不用load指令的方法是引入一个新的arithmetic instruction专门处理有运算数为常数的。这个新的arithmetic instruction 被称为add immediate or addi, 于是相应的代码就可以这样写。

addi x22, x22, 4 // x22= x22 + 4

addi 是几乎所有RISC-V程序中最为常见的指令。

常数0有另一个作用,它可以简化指令集。比如说要对一个值进行取反可以直接使用0作为sub运算的第一个运算符。所以RISC-V将寄存器x0设定值为0。这体现了Design Principle 3: Make common case fast

Summary

2.4 Representing Instructions

2.4.1 RISC-V R-Format Instructions

  • opcode: operation code
  • rd: destination register number
  • funct3: 3-bit function code (additonal opcode)
  • rs1: the first source register number
  • rs2: the second source register number
  • funct7: 7-bit function code (additonal opcode)

R-Format Instruction

2.4.2 RISC-V I-Format Instructions

  • rs1: source or base address register number

  • immediate: constant operand, or offset added to base address (2s-complement, sign extended)

这里就可以解答前面R型指令为什么要把rs2放在rs1前面,因为这样可以保证不同指令结构相似,只需要将R型指令的funct7和rs2读成一个immediate.

这个指令同时可以完成ld指令,只需要把rs1设为基地址和immediate设为偏移量,rd还是目标寄存器

I-Format Instruction

ld x9, 64(x22)

  • 22 (x22) is placed in rs1;

  • 64 is placed in immediate

  • 9 (x9) is placed in rd

2.4.3 RISC-V S-Format Instructions

  • rs1: base address register number

  • rs2: source operand register number

  • immediate: offset added to base address (Split so that rs1 and rs2 fields always in the same place)

特意设置一种S-Format是因为S型和I型数据的流动方向并不一样,I型是rs1+immediate到rd,S型是rs2到rs1+immediate。

2.4.4 From C to Machine Instruction

C code

A[30] = h + A[30] + 1;

RISC-V assembly code

ld x9, 240(x10) // temporary reg x9 gets A[30]
add x9, x21, x9 // temporary reg x9 gets h + A[30]
addi x9, x9, 1 // temporary reg x9 gets h + A[30] + 1
sd x9, 240(x10) // stores h + A[30] + 1 back into A[30]

RISC-V machine language code

ld | immediate | rs1 | funct3 | rd | opcode |
        240      10     3       9      3
add | funct7 | rs2 | rs1 | funct3 | rd | opcode |
        0       9    21      0      9      51
addi | immediate | rs1 | funct3 | rd | opcode |
           1        9      0      9      19
sd | im[11:5] | rs2 | rs1 | funct3 | im[4:0] | opcode |
         7       9    10      3        16        35

Summary

2.5 Logical Operations

2.5.1 Shift Operations (I)

immed: how many positions to shift

Shift left logical: Shift left and fill with 0 bits, slli by i bits multiplies by \(2^i\)

Shift right logical: Shift right and fill with 0 bits, srli by i bits divides by \(2^i\) (unsigned only)

2.5.2 AND Operations

实现按位与操作

可以用来保护某些位,并将其他位清空。

2.5.3 OR Operations

实现按位或的操作

可以用来对一个数的某些位进行置1操作,并保持其他位置上是数不变

2.5.4 XOR Operations

实现按位异或的操作

可以用来标记两个数不同的位置

Summary

2.6 Instructions for making decisions

Branch Instructions: branch to a labled instruction if a condition is true, otherwise, continue sequentially.

  • beq (Branch equal)
beq rs1,rs2,L1  //if (rs1==rs2) branch to instruction labled L1
  • bne (Branch not equal)
bne rs1,rs2,L1  //if (rs1!=rs2) branch to instruction labled L1
  • blt (Branch less than)
blt rs1,rs2,L1  //if (rs1<rs2) branch to instruction labled L1
  • bge (Branch greater equal)
bge rs1,rs2,L1  //if (rs1>=rs2) branch to instruction labled L1

处理有符号数比较的:blt,bge

处理无符号数比较的:bltu,bgeu

跳转指令介绍见跳转指令

2.6.1 if 语句

C code

if ( i = = j ) goto L1 ;
f = g + h ;
L1: f = f - i ;

RISC-V assembly code

beq  x21, x22, L1 # go to L1 if i equals j
add  x19, x20, x21 # f = g + h ( skipped if i equals j )
L1: sub x19, x19, x22 # f = f - i ( always executed )

2.6.2 if-then-else 语句

C code

if ( i = = j ) 
    f = g + h ; 
else 
    f = g - h ;

RISC-V assembly code

bne x22, x23, Else # go to Else if i != j
add x19, x20, x21 # f = g + h ( Executed if i = = j 
if)
beq x0, x0, EXIT # go to Exit if语句执行完直接跳转否则会按顺序执行之后的语句
Else: sub x19, x20, x21 # f = g - h ( Executed if i ≠ j 
else) 
Exit: # the first instruction of the next C

2.6.3 While 语句

C code

while (save[i]==k)
    i = ++i;

RISC-V assembly

Loop:   slli x10, x22, 3 # temp reg $t1 = 8 * i
        add x10, x10, x25 # x10 = address of save[i]
        ld x9, 0(x10) # x9 gets save[i] 直接把地址计算出来了
        bne x9, x24, Exit # go to Exit if save[i] != k
        addi x22, x22, 1 # i += 1
        beq x0, x0, Loop # go to Loop 
Exit:

2.6.4 Switch/Case 语句

C code

switch ( k ) {
    case 0 : f = i + j ; break ; /* k = 0 */
    case 1 : f = g + h ; break ; /* k = 1 */
    case 2 : f = g - h ; break ; /* k = 2 */
    case 3 : f = i - j ; break ; /* k = 3 */
}

Basic Blocks (基本单元)

  • No embedded branches (except at end)
  • No branch targets (except at beginning)

RISC-V assembly

创建了一个跳转表Jumptable,就不用管每一个单位的程序的长度了

2.7 Supporting Procedures

procedure/function be used to strcture programs

  • Place Parameters in a place where the procedure can access them 把参数放在一些进程能访问的地方
  • Transfer control to the procedure:jump to 跳转到相应进程
  • Acquire the storage resources needed for the procedure
  • Perform the desired task
  • Place the result value in a place where the calling program can access it
  • Return control to the point of origin

2.7.1 Procedure Call Instructions

Instruction for procedures: jal (jump-and-link)

jal     x1,ProcedureAddress
  • Address of following instruction put in x1 把返回后需要执行的指令放在x1,就是\(PC+4\rightarrow ra\)

  • Jumps to target address, 并跳转到子进程的地址

Procedure return: jalr(jump and link register)

jalr    x0,0(x1)
  • Like jal, but jumps to 0 + address in x1

  • Use x0 as rd (x0 cannot be changed)

  • Can also be used for computed jumps (比如说case/switch statements)

2.7.2 Using More Registers

\(a_0\)~\(a_7\) (\(x10-x17\)):八个参数寄存器来传参以及返回值(x10)的

\(ra/x1\):储存返回地址的寄存器

但是如果编译器需要超过八个的寄存器,理想的用来处理溢出寄存器的数据结构是堆栈,在RISC-V中Stack pointer被存放在x2中,也被称为sp。同时Stack grow from higher address to lower address。

Compling a leaf procedure

C code:

long long int leaf_example (long long int g, long long int h,long long int i, long long int j)
{ 
    long long int f;
    f = (g + h) - (i + j);
    return f;
}
RISC-V assembly code:
addi sp,sp,-24  //在栈中为这三个项添加三个位置
//push
sd  x5,16(sp)   //把x5,x6,x20d分别存储在栈中的三个位置
sd  x6,8(sp)
sd  x20,0(sp)
//calculate
add x5,x10,x11
add x6,x12,x13
sub x20,x5,x6
addi x10,x20,0  //把想要返回的值赋值到x10(a0)中去
//pop
ld  x20,0(sp)   //还原x5,x6,x20,从栈中把原先存储进去的三个值再取出来
ld  x6,8(sp)
ld  x5,16(sp)
addi sp,sp,24   //还原指针
jalr x0,0(x1)   //回到调用该函数的位置

在上面的例子中,我们使用了一个暂时的寄存器并假设之前的值必须被保存的。为了避免保存一些根本不会再用的值,RISC-V将寄存器分为了两种

  • \(x_5-x_7 \text{ and } x_{28}-x_{31}(t_0-t_6)\): temporary register, by the callee not preserved
  • \(x_8-x_9\text{ and } x_{18}-x_{27}(s_0-s_{11})\): saved register, must be preserved if used

下面的例子是一个Nested procedure

Compiling a recursive procedure

C code for n!

int fact ( int n )
{
    if ( n < 1 ) return ( 1 ) ;
    else return ( n * fact ( n - 1 ) ) ;
}

RISC-V assembly code:

1   fact:   addi sp, sp, -16    # adjust stack for 2 items 开两个空间
2           sd ra, 8(sp)        # save the return address:x1 存放返回地址
3           sd a0, 0(sp)        # save the argument n: x10  存放参数n
4
5           addi t0, a0, -1     # x5 = n - 1    
6           bge t0, zero, L1    # if n >= 1, go to L1(else)
7           addi a0, zero, 1    # return 1 if n <1
8           addi sp, sp, 16     # Recover sp (Why not recover x1and x10 ?)
9           jalr zero, 0(ra)    # return to caller
10
11  L1:     addi a0, a0, -1     # n >= 1: argument gets ( n - 1 )
12          jal ra, fact        # call fact with ( n - 1)   跳转到fact语句并改变x0到PC+4
13          add t1, a0, zero    # move result of fact(n - 1) to x6(t1)
14          ld a0, 0(sp)        # return from jal: restore argument n   找回之前的那个n
15          ld ra, 8(sp)        # restore the return address    
16          add sp, sp, 16      # adjust stack pointer to pop 2 items
17          mul a0, a0, t1      # return n*fact ( n - 1 )
18          jalr zero, 0(ra)    # return to the caller
以计算2!为例跑一遍上述代码 设主函数返回的地址为(x1)A,(1,2)后栈中分别是【A,2】,经过(5,6)比较后跳转到(11)将a0改为1,(12)将x1改为了13的地址(B)并跳转到1,然后经过(1,2)后栈中分别是【A,2,B,1】,再经过(5,6,7)后a0改为1,再经过(8)栈中为【A,2】,经过(9)跳转回(13)因为此时x1为B。此时将上一次的返回值给到暂存的寄存器t1,然后经过(14,15)将a0改为之前的2,并修改x1为A,再经过(16),栈清空,经过(17)计算a0为2,经过(18)返回到A。

上面是书上的代码,我也可以这样写

fact:   #先存下返回地址以及参数
        addi sp,sp,-16
        sd x1,8(sp)
        sd x10,0(sp)
        addi x10,x10,-1
        ble x10,x0,done
        jal x1,fact
        #找回之前的n,此时x10为之前调用的返回值
        ld x5,0(sp)
        #计算新的返回值并给到x10
        mul x10,x10,x5
        #出栈并返回
        ld x1,8(sp)
        addi sp,sp,16
        jalr x0,0(x1)
done:   addi x10,x0,1
        jalr x0,0(x1)

写C语言转化为汇编最难的就是递归,主要可以被转化为

  • 进入新的一层递归先将返回地址(x1)以及参数(x10)先入栈(注意只有递归部分才需要入栈,叶子程序不需要)
  • 对于递归部分首先要将x10改为参数值,递归完成后将返回值x10保存下来待用
  • 经过一系列计算后,将返回值写入x10
  • 然后将相应的栈给推出

在进程调用过程中什么是被保护什么是不被保护的?

Preserved Not preserved
Saved registers: x8-x9, x18-x27 Temporary register: x5-x7,x28-x31
Stack pointer register: x2(sp) Argument/result registers: x10-x17
Frame pointer
Return address
Stack above the stack pointer Stack below thw stack pointer

2.7.3 Memory Layout

2.8 Communicating with People Charactor Data

2.8.1 Some character sets

Byte-encoded character sets

  • ASCII (American Standard Code for Information Interchange) 128 characters:95 graphic, 33 control

  • Latin-1: 256 characters 相比于ASCII, +96 more graphic characters

Unicode: 32-bit character set

  • Used in Java, C++ wide characters, …

  • Most of the world’s alphabets, plus symbols

  • UTF-8, UTF-16: variable-length encodings

2.8.2 RISC-V byte/halfword/word load/store

  • Load byte/halfword/word: Sign extend to 64 bits in rd
lb rd, offset(rs1)

lh rd, offset(rs1)

lw rd, offset(rs1)
  • Load byte/halfword/word unsigned: 0 extend to 64 bits in rd
lbu rd, offset(rs1)

lhu rd, offset(rs1)

lwu rd, offset(rs1)
  • Store byte/halfword/word: Store rightmost 8/16/32 bits
sb rs2, offset(rs1)

sh rs2, offset(rs1)

sw rs2, offset(rs1)

String Copy Example

C code

void strcpy ( char x[ ] , char y[ ] )
{ 
    size_t i ;
    i = 0 ;
    while ( ( x[ i ] = y[ i ] ) != ‘\ 0 ) /* copy and test byte */
    i += 1 ;
}
RISC-V assembly code:
strcpy: addi sp, sp, -8 # adjust stack for 1 doubleword
        sd x19, 0(sp) # save x19
        add x19, x0, x0 # i = 0 

    L1: add x5, x19, x11 # x5 = address of y[ i]
        lbu x6, 0(x5) # x6 = y [ i ]
        add x7, x19, x10 # x7 = address of x[ i ] 
        sb x6, 0(x7) # x[ i ] = y[ i ]
        beq x6, x0, L2 # if y[i] == 0 then exit
        addi x19, x19, 1 # i = i + 1
        jal x0, L1 # next iteration of loop

    L2: ld x19, 0(sp) # restore saved old s3
        addi sp, sp, 8 # pop 1 double word from stack
        jalr zero 0(x1) # return

2.9 Addressing for 32-bit Immediate and Addresses

2.9.1 Addressing for 32-bit Immediate

大多常数都不超过12位,但有些时候会取一些大数。

我们可以利用RISC-V的其中一条指令 Load upper immediate (lui)来设置一个32-bit constant

U-type

Loading a 32-bit constant

要注意的是addi计算的是有符号数,所以计算时先进行了sign extended然后再与lui后的结果相加,然后再加上\(2^{12}\)修正问题。

2.9.2 Branch Addressing

bne rs1 rs2 offset

对于汇编来说的offset与在机器码中的immediate呈现一个2倍关系,因为offset必然为偶数(尽管是4的倍数但是只省略1位的原因是RISC-V的发明者希望能兼容16位的系统)

Branch offset

C code

while(save[i]==k)
    i=i+1
RISC-V assembly code
Loop:   slli x10, x22, 3    # temp reg x10 = 8 * i
        add x10, x10, x25   # x10 = address of save[i]
        ld x9, 0(x10)       # temp reg x9 = save[i]
        bne x9, x24, Exit   # go to Exit if save[i] != k
        addi x22, x22, 1    # i = i + 1
        beq x0, x0, Loop    # go to Loop
Exit:
如果写数值的话第一个Exit为12,第二个Loop为-20

如果有条件跳转需要跳到更远的地方,那么可以将其转化到无条件跳转来处理

比如说:Given a branch: beq x10,x0,L1

可以把它改写为

    bne x10, x0, L2
    jal x0, L1
L2:

这样跳转的距离从12位增加到了20位,当然还可以结合lui升到32位。

2.9.3 Jump Addressing

jal可以跳转小范围,如果需要大范围跳转,可以先使用lui到temp register然后使用jalr加上immed即可。

2.9.4 RISC-V Addressing Summary

四种寻址方式:

2.10 RISC-V Disassembly

Decoding machine code

Summary of RISC-V instruction encoding

2.11 Synchronization in RISC-V

具体解释可参考20|RISC-V指令精讲(五):原子指令实现与调试_risc-v amoswap-CSDN博客

现在的计算机多为多处理器,但这些处理器都共用一片内存,这样就会产生竞争。我们使用指令对来解决这一个问题:

Load reserved: lr.w/d rd,(rs1)

  • Load from address in rs1 to rd
  • Place reservation on memory address

lr 指令是从内存地址 rs1 中加载内容到 rd 寄存器。然后在 rs1 对应地址上设置保留标记(reservation set)。其中 w/d 分别对应 32 位/64 位版本。

Store conditional: sc.w/d rd,rs2,(rs1)

  • Store from rs2 to address in rs1
  • Succeeds if location not changed since the lr.d, then returns 0 in rd
  • Fails if location is changed then returns non-zero value in rd

sc 指令在把 rs2 值写到 rs1 地址之前,会先判断 rs1 内存地址是否有设置保留标记,如果设置了,则把 rs2 值正常写入到 rs1 内存地址里,并把 rd 寄存器设置成 0,表示保存成功。如果 rs1 内存地址没有设置保留标记,则不保存,并把 rd 寄存器设置成 1 表示保存失败。不管成功还是失败,sc 指令都会把当前 hart 保留的所有保留标记全部清除。其中 w/d 分别对应 32 位/64 位版本。

atomic swap

again:  lr.d x10,(x20)
        sc.d x11,x23,(x20) // X11 = status
        bne x11,x0,again // branch if store failed
        addi x23,x10,0 // X23 = loaded value
目标是将寄存器x20中指定的内存位置上的值与x23之间的值进行交换,x10是中间寄存器,x11为状态寄存器

lock and unlock

Lock

        addi x12,x0,1 // copy locked value
again:  lr.d x10,(x20) // read lock
        bne x10,x0,again // check if it is 0 yet
        sc.d x11,x12,(x20) // attempt to store
        bne x11,x0,again // branch if fails
Unlock:
        sd x0,0(x20) // free lock
我们在(x20)中存储锁变量,其中0代表锁变量空闲,1代表锁变量被占用 我们从(x20)中读取锁变量x10,如果不等于0,说明这个锁变量还在被别人用,则回到读取操作继续读 如果等于0,那么就讲x12也就是1给入x20,表示这个锁变量已经被我占用了,(这个时候x11作为存储状态的变量) 怎么释放?就直接把0给到(x20)j

2.12 Translating and starting a program

2.12.1 Assembler

pseudoinstructions(伪指令): 汇编指令的变体,硬件不需要实现这些指令,是为了帮助程序转换与编程的。

比如说RISC-V汇编器可以识别

li  x9,123      //load immediate value 123 into register x9

汇编器会将此汇编语言转换成以下机器语言

addi x9,x0,123  //register x9 gets register x0+123

同样的,汇编器可以识别mv并将其转化

mv  x10,x11     //register x10 gets register x11
变为
addi x10,x11,0

汇编器还接受j label作为jal x0,Label替代,并且汇编器还允许大常量加载导寄存器中,尽管立即数指令位数有限。

最后,它可以通过确定程序员想要的指令变体来简化指令系统。例如,算术和逻辑指令使用常量时,RISC-V汇编器不要求程序员指定指令的立即数版本,它只是生成正确的操作码。从而将

and x9,x10,15

变为

andi x9,x10,15

汇编器最终将汇编语言转换成目标文件,是机器指令、数据和将指令正确放入内存所需信息的组合

UNIX系统的目标文件通常包含六个部分

  • Header: described contents of object module

  • Text segment: translated instructions

  • Static data segment: data allocated for the life of the program

  • Relocation info: for contents that depend on absolute location of loaded program

  • Symbol table: global definitions and external refs

  • Debug info: for associating with source code

2.12.2 Linker

Object modules(including library routine) → executable program

分为三个步骤

  • Place code and data modules symbolically in memory 将同类型的部分合并在一起
  • Determine the addresses of data and instruction labels 决定数据与指令标签的地址
  • Patch both the internal and external references (Address of invoke) 修正内部和外部引用

合并A、B目标文件

2.12.3 Loader

现在可执行文件在磁盘上,操作系统将其读取到内存并启动它。加载器在UNIX系统中遵循以下步骤:

1.读取可执行文件首部以确定正文段和数据段的大小。

2.为正文和数据创建足够大的地址空间。

3.将可执行文件中的指令和数据复制到内存中。

4.将主程序的参数(如果有)复制到栈顶。

5.初始化处理器寄存器并将栈指针指向第一个空闲位置。

6.跳转到启动例程,将参数复制到参数寄存器中并调用程序的主例程。当主例程返回时,启动例程通过exit系统调用终止程序。

2.12.4 Dynamic Linking

Only link/load library procedure when it is called

在DLL最初版本中,加载器运行一个动态链接程序,使用文件中的额外信息来查找相应的库并更新所有外部应用。但是它仍然链接可能调用的库的所有历程,而不是程序运行期间的那些例程。

现在介绍DLL的延迟过程链接版本(Lazy linkage) 其中每个例程仅在被调用后才链接。

第一次调用库例程时,程序调用虚入口并执行间接跳转。这个跳转指向一段代码,它将一个数字ID放入寄存器来识别所需的库例程,然后跳转到动态链接器/加载器。链接器/加载器找到所需的例程,重新映射它,并更改间接跳转位置中的地址以指向该例程。然后跳转到这个例程。例程完成后,它将返回到初始调用点。此后,它都会间接跳转到该例程而不需额外的中间过程。

总之,DLL需要额外的空间来存储动态链接所需的信息,但不要求复制或链接整个库。它们在第一次调用例程时会付出大量的开销,但此后只需一个间接跳转。请注意,从库返回不会产生额外的开销。

2.12.5 Starting Java Applications

就是java开发出来的目标是适用于任何计算机。所以java增加了一个JVM(Java virtual Machine) 的东西。编译器先将java编译成易于解释的指令:Java字节码 (Java Bytecode) 指令系统,编译速度比较快,然后通过JVM将Java字节码再进行转换。

优点是可移植性,缺点是速度慢

2.13 Arrays versus Pointers

Array indexing involves

  • Multiplying index by element size 计算偏移量

  • Adding to array base address 添加到基地址上

Pointers correspond directly to memory addresses

  • Can avoid indexing complexity

2.14* Real Stuff

2.14.1 MIPS Instructions

MIPS: commercial predecessor to RISC-V

Similar basic set of instructions

  • 32-bit instructions

  • 32 general purpose registers, register 0 is always 0

  • 32 floating-point registers

  • Memory accessed only by load/store instructions, consistent use of addressing modes for all data sizes

Different conditional branches for <, <=, >, >=

  • RISC-V: blt, bge, bltu, bgeu

  • MIPS: slt, sltu (set less than, result is 0 or 1), Then use beq, bne to complete the branch

2.14.2 The Intel x86 ISA

这是一种CISC的指令集类型,尽可能减少指令的数量,增加不同的指令。

x86寄存器

x86算术和逻辑指令中必须有一个操作数既作为源操作数,又作为目的操作数;RISC-V和MIPS允许源和目的操作数的寄存器分开。x86的这种限制对有限寄存器带来了更大的压力,因为必须修改一个源寄存器。第二个重要的区别在于其中一个操作数可以在存储器中。因此,与RISC-V和MIPS不同,实际上任何指令都可能有一个操作数在存储器中。

同时x86的指令解码也比较复杂,指令长度不一,从很短到很长,解码比较困难。

2.14.3 Other RISC-V Instructions

Base integer instructions (RV64I)

  • Those previously described, plus
  • auipc rd, immed // rd = (imm<<12) + pc, follow by jalr (adds 12-bit immed) for long jump,与lui很像
  • slt, sltu, slti, sltui: set less than (like MIPS)
  • addw, subw, addiw: 32-bit add/sub
  • sllw, srlw, srlw, slliw, srliw, sraiw: 32-bit shift

32-bit variant: RV32I

  • registers are 32-bits wide, 32-bit operations

Instruction Set Extensions (指令集的扩张)

  • M: integer multiply, divide, remainder

  • A: atomic memory operations

  • F: single-precision floating point

  • D: double-precision floating point

  • C: compressed instructions: 16-bit encoding for frequently used instructions

2.15 A C Sort Example To Put it All Together

2.15.1 Procedure swap

C code

void swap ( long long v[ ] , size_t k )
{
    long long temp ;
    temp = v[ k ] ;
    v[ k ] = v[ k + 1 ] ;
    v[ k + 1 ] = temp ;
}

Register allocation for swap

v ---- x10 k ---- x11 temp ---- x5

swap is a leaf procedure, nothing to preserve

RISC-V code for the procedure swap

swap:   slli x6, x11, 3 // x6 = k * 8 
        add x6, x10, x6 // x6 = v + ( k * 8 )
        ld x5, 0(x6) // x5 ← v[ k ]
        ld x7, 8(x6) // x7 ← v[ k + 1 ] 
        sd x7, 0(x6) // v[k+1] → v[ k ]
        sd x5, 8(x6) // v[k] → v[ k + 1 ]
        jalr x0, 0(x1) // return to calling routine

2.15.2 Procedure sort

C code

void sort (long long v[ ] , size_t n )
{
    size_t i , j ;
    for ( i = 0 ; i < n ; i + = 1 ) 
    {
        for ( j = i - 1 ; j >= 0 && v[j] > v[j+1] ; j -= 1 )
            swap ( v , j ) ;
    }
}

Register allocation for sort

v ---- x10 n ---- x11 i ---- x19 j ---- x20

Passing parameters in sort

Preserving registers in sort

x1, x19, x20, x21, x22

The Outer Loop

li x19,0 // i = 0
for1tst:
    bge x19,x11,exit1 // go to exit1 if x19 ≥ x11 (i≥n)

    (body of outer for-loop)

    addi x19,x19,1 // i += 1
    j for1tst // branch to test of outer loop
exit1:

The Inner Loop

addi x20,x19,-1 // j = i −1
for2tst: 
    blt x20,x0,exit2 // go to exit2 if X20 < 0 (j < 0)
    slli x5,x20,3 // reg x5 = j * 8
    add x5,x10,x5 // reg x5 = v + (j * 8)
    ld x6,0(x5) // reg x6 = v[j]
    ld x7,8(x5) // reg x7 = v[j + 1]
    ble x6,x7,exit2 // go to exit2 if x6 ≤ x7
    mv x10, x21 // first swap parameter is v
    mv x11, x20 // second swap parameter is j
    jal x1,swap // call swap
    addi x20,x20,-1 // j –= 1
    j for2tst // branch to test of inner loop
exit2:

2.15.3 The Full Procedure

评论区

对你有帮助的话请给我个赞和 star => GitHub stars
欢迎跟我探讨!!!