2 Instructions: Language of the Machine¶
The process of compiling
- High-level programming language
用更自然的语言叙述,编译器将它翻译成汇编语言。与汇编语言相比,高级编程语言对于硬件来说更加独立。
- Assembly language
Assembler 将它翻译为机器指令
- Machine language
2.1 Introduction¶
Language of the machine
- Instructions \(\rightarrow\) Word
- Instruction set \(\rightarrow\) Vocabulary
我们选择的指令集是RISC-V
Instruction Set
CISC vs. RISC
CISC | RISC |
---|---|
Emphasis on hardware(>200 instructions) | Emphasis on software(<100 instructions) |
Multiple instruction sizes and formats | Instructions of same size with few formats |
Less registers | Uses more registers |
More addressing modes | Fewer addressing modes(Load/Store) |
Extensive use of microprogramming | Complexity in compiler |
Instructions take a varying amount of cycle time | Instructions take one cycle time |
Pipelining is difficult | Pipelining is easy |
Instruction formats
一条指令由Operators与Operands构成
Stored-program concept
- Instruction are represented as numbers.
- Programs can be stored in memory to be read or written just like numbers.
2.2 Arithmetic Operations¶
Each line of this language can contain at most one instruction.
Add and subtract, three operands, like add a,b,c // a gets b+c
All arithmetic operations have this form, 固定了运算数的数量,对硬件的实现更加容易。
Design Principle 1: Simplicity favors regularity
Regularity makes implementation simpler Simplicity enables higher performance at lower cost
2.3 Operands¶
Arithmetic instructions use register operands
RISC-V has a \(32\times 64\) -bit register file
- Use for frequently accessed data
- 64-bit data is called a “double word”
- 32-bit data is called a ”word”
\(32\times 64\)-\(bit\) :32个寄存器,每个寄存器是64位的
Design Principle 2: Smaller is faster
A very large number of registers may increase the clock cycle time simply because it takes electronic singnals longer when they must travel farther, 但是这话也并不绝对,31位寄存器就不见得比32位的快,我们需要在craving of programs for more registers 和the designer's desire to keep the clock fast 之间找到平衡。
2.1 RISC-V Registers¶
2.2 Memory Operands¶
处理器只能保留一小部分数据在寄存器中,大部分数据结构被存储在了内存当中。而Arithmetic operations occur on register in RISC-V instructions,所以RISC-V需要有相关指令来在内存和寄存器之间传输数据,这些指令被称为data transfer instruction。
- Main memory used for composite data, such as Arrays, structures, dynamic data
- Memory is byte addressed, each address identifies an 8-bit byte
- RISC-V is Little Endian. It means least-significant byte at least address of a word
- RISC-V does not require words to be aligned in memory
Endianness/byte order
Big end: leftmost.典型例子有PowerPC,01 02就是513(后面的位置更大)
Little end: Rightmost.典型例子有RISC-V,01 02就是258
Bi-endian 有MIPS,ARM,Alpha,SPARC
Memory Operand Example
C code: A[12]=h+A[8];
Compiled RISC-V code:
其中h在x21中,base address of A 存在x22中-
offset:偏移量,the constant in a data transfer instruction
-
base register: the register addded to form address,在这里就是x22
-
ld: standing for load doubleword
-
sd: standing for store doubleword
Register vs. Memory
- Registers are faster to access than memory
- Operating on memory data requires loads and stores with more instructions to be executed
- Compiler must use registers for variables as much as possible, only spill to memory for less frequently used variables
2.3 Constant or Immediate Operands¶
Many time a program will use a constant in an operation
- Incrementing index to point to next element of array
- Add the constant 4 to register x9
- Assuming
AddrConstants4
is address pointer of constant 4
x3+ AddrConstants4
是常数4在内存上的地址
另一个不用load指令的方法是引入一个新的arithmetic instruction专门处理有运算数为常数的。这个新的arithmetic instruction 被称为add immediate or addi, 于是相应的代码就可以这样写。
addi 是几乎所有RISC-V程序中最为常见的指令。
常数0有另一个作用,它可以简化指令集。比如说要对一个值进行取反可以直接使用0作为sub运算的第一个运算符。所以RISC-V将寄存器x0设定值为0。这体现了Design Principle 3: Make common case fast
Summary¶
2.4 Representing Instructions¶
2.4.1 RISC-V R-Format Instructions¶
- opcode: operation code
- rd: destination register number
- funct3: 3-bit function code (additonal opcode)
- rs1: the first source register number
- rs2: the second source register number
- funct7: 7-bit function code (additonal opcode)
2.4.2 RISC-V I-Format Instructions¶
-
rs1: source or base address register number
-
immediate: constant operand, or offset added to base address (2s-complement, sign extended)
这里就可以解答前面R型指令为什么要把rs2放在rs1前面,因为这样可以保证不同指令结构相似,只需要将R型指令的funct7和rs2读成一个immediate.
这个指令同时可以完成ld指令,只需要把rs1设为基地址和immediate设为偏移量,rd还是目标寄存器
I-Format Instruction
ld x9, 64(x22)
-
22 (x22) is placed in rs1;
-
64 is placed in immediate
-
9 (x9) is placed in rd
2.4.3 RISC-V S-Format Instructions¶
-
rs1: base address register number
-
rs2: source operand register number
-
immediate: offset added to base address (Split so that rs1 and rs2 fields always in the same place)
特意设置一种S-Format是因为S型和I型数据的流动方向并不一样,I型是rs1+immediate到rd,S型是rs2到rs1+immediate。
2.4.4 From C to Machine Instruction¶
C code
RISC-V assembly code
ld x9, 240(x10) // temporary reg x9 gets A[30]
add x9, x21, x9 // temporary reg x9 gets h + A[30]
addi x9, x9, 1 // temporary reg x9 gets h + A[30] + 1
sd x9, 240(x10) // stores h + A[30] + 1 back into A[30]
RISC-V machine language code
ld | immediate | rs1 | funct3 | rd | opcode |
240 10 3 9 3
add | funct7 | rs2 | rs1 | funct3 | rd | opcode |
0 9 21 0 9 51
addi | immediate | rs1 | funct3 | rd | opcode |
1 9 0 9 19
sd | im[11:5] | rs2 | rs1 | funct3 | im[4:0] | opcode |
7 9 10 3 16 35
Summary¶
2.5 Logical Operations¶
2.5.1 Shift Operations (I)¶
immed: how many positions to shift
Shift left logical: Shift left and fill with 0 bits, slli
by i bits multiplies by \(2^i\)
Shift right logical: Shift right and fill with 0 bits, srli
by i bits divides by \(2^i\) (unsigned only)
2.5.2 AND Operations¶
实现按位与操作
可以用来保护某些位,并将其他位清空。
2.5.3 OR Operations¶
实现按位或的操作
可以用来对一个数的某些位进行置1操作,并保持其他位置上是数不变
2.5.4 XOR Operations¶
实现按位异或的操作
可以用来标记两个数不同的位置
Summary¶
2.6 Instructions for making decisions¶
Branch Instructions: branch to a labled instruction if a condition is true, otherwise, continue sequentially.
- beq (Branch equal)
- bne (Branch not equal)
- blt (Branch less than)
- bge (Branch greater equal)
处理有符号数比较的:blt,bge
处理无符号数比较的:bltu,bgeu
跳转指令介绍见跳转指令
2.6.1 if 语句¶
C code
RISC-V assembly code
beq x21, x22, L1 # go to L1 if i equals j
add x19, x20, x21 # f = g + h ( skipped if i equals j )
L1: sub x19, x19, x22 # f = f - i ( always executed )
2.6.2 if-then-else 语句¶
C code
RISC-V assembly code
bne x22, x23, Else # go to Else if i != j
add x19, x20, x21 # f = g + h ( Executed if i = = j
if)
beq x0, x0, EXIT # go to Exit if语句执行完直接跳转否则会按顺序执行之后的语句
Else: sub x19, x20, x21 # f = g - h ( Executed if i ≠ j
else)
Exit: # the first instruction of the next C
2.6.3 While 语句¶
C code
RISC-V assembly
Loop: slli x10, x22, 3 # temp reg $t1 = 8 * i
add x10, x10, x25 # x10 = address of save[i]
ld x9, 0(x10) # x9 gets save[i] 直接把地址计算出来了
bne x9, x24, Exit # go to Exit if save[i] != k
addi x22, x22, 1 # i += 1
beq x0, x0, Loop # go to Loop
Exit:
2.6.4 Switch/Case 语句¶
C code
switch ( k ) {
case 0 : f = i + j ; break ; /* k = 0 */
case 1 : f = g + h ; break ; /* k = 1 */
case 2 : f = g - h ; break ; /* k = 2 */
case 3 : f = i - j ; break ; /* k = 3 */
}
Basic Blocks (基本单元)
- No embedded branches (except at end)
- No branch targets (except at beginning)
RISC-V assembly
创建了一个跳转表Jumptable,就不用管每一个单位的程序的长度了
2.7 Supporting Procedures¶
procedure/function be used to strcture programs
- Place Parameters in a place where the procedure can access them 把参数放在一些进程能访问的地方
- Transfer control to the procedure:jump to 跳转到相应进程
- Acquire the storage resources needed for the procedure
- Perform the desired task
- Place the result value in a place where the calling program can access it
- Return control to the point of origin
2.7.1 Procedure Call Instructions¶
Instruction for procedures: jal (jump-and-link)
-
Address of following instruction put in x1 把返回后需要执行的指令放在x1,就是\(PC+4\rightarrow ra\)
-
Jumps to target address, 并跳转到子进程的地址
Procedure return: jalr(jump and link register)
-
Like jal, but jumps to 0 + address in x1
-
Use x0 as rd (x0 cannot be changed)
-
Can also be used for computed jumps (比如说case/switch statements)
2.7.2 Using More Registers¶
\(a_0\)~\(a_7\) (\(x10-x17\)):八个参数寄存器来传参以及返回值(x10)的
\(ra/x1\):储存返回地址的寄存器
但是如果编译器需要超过八个的寄存器,理想的用来处理溢出寄存器的数据结构是堆栈,在RISC-V中Stack pointer被存放在x2中,也被称为sp。同时Stack grow from higher address to lower address。
Compling a leaf procedure
C code:
long long int leaf_example (long long int g, long long int h,long long int i, long long int j)
{
long long int f;
f = (g + h) - (i + j);
return f;
}
addi sp,sp,-24 //在栈中为这三个项添加三个位置
//push
sd x5,16(sp) //把x5,x6,x20d分别存储在栈中的三个位置
sd x6,8(sp)
sd x20,0(sp)
//calculate
add x5,x10,x11
add x6,x12,x13
sub x20,x5,x6
addi x10,x20,0 //把想要返回的值赋值到x10(a0)中去
//pop
ld x20,0(sp) //还原x5,x6,x20,从栈中把原先存储进去的三个值再取出来
ld x6,8(sp)
ld x5,16(sp)
addi sp,sp,24 //还原指针
jalr x0,0(x1) //回到调用该函数的位置
在上面的例子中,我们使用了一个暂时的寄存器并假设之前的值必须被保存的。为了避免保存一些根本不会再用的值,RISC-V将寄存器分为了两种
- \(x_5-x_7 \text{ and } x_{28}-x_{31}(t_0-t_6)\): temporary register, by the callee not preserved
- \(x_8-x_9\text{ and } x_{18}-x_{27}(s_0-s_{11})\): saved register, must be preserved if used
下面的例子是一个Nested procedure
Compiling a recursive procedure
C code for n!
RISC-V assembly code:
1 fact: addi sp, sp, -16 # adjust stack for 2 items 开两个空间
2 sd ra, 8(sp) # save the return address:x1 存放返回地址
3 sd a0, 0(sp) # save the argument n: x10 存放参数n
4
5 addi t0, a0, -1 # x5 = n - 1
6 bge t0, zero, L1 # if n >= 1, go to L1(else)
7 addi a0, zero, 1 # return 1 if n <1
8 addi sp, sp, 16 # Recover sp (Why not recover x1and x10 ?)
9 jalr zero, 0(ra) # return to caller
10
11 L1: addi a0, a0, -1 # n >= 1: argument gets ( n - 1 )
12 jal ra, fact # call fact with ( n - 1) 跳转到fact语句并改变x0到PC+4
13 add t1, a0, zero # move result of fact(n - 1) to x6(t1)
14 ld a0, 0(sp) # return from jal: restore argument n 找回之前的那个n
15 ld ra, 8(sp) # restore the return address
16 add sp, sp, 16 # adjust stack pointer to pop 2 items
17 mul a0, a0, t1 # return n*fact ( n - 1 )
18 jalr zero, 0(ra) # return to the caller
上面是书上的代码,我也可以这样写
写C语言转化为汇编最难的就是递归,主要可以被转化为
- 进入新的一层递归先将返回地址(x1)以及参数(x10)先入栈(注意只有递归部分才需要入栈,叶子程序不需要)
- 对于递归部分首先要将x10改为参数值,递归完成后将返回值x10保存下来待用
- 经过一系列计算后,将返回值写入x10
- 然后将相应的栈给推出
在进程调用过程中什么是被保护什么是不被保护的?
Preserved | Not preserved |
---|---|
Saved registers: x8-x9, x18-x27 | Temporary register: x5-x7,x28-x31 |
Stack pointer register: x2(sp) | Argument/result registers: x10-x17 |
Frame pointer | |
Return address | |
Stack above the stack pointer | Stack below thw stack pointer |
2.7.3 Memory Layout¶
2.8 Communicating with People Charactor Data¶
2.8.1 Some character sets¶
Byte-encoded character sets
-
ASCII (American Standard Code for Information Interchange) 128 characters:95 graphic, 33 control
-
Latin-1: 256 characters 相比于ASCII, +96 more graphic characters
Unicode: 32-bit character set
-
Used in Java, C++ wide characters, …
-
Most of the world’s alphabets, plus symbols
-
UTF-8, UTF-16: variable-length encodings
2.8.2 RISC-V byte/halfword/word load/store¶
- Load byte/halfword/word: Sign extend to 64 bits in rd
- Load byte/halfword/word unsigned: 0 extend to 64 bits in rd
- Store byte/halfword/word: Store rightmost 8/16/32 bits
String Copy Example
C code
void strcpy ( char x[ ] , char y[ ] )
{
size_t i ;
i = 0 ;
while ( ( x[ i ] = y[ i ] ) != ‘\ 0’ ) /* copy and test byte */
i += 1 ;
}
strcpy: addi sp, sp, -8 # adjust stack for 1 doubleword
sd x19, 0(sp) # save x19
add x19, x0, x0 # i = 0
L1: add x5, x19, x11 # x5 = address of y[ i]
lbu x6, 0(x5) # x6 = y [ i ]
add x7, x19, x10 # x7 = address of x[ i ]
sb x6, 0(x7) # x[ i ] = y[ i ]
beq x6, x0, L2 # if y[i] == 0 then exit
addi x19, x19, 1 # i = i + 1
jal x0, L1 # next iteration of loop
L2: ld x19, 0(sp) # restore saved old s3
addi sp, sp, 8 # pop 1 double word from stack
jalr zero 0(x1) # return
2.9 Addressing for 32-bit Immediate and Addresses¶
2.9.1 Addressing for 32-bit Immediate¶
大多常数都不超过12位,但有些时候会取一些大数。
我们可以利用RISC-V的其中一条指令 Load upper immediate (lui)来设置一个32-bit constant
U-type
Loading a 32-bit constant
要注意的是addi计算的是有符号数,所以计算时先进行了sign extended然后再与lui后的结果相加,然后再加上\(2^{12}\)修正问题。
2.9.2 Branch Addressing¶
对于汇编来说的offset与在机器码中的immediate呈现一个2倍关系,因为offset必然为偶数(尽管是4的倍数但是只省略1位的原因是RISC-V的发明者希望能兼容16位的系统)
Branch offset
C code
RISC-V assembly codeLoop: slli x10, x22, 3 # temp reg x10 = 8 * i
add x10, x10, x25 # x10 = address of save[i]
ld x9, 0(x10) # temp reg x9 = save[i]
bne x9, x24, Exit # go to Exit if save[i] != k
addi x22, x22, 1 # i = i + 1
beq x0, x0, Loop # go to Loop
Exit:

如果有条件跳转需要跳到更远的地方,那么可以将其转化到无条件跳转来处理
比如说:Given a branch: beq x10,x0,L1
可以把它改写为
这样跳转的距离从12位增加到了20位,当然还可以结合lui升到32位。
2.9.3 Jump Addressing¶
jal可以跳转小范围,如果需要大范围跳转,可以先使用lui到temp register然后使用jalr加上immed即可。
2.9.4 RISC-V Addressing Summary¶
四种寻址方式:
2.10 RISC-V Disassembly¶
Summary of RISC-V instruction encoding¶
2.11 Synchronization in RISC-V¶
具体解释可参考20|RISC-V指令精讲(五):原子指令实现与调试_risc-v amoswap-CSDN博客
现在的计算机多为多处理器,但这些处理器都共用一片内存,这样就会产生竞争。我们使用指令对来解决这一个问题:
Load reserved: lr.w/d rd,(rs1)
- Load from address in rs1 to rd
- Place reservation on memory address
lr 指令是从内存地址 rs1 中加载内容到 rd 寄存器。然后在 rs1 对应地址上设置保留标记(reservation set)。其中 w/d 分别对应 32 位/64 位版本。
Store conditional: sc.w/d rd,rs2,(rs1)
- Store from rs2 to address in rs1
- Succeeds if location not changed since the lr.d, then returns 0 in rd
- Fails if location is changed then returns non-zero value in rd
sc 指令在把 rs2 值写到 rs1 地址之前,会先判断 rs1 内存地址是否有设置保留标记,如果设置了,则把 rs2 值正常写入到 rs1 内存地址里,并把 rd 寄存器设置成 0,表示保存成功。如果 rs1 内存地址没有设置保留标记,则不保存,并把 rd 寄存器设置成 1 表示保存失败。不管成功还是失败,sc 指令都会把当前 hart 保留的所有保留标记全部清除。其中 w/d 分别对应 32 位/64 位版本。
atomic swap
目标是将寄存器x20中指定的内存位置上的值与x23之间的值进行交换,x10是中间寄存器,x11为状态寄存器lock and unlock
Lock
Unlock: 我们在(x20)中存储锁变量,其中0代表锁变量空闲,1代表锁变量被占用 我们从(x20)中读取锁变量x10,如果不等于0,说明这个锁变量还在被别人用,则回到读取操作继续读 如果等于0,那么就讲x12也就是1给入x20,表示这个锁变量已经被我占用了,(这个时候x11作为存储状态的变量) 怎么释放?就直接把0给到(x20)j2.12 Translating and starting a program¶
2.12.1 Assembler¶
pseudoinstructions(伪指令): 汇编指令的变体,硬件不需要实现这些指令,是为了帮助程序转换与编程的。
比如说RISC-V汇编器可以识别
汇编器会将此汇编语言转换成以下机器语言
同样的,汇编器可以识别mv
并将其转化
汇编器还接受j label
作为jal x0,Label
替代,并且汇编器还允许大常量加载导寄存器中,尽管立即数指令位数有限。
最后,它可以通过确定程序员想要的指令变体来简化指令系统。例如,算术和逻辑指令使用常量时,RISC-V汇编器不要求程序员指定指令的立即数版本,它只是生成正确的操作码。从而将
变为
汇编器最终将汇编语言转换成目标文件,是机器指令、数据和将指令正确放入内存所需信息的组合
UNIX系统的目标文件通常包含六个部分
-
Header: described contents of object module
-
Text segment: translated instructions
-
Static data segment: data allocated for the life of the program
-
Relocation info: for contents that depend on absolute location of loaded program
-
Symbol table: global definitions and external refs
-
Debug info: for associating with source code
2.12.2 Linker¶
Object modules(including library routine) → executable program
分为三个步骤
- Place code and data modules symbolically in memory 将同类型的部分合并在一起
- Determine the addresses of data and instruction labels 决定数据与指令标签的地址
- Patch both the internal and external references (Address of invoke) 修正内部和外部引用
2.12.3 Loader¶
现在可执行文件在磁盘上,操作系统将其读取到内存并启动它。加载器在UNIX系统中遵循以下步骤:
1.读取可执行文件首部以确定正文段和数据段的大小。
2.为正文和数据创建足够大的地址空间。
3.将可执行文件中的指令和数据复制到内存中。
4.将主程序的参数(如果有)复制到栈顶。
5.初始化处理器寄存器并将栈指针指向第一个空闲位置。
6.跳转到启动例程,将参数复制到参数寄存器中并调用程序的主例程。当主例程返回时,启动例程通过exit系统调用终止程序。
2.12.4 Dynamic Linking¶
Only link/load library procedure when it is called
在DLL最初版本中,加载器运行一个动态链接程序,使用文件中的额外信息来查找相应的库并更新所有外部应用。但是它仍然链接可能调用的库的所有历程,而不是程序运行期间的那些例程。
现在介绍DLL的延迟过程链接版本(Lazy linkage) 其中每个例程仅在被调用后才链接。
第一次调用库例程时,程序调用虚入口并执行间接跳转。这个跳转指向一段代码,它将一个数字ID放入寄存器来识别所需的库例程,然后跳转到动态链接器/加载器。链接器/加载器找到所需的例程,重新映射它,并更改间接跳转位置中的地址以指向该例程。然后跳转到这个例程。例程完成后,它将返回到初始调用点。此后,它都会间接跳转到该例程而不需额外的中间过程。
总之,DLL需要额外的空间来存储动态链接所需的信息,但不要求复制或链接整个库。它们在第一次调用例程时会付出大量的开销,但此后只需一个间接跳转。请注意,从库返回不会产生额外的开销。
2.12.5 Starting Java Applications¶
就是java开发出来的目标是适用于任何计算机。所以java增加了一个JVM(Java virtual Machine) 的东西。编译器先将java编译成易于解释的指令:Java字节码 (Java Bytecode) 指令系统,编译速度比较快,然后通过JVM将Java字节码再进行转换。
优点是可移植性,缺点是速度慢
2.13 Arrays versus Pointers¶
Array indexing involves
-
Multiplying index by element size 计算偏移量
-
Adding to array base address 添加到基地址上
Pointers correspond directly to memory addresses
- Can avoid indexing complexity
2.14* Real Stuff¶
2.14.1 MIPS Instructions¶
MIPS: commercial predecessor to RISC-V
Similar basic set of instructions
-
32-bit instructions
-
32 general purpose registers, register 0 is always 0
-
32 floating-point registers
-
Memory accessed only by load/store instructions, consistent use of addressing modes for all data sizes
Different conditional branches for <, <=, >, >=
-
RISC-V: blt, bge, bltu, bgeu
-
MIPS: slt, sltu (set less than, result is 0 or 1), Then use beq, bne to complete the branch
2.14.2 The Intel x86 ISA¶
这是一种CISC的指令集类型,尽可能减少指令的数量,增加不同的指令。
x86寄存器
x86算术和逻辑指令中必须有一个操作数既作为源操作数,又作为目的操作数;RISC-V和MIPS允许源和目的操作数的寄存器分开。x86的这种限制对有限寄存器带来了更大的压力,因为必须修改一个源寄存器。第二个重要的区别在于其中一个操作数可以在存储器中。因此,与RISC-V和MIPS不同,实际上任何指令都可能有一个操作数在存储器中。
同时x86的指令解码也比较复杂,指令长度不一,从很短到很长,解码比较困难。
2.14.3 Other RISC-V Instructions¶
Base integer instructions (RV64I)
- Those previously described, plus
auipc rd, immed // rd = (imm<<12) + pc
, follow by jalr (adds 12-bit immed) for long jump,与lui很像slt, sltu, slti, sltui
: set less than (like MIPS)addw, subw, addiw
: 32-bit add/subsllw, srlw, srlw, slliw
, srliw, sraiw: 32-bit shift
32-bit variant: RV32I
- registers are 32-bits wide, 32-bit operations
Instruction Set Extensions (指令集的扩张)
-
M: integer multiply, divide, remainder
-
A: atomic memory operations
-
F: single-precision floating point
-
D: double-precision floating point
-
C: compressed instructions: 16-bit encoding for frequently used instructions
2.15 A C Sort Example To Put it All Together¶
2.15.1 Procedure swap¶
C code
void swap ( long long v[ ] , size_t k )
{
long long temp ;
temp = v[ k ] ;
v[ k ] = v[ k + 1 ] ;
v[ k + 1 ] = temp ;
}
Register allocation for swap
swap is a leaf procedure, nothing to preserve
RISC-V code for the procedure swap
swap: slli x6, x11, 3 // x6 = k * 8
add x6, x10, x6 // x6 = v + ( k * 8 )
ld x5, 0(x6) // x5 ← v[ k ]
ld x7, 8(x6) // x7 ← v[ k + 1 ]
sd x7, 0(x6) // v[k+1] → v[ k ]
sd x5, 8(x6) // v[k] → v[ k + 1 ]
jalr x0, 0(x1) // return to calling routine
2.15.2 Procedure sort¶
C code
void sort (long long v[ ] , size_t n )
{
size_t i , j ;
for ( i = 0 ; i < n ; i + = 1 )
{
for ( j = i - 1 ; j >= 0 && v[j] > v[j+1] ; j -= 1 )
swap ( v , j ) ;
}
}
Register allocation for sort
Passing parameters in sort
Preserving registers in sort
x1, x19, x20, x21, x22
The Outer Loop¶
li x19,0 // i = 0
for1tst:
bge x19,x11,exit1 // go to exit1 if x19 ≥ x11 (i≥n)
(body of outer for-loop)
addi x19,x19,1 // i += 1
j for1tst // branch to test of outer loop
exit1:
The Inner Loop¶
addi x20,x19,-1 // j = i −1
for2tst:
blt x20,x0,exit2 // go to exit2 if X20 < 0 (j < 0)
slli x5,x20,3 // reg x5 = j * 8
add x5,x10,x5 // reg x5 = v + (j * 8)
ld x6,0(x5) // reg x6 = v[j]
ld x7,8(x5) // reg x7 = v[j + 1]
ble x6,x7,exit2 // go to exit2 if x6 ≤ x7
mv x10, x21 // first swap parameter is v
mv x11, x20 // second swap parameter is j
jal x1,swap // call swap
addi x20,x20,-1 // j –= 1
j for2tst // branch to test of inner loop
exit2: