My notes:
RISC-V is a specification of an instruction set architecture (ISA) for 32-bit, 64-bit, and 128-bit microprocessors. RISC-V is an open ISA that allows everyone to build processors conforming to RISC-V without license fees.
Volume I: User-Level ISA contains general information about RISC-V, the base instruction sets for 32-bit, 64-bit, and 128-bit integer architectures, standard extensions of the base instruction sets, and conventions.
Volume II: Privileged Architecture covers information required for programming operating systems or bare metal embedded systems.
Assembly language is the human-readable and writable representation of machine code. It is a hardware-/processor-dependent language.
In general, a processor has a control unit, an arithmetic logic unit, registers, and signal/data lines (bus) for input and output, e.g., to access volatile memory. The control unit has the task of decoding instructions and controlling the program flow. The computer program is located in memory. A special register, the program counter, holds the address of the instruction to be carried out. An address is used for accessing a concrete storage unit, usually a byte or a multiple of bytes in size.
A typical RISC processor implements the classic five-stage RISC pipeline:
- instruction fetch (IF)
- instruction decode (ID)
- instruction execute (EX)
- memory access (MEM)
- write back (WB)
With pipelining, the stages of consecutive instructions overlap, so several instructions are processed in parallel, each in a different stage.
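The overlap can be sketched with a small Python model. This is only an idealized illustration, not part of the original notes: it assumes no stalls, hazards, or branch penalties, and the function names `pipeline_schedule` and `total_cycles` are made up here.

```python
# Ideal five-stage pipeline: no stalls, hazards, or branch penalties are modeled.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions):
    """For each instruction, map cycle number -> stage occupied in that cycle."""
    return [
        {issue_cycle + offset: stage for offset, stage in enumerate(STAGES)}
        for issue_cycle in range(num_instructions)
    ]


def total_cycles(num_instructions, pipelined=True):
    """Cycles needed until the last instruction has left the WB stage."""
    if num_instructions == 0:
        return 0
    stages = len(STAGES)
    if pipelined:
        # Fill the pipeline once, then one instruction completes per cycle.
        return stages + num_instructions - 1
    # Without pipelining, each instruction runs through all stages alone.
    return stages * num_instructions


if __name__ == "__main__":
    n = 3
    for i, row in enumerate(pipeline_schedule(n)):
        cells = [row.get(cycle, "--") for cycle in range(total_cycles(n))]
        print(f"instr {i}: " + " ".join(f"{cell:>3}" for cell in cells))
```

Under these assumptions, three instructions take 7 cycles instead of 15.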
An assembler or cross assembler for the target architecture translates the source code into an object file. The linker takes the object file and a linker script that specifies how the sections given in the object file should be put together in memory for execution. The result is an executable file.
Ripes is a simulator for illustrating machine code execution on RV32IMC and RV64IMC architectures.
QEMU is a machine emulator that can emulate a full system or a single program.
Install QEMU on Debian:
$ sudo apt install qemu-system-misc qemu-user-static binfmt-support opensbi u-boot-qemu
Install the cross-compiler toolchain:
$ sudo apt install gcc-riscv64-linux-gnu
Go to Debian Quick Image Baker pre-baked images and download the image for riscv64-virt.
Rename the downloaded file to riscv.qcow2.
Emulate:
$ qemu-system-riscv64 -machine virt -cpu rv64 -m 1G -device virtio-blk-device,drive=hd -drive file=riscv.qcow2,if=none,id=hd -device virtio-net-device,netdev=net -netdev user,id=net,hostfwd=tcp::2222-:22 -bios /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_jump.elf -kernel /usr/lib/u-boot/qemu-riscv64_smode/uboot.elf -object rng-random,filename=/dev/urandom,id=rng -device virtio-rng-device,rng=rng -nographic -append "root=LABEL=rootfs console=ttyS0"
This command is failing. I’m using Ubuntu instead: RISC-V cheat sheet
Install the debugger:
$ sudo apt install gdb-multiarch
Test the setup by creating the assembly file example.s with these contents:
.text
.globl _start
_start:
addi x10, x0, 7
addi x17, x0, 93
ecall
Assemble:
$ riscv64-linux-gnu-as -o example.o example.s
Link:
$ riscv64-linux-gnu-ld -o example example.o
Execute:
$ qemu-riscv64-static example
Check. In bash:
$ echo $?
In fish:
$ echo $status
You should get 7 as the result.
Disassemble the binary:
$ riscv64-linux-gnu-objdump --full-contents --disassemble example
Debug:
$ qemu-riscv64-static -g 1234 example &
$ gdb-multiarch example
(gdb) target remote :1234
(gdb) display /3i $pc
The command display /3i $pc shows the next three instructions, the command si (for step instruction) steps one instruction and continue continues the program being debugged. Type q to quit the debugger.
The RISC-V unprivileged ISA describes:
- RV32I (32-bit integer)
- RV32E (32-bit embedded)
- RV64I (64-bit integer)
- RV128I (128-bit integer)
The following extensions are common:
- M: integer multiplication and division
- A: atomic instructions
- F: single-precision floating point
- D: double-precision floating point
- Q: quad-precision floating point
- C: compressed instructions
- V: vector operations
The base ISAs specify 32 registers and the program counter. The registers are named x0 to x31. Extensions can have further registers. The application binary interface (ABI) contains a convention on how the registers should be used when a compiler translates a program in a higher-level language into machine language.
| Register | ABI Name | Description |
|---|---|---|
| x0 | zero | Zero constant |
| x1 | ra | Return address |
| x2 | sp | Stack pointer |
| x3 | gp | Global pointer |
| x4 | tp | Thread pointer |
| x5-x7 | t0-t2 | Temporaries |
| x8 | s0 / fp | Saved / Frame pointer |
| x9 | s1 | Saved register |
| x10-x11 | a0-a1 | Function args. / return values |
| x12-x17 | a2-a7 | Function arguments |
| x18-x27 | s2-s11 | Saved registers |
| x28-x31 | t3-t6 | Temporaries |
| pc | - | Program counter |
To modify data from memory, you have to load it to a register, perform operations with the data, and store it back to memory.
Encoding:
Instructions which use immediate values:
| instruction | name | format | opcode | funct3 | description |
|---|---|---|---|---|---|
| addi | ADD Immediate | I | 0010011 | 0x0 | rd = rs1 + imm |
| xori | XOR Immediate | I | 0010011 | 0x4 | rd = rs1 ^ imm |
| ori | OR Immediate | I | 0010011 | 0x6 | rd = rs1 \| imm |
| andi | AND Immediate | I | 0010011 | 0x7 | rd = rs1 & imm |
| slli | Shift Left Logical Imm. | I | 0010011 | 0x1 | imm[11:5]=0x00, rd = rs1 << imm[4:0] |
| srli | Shift Right Logical Imm. | I | 0010011 | 0x5 | imm[11:5]=0x00, rd = rs1 >> imm[4:0] (zero fill) |
| srai | Shift Right Arith. Imm. | I | 0010011 | 0x5 | imm[11:5]=0x20, rd = rs1 >> imm[4:0] (sign fill) |
| slti | Set Less Than Imm. | I | 0010011 | 0x2 | rd = (rs1 < imm)? 1:0 |
| sltiu | Set Less Than Imm. Un. | I | 0010011 | 0x3 | rd = (rs1 < imm)? 1:0, unsigned |
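To see how the columns of the table end up in a 32-bit instruction word, here is a minimal Python sketch of the I-format layout (imm[11:0], rs1, funct3, rd, opcode from high to low bits). The helper names `encode_i` and `addi` are invented for this illustration.

```python
OP_IMM = 0b0010011  # opcode shared by all instructions in the table above


def encode_i(opcode, funct3, rd, rs1, imm):
    """Pack an I-format instruction: imm[11:0] | rs1 | funct3 | rd | opcode."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-format immediates are 12-bit signed values")
    if not (0 <= rd < 32 and 0 <= rs1 < 32):
        raise ValueError("register numbers must be in 0..31")
    # The immediate is stored in two's complement in the top 12 bits.
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def addi(rd, rs1, imm):
    """addi uses funct3 = 0x0."""
    return encode_i(OP_IMM, 0x0, rd, rs1, imm)


if __name__ == "__main__":
    # The two addi instructions from the first example.s in these notes.
    print(f"addi x10, x0, 7  -> {addi(10, 0, 7):#010x}")
    print(f"addi x17, x0, 93 -> {addi(17, 0, 93):#010x}")
```

The printed words (0x00700513 and 0x05d00893) can be compared with the objdump output of the example binary.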
Arithmetic and logical operations that use two registers as source and one register as destination:
| instruction | name | format | opcode | funct3 | funct7 | description |
|---|---|---|---|---|---|---|
| add | ADD | R | 0110011 | 0x0 | 0x00 | rd = rs1 + rs2 |
| sub | SUB | R | 0110011 | 0x0 | 0x20 | rd = rs1 - rs2 |
| xor | XOR | R | 0110011 | 0x4 | 0x00 | rd = rs1 ^ rs2 |
| or | OR | R | 0110011 | 0x6 | 0x00 | rd = rs1 \| rs2 |
| and | AND | R | 0110011 | 0x7 | 0x00 | rd = rs1 & rs2 |
| sll | Shift Left Logical | R | 0110011 | 0x1 | 0x00 | rd = rs1 << rs2 |
| srl | Shift Right Logical | R | 0110011 | 0x5 | 0x00 | rd = rs1 >> rs2 (zero fill) |
| sra | Shift Right Arith. | R | 0110011 | 0x5 | 0x20 | rd = rs1 >> rs2 (sign fill) |
| slt | Set Less Than | R | 0110011 | 0x2 | 0x00 | rd = (rs1 < rs2)? 1:0 |
| sltu | Set Less Than Un. | R | 0110011 | 0x3 | 0x00 | rd = (rs1 < rs2)? 1:0, unsigned |
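The table writes both right shifts as `>>` and both comparisons as `<`, so the difference between the logical/arithmetic and signed/unsigned variants is easy to miss. The following Python sketch models them for 32-bit registers (RV32); it is an illustration with made-up helper names, not a reference implementation.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit register value as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def srl(rs1, rs2):
    """Logical right shift: zeros are shifted in from the left."""
    return (rs1 & MASK32) >> (rs2 & 31)


def sra(rs1, rs2):
    """Arithmetic right shift: the sign bit is replicated from the left."""
    return (to_signed32(rs1) >> (rs2 & 31)) & MASK32


def slt(rs1, rs2):
    """Signed comparison, result is 1 if rs1 < rs2, otherwise 0."""
    return 1 if to_signed32(rs1) < to_signed32(rs2) else 0


def sltu(rs1, rs2):
    """Unsigned comparison, result is 1 if rs1 < rs2, otherwise 0."""
    return 1 if (rs1 & MASK32) < (rs2 & MASK32) else 0


if __name__ == "__main__":
    print(f"srl 0x80000000 by 4 -> {srl(0x80000000, 4):#010x}")
    print(f"sra 0x80000000 by 4 -> {sra(0x80000000, 4):#010x}")
    print("slt  0xffffffff < 1 ->", slt(0xFFFFFFFF, 1))
    print("sltu 0xffffffff < 1 ->", sltu(0xFFFFFFFF, 1))
```

Only the lower 5 bits of rs2 are used as the shift amount on RV32, which the `& 31` reflects.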
Load instructions use the I-format and load data from memory into a register. Store instructions use the S-format and store a register value into memory. The signed loads (lb, lh) sign-extend the loaded value to the register width; the unsigned variants (lbu, lhu) zero-extend it.
| instruction | name | format | opcode | funct3 | description |
|---|---|---|---|---|---|
| lb | Load Byte | I | 0000011 | 0x0 | rd = M[rs1+imm][7:0] |
| lh | Load Half | I | 0000011 | 0x1 | rd = M[rs1+imm][15:0] |
| lw | Load Word | I | 0000011 | 0x2 | rd = M[rs1+imm][31:0] |
| lbu | Load Byte Un. | I | 0000011 | 0x4 | rd = M[rs1+imm][7:0] |
| lhu | Load Half Un. | I | 0000011 | 0x5 | rd = M[rs1+imm][15:0] |
| sb | Store Byte | S | 0100011 | 0x0 | M[rs1+imm][7:0] = rs2[7:0] |
| sh | Store Half | S | 0100011 | 0x1 | M[rs1+imm][15:0] = rs2[15:0] |
| sw | Store Word | S | 0100011 | 0x2 | M[rs1+imm][31:0] = rs2[31:0] |
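The effect of the signed and unsigned load variants can be tried out with a toy memory model in Python. This sketch assumes a 32-bit register width and the usual little-endian byte order; `load` and `store` are names chosen for this illustration.

```python
MASK32 = 0xFFFFFFFF


def load(memory, addr, size, signed):
    """Model lb/lh/lw (signed=True) or lbu/lhu (signed=False) into a 32-bit register."""
    raw = bytes(memory[addr:addr + size])
    # Sign or zero extension happens while converting to an integer.
    return int.from_bytes(raw, "little", signed=signed) & MASK32


def store(memory, addr, size, value):
    """Model sb/sh/sw: write only the low `size` bytes of the register value."""
    low_bits = value & ((1 << (8 * size)) - 1)
    memory[addr:addr + size] = low_bits.to_bytes(size, "little")


if __name__ == "__main__":
    memory = bytearray(8)
    store(memory, 0, 1, 0x1234FF80)  # sb keeps only 0x80
    print(f"lb  -> {load(memory, 0, 1, signed=True):#010x}")
    print(f"lbu -> {load(memory, 0, 1, signed=False):#010x}")
```

The same stored byte 0x80 comes back as 0xffffff80 with lb and as 0x00000080 with lbu.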
The instructions of the U format make it easier to set an address in a register.
| instruction | name | format | opcode | funct3 | description |
|---|---|---|---|---|---|
| lui | Load Upper Imm. | U | 0110111 | - | rd = imm << 12 |
| auipc | Add Upper Imm. to PC | U | 0010111 | - | rd = PC + (imm << 12) |
Control flow instructions using the B-format allow for conditional branching.
| instruction | name | format | opcode | funct3 | description |
|---|---|---|---|---|---|
| beq | Branch == | B | 1100011 | 0x0 | if (rs1 == rs2) pc+=imm |
| bne | Branch != | B | 1100011 | 0x1 | if (rs1 != rs2) pc+=imm |
| blt | Branch < | B | 1100011 | 0x4 | if (rs1 < rs2) pc+=imm |
| bge | Branch >= | B | 1100011 | 0x5 | if (rs1 >= rs2) pc+=imm |
| bltu | Branch < Un. | B | 1100011 | 0x6 | if (rs1 < rs2) pc+=imm, unsigned |
| bgeu | Branch >= Un. | B | 1100011 | 0x7 | if (rs1 >= rs2) pc+=imm, unsigned |
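The branch offset is not stored as one contiguous field: the B-format scatters the bits imm[12], imm[10:5], imm[4:1], and imm[11] across the instruction word, and bit 0 is always zero. A Python sketch of that packing, with the invented helper name `encode_b`:

```python
OP_BRANCH = 0b1100011


def encode_b(funct3, rs1, rs2, offset):
    """Pack a B-format branch; offset is the signed byte distance added to pc."""
    if offset % 2 or not -4096 <= offset <= 4094:
        raise ValueError("branch offsets are even and fit in 13 signed bits")
    imm = offset & 0x1FFF  # 13-bit two's complement, bit 0 is implicit
    return (
        ((imm >> 12) & 0x1) << 31     # imm[12]
        | ((imm >> 5) & 0x3F) << 25   # imm[10:5]
        | rs2 << 20
        | rs1 << 15
        | funct3 << 12
        | ((imm >> 1) & 0xF) << 8     # imm[4:1]
        | ((imm >> 11) & 0x1) << 7    # imm[11]
        | OP_BRANCH
    )


if __name__ == "__main__":
    print(f"beq x10, x11, +8 -> {encode_b(0x0, 10, 11, 8):#010x}")
    print(f"beq x0, x0, -4   -> {encode_b(0x0, 0, 0, -4):#010x}")
```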
Unconditional jumps modify the program counter, writing the possible return address into a register.
| instruction | name | format | opcode | funct3 | description |
|---|---|---|---|---|---|
| jal | Jump and Link | J | 1101111 | - | rd = PC+4; PC += imm |
| jalr | Jump and Link Register | I | 1100111 | 0x0 | rd = PC+4; PC = rs1 + imm |
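As with branches, the jal offset is stored in shuffled order: imm[20], imm[10:1], imm[11], imm[19:12], followed by rd and the opcode. A small Python sketch of the J-format packing; `encode_j` is a name made up for this illustration.

```python
OP_JAL = 0b1101111


def encode_j(rd, offset):
    """Pack a jal instruction; offset is the signed byte distance added to pc."""
    if offset % 2 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("jal offsets are even and fit in 21 signed bits")
    imm = offset & 0x1FFFFF  # 21-bit two's complement, bit 0 is implicit
    return (
        ((imm >> 20) & 0x1) << 31      # imm[20]
        | ((imm >> 1) & 0x3FF) << 21   # imm[10:1]
        | ((imm >> 11) & 0x1) << 20    # imm[11]
        | ((imm >> 12) & 0xFF) << 12   # imm[19:12]
        | rd << 7
        | OP_JAL
    )


if __name__ == "__main__":
    print(f"jal x1, +8 -> {encode_j(1, 8):#010x}")
    print(f"jal x0, -4 -> {encode_j(0, -4):#010x}")
```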
The instruction ecall requests a system call. The ebreak instruction is used for debugging programs.
| instruction | name | format | opcode | funct3 | description |
|---|---|---|---|---|---|
| ecall | Environment Call | I | 1110011 | 0x0 | imm = 0, rd = rs1 = 0, transfer control to system |
| ebreak | Environment Break | I | 1110011 | 0x0 | imm = 1, rd = rs1 = 0, transfer control to debugger |
The fence instruction orders memory accesses between multiple hardware threads that share the same memory. It divides a program into two parts: the memory accesses made before the fence by the executing hardware thread are observed by other hardware threads before any memory access it makes after the fence.
| instruction | name | format | opcode | funct3 | description |
|---|---|---|---|---|---|
| fence | Fence | I | 0001111 | 0x0 | rd, rs1 reserved. Normal fence for all memory access types has imm = 0b000011111111. |
In RV64, values and registers are 64 bits wide, and w-suffixed instruction variants operate on 32 bits.
The M extension adds multiplication and division. The bit ranges in the table apply to RV32.
| instruction | name | format | opcode | funct3 | funct7 | description |
|---|---|---|---|---|---|---|
| mul | Multiply | R | 0110011 | 0x0 | 0x01 | rd = (rs1 * rs2)[31:0] |
| mulh | Multiply High | R | 0110011 | 0x1 | 0x01 | rd = (rs1 * rs2)[63:32], both signed |
| mulhsu | Multiply High Sign/Uns. | R | 0110011 | 0x2 | 0x01 | rd = (rs1 * rs2)[63:32], rs1 signed, rs2 unsigned |
| mulhu | Multiply High Unsigned | R | 0110011 | 0x3 | 0x01 | rd = (rs1 * rs2)[63:32], both unsigned |
| div | Divide | R | 0110011 | 0x4 | 0x01 | rd = rs1 / rs2 |
| divu | Divide Unsigned | R | 0110011 | 0x5 | 0x01 | rd = rs1 / rs2, unsigned |
| rem | Remainder | R | 0110011 | 0x6 | 0x01 | rd = rs1 % rs2 |
| remu | Remainder Unsigned | R | 0110011 | 0x7 | 0x01 | rd = rs1 % rs2, unsigned |
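The one-line descriptions hide a few details: the high-half multiplies differ only in how the operands are interpreted, signed division rounds toward zero, and division by zero does not trap but yields fixed results (all bits set for the quotient, the dividend for the remainder). A Python sketch of these semantics for RV32, with illustrative function names:

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit register value as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul(rs1, rs2):
    """Lower 32 bits of the product (the same for signed and unsigned operands)."""
    return (to_signed32(rs1) * to_signed32(rs2)) & MASK32


def mulh(rs1, rs2):
    """Upper 32 bits of the signed x signed 64-bit product."""
    return ((to_signed32(rs1) * to_signed32(rs2)) >> 32) & MASK32


def mulhu(rs1, rs2):
    """Upper 32 bits of the unsigned x unsigned 64-bit product."""
    return ((rs1 & MASK32) * (rs2 & MASK32)) >> 32


def div(rs1, rs2):
    """Signed division, rounding toward zero (Python's // would round down)."""
    a, b = to_signed32(rs1), to_signed32(rs2)
    if b == 0:
        return MASK32  # division by zero: quotient has all bits set (-1)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    return quotient & MASK32  # also wraps the overflow case -2**31 / -1


def rem(rs1, rs2):
    """Signed remainder; the sign of the result follows the dividend."""
    a, b = to_signed32(rs1), to_signed32(rs2)
    if b == 0:
        return a & MASK32  # division by zero: remainder is the dividend
    remainder = abs(a) % abs(b)
    return (-remainder if a < 0 else remainder) & MASK32


if __name__ == "__main__":
    minus7 = -7 & MASK32
    print("div -7, 2 ->", to_signed32(div(minus7, 2)))
    print("rem -7, 2 ->", to_signed32(rem(minus7, 2)))
    print(f"mulhu 0xffffffff, 0xffffffff -> {mulhu(MASK32, MASK32):#010x}")
```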
With RV64, values and registers are 64 bits wide. There are extra instruction variants for 32 bits; their names end with a 'w' for word length (for example addw and subw). They operate on the lower 32 bits of the 64-bit registers and sign-extend the 32-bit result to 64 bits.
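A short Python sketch of the difference between a full-width add and the word variant addw on RV64. It relies on the fact that the 32-bit result of a w-instruction is sign-extended into the upper half of the destination register; the function names are chosen for this illustration only.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sext32_to_64(value):
    """Keep the low 32 bits and copy bit 31 into bits 63..32."""
    value &= 0xFFFFFFFF
    return value | 0xFFFFFFFF00000000 if value & 0x80000000 else value


def add(rs1, rs2):
    """64-bit add as on RV64: wraps around at 64 bits."""
    return (rs1 + rs2) & MASK64


def addw(rs1, rs2):
    """32-bit add on RV64: add the low words, then sign-extend the result."""
    return sext32_to_64(rs1 + rs2)


if __name__ == "__main__":
    print(f"add  0x7fffffff + 1 -> {add(0x7FFFFFFF, 1):#018x}")
    print(f"addw 0x7fffffff + 1 -> {addw(0x7FFFFFFF, 1):#018x}")
```

The 32-bit overflow in the second case shows up as a negative 64-bit value (upper half all ones).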
RISC-V assembler
A RISC-V assembler provides pseudo instructions and directives, which make programming easier.
Labels
A label is a name followed by a colon. Labels mark places in the program code. They are used instead of offsets or addresses. The assembler translates a label into the corresponding address.
Numeric labels are local labels that can be referenced with the suffixes 'f' for forward and 'b' for backward; for example, 1b refers to the nearest preceding label 1: and 1f to the nearest following one.
Addressing a symbol might require more than one instruction. The instructions lui and auipc load a 20-bit upper immediate into a register and thus cover the upper part of an address. The lower 12 bits can then be added with an addi instruction on that register. The relocation functions %hi(symbol) and %lo(symbol) split the address of a label into its upper and lower part. The linker relocates the program and assigns addresses to symbols. The functions %pcrel_hi(symbol) and %pcrel_lo(label) work together with the auipc and addi instructions for PC-relative addressing.
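The split into upper and lower part has one subtlety: addi sign-extends its 12-bit immediate, so when bit 11 of the address is set, the upper part has to be one larger to compensate. The Python sketch below models this for 32-bit absolute addresses in the way a lui/addi pair needs it; `split_hi_lo` and `rebuild` are hypothetical helper names, not assembler functions.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def split_hi_lo(address):
    """Split a 32-bit address into a 20-bit upper part and a signed 12-bit lower part."""
    lo = sign_extend(address, 12)          # what addi will add (may be negative)
    hi = ((address - lo) >> 12) & 0xFFFFF  # what lui has to load to compensate
    return hi, lo


def rebuild(hi, lo):
    """What 'lui rd, hi' followed by 'addi rd, rd, lo' leaves in a 32-bit register."""
    return ((hi << 12) + lo) & 0xFFFFFFFF


if __name__ == "__main__":
    for address in (0x12345678, 0x12345FFF):
        hi, lo = split_hi_lo(address)
        print(f"{address:#010x}: hi={hi:#07x} lo={lo} rebuilt={rebuild(hi, lo):#010x}")
```

For 0x12345fff the upper part becomes 0x12346 and the lower part -1.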
Directives
Directives provide the assembler with information on how the text following a directive should be treated. Directives begin with a dot.
| Directive | Arguments | Description |
|---|---|---|
| .text | | change to .text section |
| .data | | change to .data section |
| .rodata | | change to .rodata section |
| .bss | | change to .bss section |
| .section | .text, .data, .rodata, .bss | change to section given by arguments |
| .equ | name, value | define name for value |
| .ascii | "string" | emit string without null terminator |
| .asciz | "string" | emit string with null terminator |
| .string | "string" | same as .asciz |
| .byte | expression [,expression]* | 8-bit comma-separated values |
| .half | expression [,expression]* | 16-bit comma-separated values |
| .word | expression [,expression]* | 32-bit comma-separated values |
| .dword | expression [,expression]* | 64-bit comma-separated values |
| .zero | integer | emit the given number of zero bytes |
| .align | integer | align to 2 to the power of the argument |
| .globl | symbol_name | make symbol_name visible in the symbol table |
Example
# define exit as 93
.equ exit, 93
# program code
.section .text
# export _start for linker
.globl _start
_start:
li a7, exit
ecall
# data: init one word (32-bit value) with 1, read/write
.section .data
counter:
.word 1
# rodata: constant text string
.section .rodata
text_begin:
.asciz "Text"
text_end:
# current address minus address of text_begin = length of text
.byte .-text_begin
# non initialized block with same size as the text
.section .bss
# start next part by address aligned to multiple of 2^2 = 4
.align 2
copy_begin:
.zero text_end-text_begin
Save it as example.s, then assemble, link, and inspect the result:
$ riscv64-linux-gnu-as -o example.o example.s
$ riscv64-linux-gnu-ld -o example example.o
$ riscv64-linux-gnu-objdump -f -d -Mno-aliases,numeric example
$ riscv64-linux-gnu-objdump -F -s example
Pseudo instructions
Pseudo instructions are part of the assembly language. The assembler translates each of them into one or more base instructions.
| Pseudo instruction | Base instruction(s) | Description |
|---|---|---|
| la rd, symbol | auipc rd, symbol[31:12]; addi rd, rd, symbol[11:0] | Load address (non position independent code, non-PIC) |
| la rd, symbol | auipc rd, symbol@GOT[31:12]; l{w\|d} rd, symbol@GOT[11:0](rd) | Load address (position independent code, PIC) |
| lla rd, symbol | auipc rd, symbol[31:12]; addi rd, rd, symbol[11:0] | Load local address |
| lga rd, symbol | auipc rd, symbol@GOT[31:12]; l{w\|d} rd, symbol@GOT[11:0](rd) | Load global address |
| l{b\|h\|w\|d} rd, symbol | auipc rd, symbol[31:12]; l{b\|h\|w\|d} rd, symbol[11:0](rd) | Load global |
| s{b\|h\|w\|d} rs, symbol, rd | auipc rd, symbol[31:12]; s{b\|h\|w\|d} rs, symbol[11:0](rd) | Store global (rd is used as a temporary) |
| nop | addi x0, x0, 0 | No operation |
| li rd, imm | Different instructions | Load immediate |
| mv rd, rs | addi rd, rs, 0 | Copy register |
| not rd, rs | xori rd, rs, -1 | 1's complement |
| neg rd, rs | sub rd, x0, rs | 2's complement |
| negw rd, rs | subw rd, x0, rs | 2's complement word |
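The GOT (global offset table) is stored in the executable. It holds the addresses of global symbols, which are filled in when the program is loaded. This allows the operating system to load libraries at program startup to different memory areas.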
Pseudo instructions for extending and conditional bit setting:
| Pseudo instruction | Base instruction(s) | Description |
|---|---|---|
| sext.{b\|h\|w} rd, rs | different instructions | sign extend |
| zext.{b\|h\|w} rd, rs | different instructions | zero extend |
| seqz rd, rs | sltiu rd, rs, 1 | rd = (rs == 0)? 1:0 |
| snez rd, rs | sltu rd, x0, rs | rd = (rs != 0)? 1:0 |
| sltz rd, rs | slt rd, rs, x0 | rd = (rs < 0)? 1:0 |
| sgtz rd, rs | slt rd, x0, rs | rd = (rs > 0)? 1:0 |
Pseudo instructions for conditional branching:
| Pseudo instruction | Base instruction(s) | Description |
|---|---|---|
| beqz rs, imm | beq rs, x0, imm | if (rs == 0) PC+=imm |
| bnez rs, imm | bne rs, x0, imm | if (rs != 0) PC+=imm |
| blez rs, imm | bge x0, rs, imm | if (rs <= 0) PC+=imm |
| bgez rs, imm | bge rs, x0, imm | if (rs >= 0) PC+=imm |
| bltz rs, imm | blt rs, x0, imm | if (rs < 0) PC+=imm |
| bgtz rs, imm | blt x0, rs, imm | if (rs > 0) PC+=imm |
| bgt rs, rt, imm | blt rt, rs, imm | if (rs > rt) PC+=imm |
| ble rs, rt, imm | bge rt, rs, imm | if (rs <= rt) PC+=imm |
| bgtu rs, rt, imm | bltu rt, rs, imm | if (rs > rt) PC+=imm, unsigned |
| bleu rs, rt, imm | bgeu rt, rs, imm | if (rs <= rt) PC+=imm, unsigned |
Pseudo instructions for unconditional jumping:
| Pseudo instruction | Base instruction(s) | Description |
|---|---|---|
| j imm | jal x0, imm | PC += imm |
| jal imm | jal x1, imm | x1 = PC+4; PC += imm |
| jr rs | jalr x0, rs, 0 | PC = rs |
| jalr rs | jalr x1, rs, 0 | x1 = PC+4; PC = rs |
| ret | jalr x0, x1, 0 | PC = x1 |
| call imm | auipc x6, imm[31:12]; jalr x1, x6, imm[11:0] | x1 = PC+4; PC = imm |
| tail imm | auipc x6, imm[31:12]; jalr x0, x6, imm[11:0] | PC = imm |
Application Binary Interface
The ABI is a convention on how registers should be used in a general context. One part of the convention is who is responsible for preserving register values when a jump instruction calls a function. On such a call, the address of the instruction that follows the jump instruction in memory (the return address) is stored in a register, and the program counter is set to the beginning of the function. The convention also defines which registers must be assumed changed and which can be assumed unchanged when a function returns.
| Register | ABI alias | Description | Saver |
|---|---|---|---|
| x0 | zero | zero constant | - |
| x1 | ra | return address | caller |
| x2 | sp | stack pointer | callee / function |
| x3 | gp | global pointer | - / should not be used from user |
| x4 | tp | thread pointer | - / should not be used from user |
| x5-x7 | t0-t2 | temporaries | caller |
| x8 | s0 / fp | saved / Frame pointer | callee / function |
| x9 | s1 | saved register | callee / function |
| x10-x11 | a0-a1 | function args. / return values | caller |
| x12-x17 | a2-a7 | function arguments | caller |
| x18-x27 | s2-s11 | saved registers | callee / function |
| x28-x31 | t3-t6 | temporaries | caller |
| pc | - | program counter | - |
The global pointer and the thread pointer should not be used except by the operating system.
Stack
The stack is a memory area that is accessed via the stack pointer (register sp), which is already initialized when a user application starts. The stack grows from high addresses toward low addresses in memory.
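The downward growth can be illustrated with a toy Python model. It is a sketch under simplifying assumptions: 8-byte slots as for 64-bit registers, a tiny stack area, and a made-up `Stack` class; alignment requirements of the real calling convention are not modeled.

```python
class Stack:
    """Toy model of a descending stack addressed through a stack pointer."""

    SLOT = 8  # bytes per saved register (64-bit registers assumed)

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.sp = size  # sp starts at the high end of the stack area

    def push(self, value):
        if self.sp < self.SLOT:
            raise OverflowError("stack area exhausted")
        self.sp -= self.SLOT  # comparable to: addi sp, sp, -8
        raw = (value & 0xFFFFFFFFFFFFFFFF).to_bytes(self.SLOT, "little")
        self.memory[self.sp:self.sp + self.SLOT] = raw  # comparable to: sd reg, 0(sp)

    def pop(self):
        raw = bytes(self.memory[self.sp:self.sp + self.SLOT])  # comparable to: ld reg, 0(sp)
        self.sp += self.SLOT  # comparable to: addi sp, sp, 8
        return int.from_bytes(raw, "little")


if __name__ == "__main__":
    stack = Stack(32)
    stack.push(0x1111)
    stack.push(0x2222)
    print("sp after two pushes:", stack.sp)
    print(f"popped: {stack.pop():#x}, then {stack.pop():#x}")
```

Each push lowers sp before storing, and each pop raises it again after loading, so the most recently saved value sits at the lowest address.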
