Foundations of RISC-V Assembly Programming

My notes:

RISC-V is a specification of an instruction set architecture (ISA) for 32-bit, 64-bit, and 128-bit microprocessors. RISC-V is an open ISA that allows everyone to build processors conforming to RISC-V without license fees.

Volume I: User-Level ISA contains general information about RISC-V, base instruction sets for 32-bit, 64-bit, and 128-bit integer architectures, standard extensions of the base instruction sets, and conventions.

Volume II: Privileged Architecture covers information required for programming operating systems or bare metal embedded systems.

Assembly language is the human-readable and writable representation of machine code. It is a hardware-/processor-dependent language.

In general, a processor has a control unit, an arithmetic logic unit, registers, and signal/data lines (buses) for input and output, e.g., to access volatile memory. The control unit decodes instructions and controls the program flow. The computer program is located in memory. A special register, the program counter, holds the address of the instruction to execute. An address identifies a concrete storage unit, usually one byte or a multiple of bytes in size.

A typical RISC processor performs the classic five-stage RISC pipeline:

  • instruction fetch (IF)
  • instruction decode (ID)
  • instruction execute (EX)
  • memory access (MEM)
  • write back (WB)

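With pipelining, the stages of consecutive instructions overlap: while one instruction is being executed, the next one is already being decoded and the one after that is being fetched.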
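
The overlap can be sketched with a small Python model. This is an idealized illustration only: it assumes one instruction enters the pipeline per clock cycle and ignores stalls and hazards. The names STAGES and pipeline_schedule are made up for this sketch.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions):
    """Return one dict per clock cycle mapping stage name -> instruction index.

    Idealized model: one instruction enters per cycle, no stalls or hazards.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        active = {}
        for stage_index, stage in enumerate(STAGES):
            # Instruction i is in stage s during cycle i + s.
            instruction = cycle - stage_index
            if 0 <= instruction < num_instructions:
                active[stage] = instruction
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    for cycle, active in enumerate(pipeline_schedule(3)):
        print(cycle, active)
```

In this model, three instructions finish after 3 + 5 - 1 = 7 cycles instead of the 15 cycles that strictly sequential execution of all stages would need.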

An assembler or cross assembler for the target architecture translates the source code into an object file. The linker takes the object file and a linker script that specifies how the sections given in the object file should be put together in memory for execution. The result is an executable file.

Ripes is a simulator for illustrating machine code execution on RV32IMC and RV64IMC architectures.

QEMU is a machine emulator that can emulate either a full system or a single program.

Install QEMU on Debian:

$ sudo apt install qemu-system-misc qemu-user-static binfmt-support opensbi u-boot-qemu

Install the cross-compiler toolchain:

$ sudo apt install gcc-riscv64-linux-gnu

Go to Debian Quick Image Baker pre-baked images and download the image for riscv64-virt.

Rename the downloaded file to riscv.qcow2.

Emulate:

$ qemu-system-riscv64 -machine virt -cpu rv64 -m 1G -device virtio-blk-device,drive=hd -drive file=riscv.qcow2,if=none,id=hd -device virtio-net-device,netdev=net -netdev user,id=net,hostfwd=tcp::2222-:22 -bios /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_jump.elf -kernel /usr/lib/u-boot/qemu-riscv64_smode/uboot.elf -object rng-random,filename=/dev/urandom,id=rng -device virtio-rng-device,rng=rng -nographic -append "root=LABEL=rootfs console=ttyS0"

This command is failing. I’m using Ubuntu instead: RISC-V cheat sheet

Install the debugger:

$ sudo apt install gdb-multiarch

Test the toolchain by creating the assembly file example.s with these contents:

.text 
.globl _start
_start:
      addi x10, x0,  7
      addi x17, x0, 93
      ecall

Assemble:

$ riscv64-linux-gnu-as -o example.o example.s 

Link:

$ riscv64-linux-gnu-ld -o example example.o

Execute:

$ qemu-riscv64-static example

Check. In bash:

$ echo $?

In fish:

$ echo $status

You should get 7 as the result, the exit code that the program placed in x10 (a0).

Disassemble the binary:

$ riscv64-linux-gnu-objdump --full-contents --disassemble example

Debug:

$ qemu-riscv64-static -g 1234 example &
$ gdb-multiarch example
(gdb) target remote :1234
(gdb) display /3i $pc

The command display /3i $pc shows the next three instructions each time the program stops, si (step instruction) executes one instruction, and continue resumes the program being debugged. Type q to quit the debugger.

The RISC-V unprivileged ISA describes:

  • RV32I (32-bit integer)
  • RV32E (32-bit embedded)
  • RV64I (64-bit integer)
  • RV128I (128-bit integer)

The following extensions are common:

  • M: integer multiplication and division
  • A: atomic instructions
  • F: single-precision floating point
  • D: double-precision floating point
  • Q: quad-precision floating point
  • C: compressed instructions
  • V: vector operations

The base ISAs specify 32 registers and the program counter. The registers are named x0 to x31. Extensions can have further registers. The application binary interface (ABI) contains a convention on how the registers should be used when a compiler translates a program in a higher-level language into machine language.

Register  ABI Name  Description
x0        zero      Zero constant
x1        ra        Return address
x2        sp        Stack pointer
x3        gp        Global pointer
x4        tp        Thread pointer
x5-x7     t0-t2     Temporaries
x8        s0 / fp   Saved / Frame pointer
x9        s1        Saved register
x10-x11   a0-a1     Function args. / return values
x12-x17   a2-a7     Function arguments
x18-x27   s2-s11    Saved registers
x28-x31   t3-t6     Temporaries
pc        -         Program counter

To modify data from memory, you have to load it to a register, perform operations with the data, and store it back to memory.

Encoding:

Instructions which use immediate values:

instruction  name                      format  opcode   funct3  description
addi         ADD Immediate             I       0010011  0x0     rd = rs1 + imm
xori         XOR Immediate             I       0010011  0x4     rd = rs1 ^ imm
ori          OR Immediate              I       0010011  0x6     rd = rs1 | imm
andi         AND Immediate             I       0010011  0x7     rd = rs1 & imm
slli         Shift Left Logical Imm.   I       0010011  0x1     imm[11:5]=0x00, rd = rs1 << imm[4:0]
srli         Shift Right Logical Imm.  I       0010011  0x5     imm[11:5]=0x00, rd = rs1 >> imm[4:0] (zero-filling)
srai         Shift Right Arith. Imm.   I       0010011  0x5     imm[11:5]=0x20, rd = rs1 >> imm[4:0] (sign-extending)
slti         Set Less Than Imm.        I       0010011  0x2     rd = (rs1 < imm)? 1:0
sltiu        Set Less Than Imm. Un.    I       0010011  0x3     rd = (rs1 < imm)? 1:0, unsigned
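
As a worked example of the encoding, the following Python sketch packs the fields of an I-type instruction into a 32-bit word. It assumes the I-type field layout from the ISA specification (imm[11:0], rs1, funct3, rd, opcode, from the most to the least significant bit). The function names encode_i_type and addi are made up for this sketch.

```python
OP_IMM = 0b0010011  # opcode shared by the immediate ALU instructions


def encode_i_type(opcode, funct3, rd, rs1, imm):
    """Pack an I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-type immediate must fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def addi(rd, rs1, imm):
    """Encode addi rd, rs1, imm (funct3 = 0x0)."""
    return encode_i_type(OP_IMM, 0x0, rd, rs1, imm)


if __name__ == "__main__":
    print(f"{addi(10, 0, 7):08x}")   # addi x10, x0, 7  -> 00700513
    print(f"{addi(17, 0, 93):08x}")  # addi x17, x0, 93 -> 05d00893
```

The two printed words should match the disassembly of the example program assembled earlier.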

Arithmetic and logical operations that use two registers as source and one register as destination:

instruction  name                 format  opcode   funct3  funct7  description
add          ADD                  R       0110011  0x0     0x00    rd = rs1 + rs2
sub          SUB                  R       0110011  0x0     0x20    rd = rs1 - rs2
xor          XOR                  R       0110011  0x4     0x00    rd = rs1 ^ rs2
or           OR                   R       0110011  0x6     0x00    rd = rs1 | rs2
and          AND                  R       0110011  0x7     0x00    rd = rs1 & rs2
sll          Shift Left Logical   R       0110011  0x1     0x00    rd = rs1 << rs2
srl          Shift Right Logical  R       0110011  0x5     0x00    rd = rs1 >> rs2 (zero-filling)
sra          Shift Right Arith.   R       0110011  0x5     0x20    rd = rs1 >> rs2 (sign-extending)
slt          Set Less Than        R       0110011  0x2     0x00    rd = (rs1 < rs2)? 1:0
sltu         Set Less Than Un.    R       0110011  0x3     0x00    rd = (rs1 < rs2)? 1:0, unsigned
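
The difference between the logical and arithmetic right shift, and between the signed and unsigned comparison, can be illustrated with a small Python model. This is a sketch that assumes 32-bit registers (RV32) and that the shift amount is taken from the lower 5 bits of rs2. The function names mirror the instructions but are otherwise made up.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def to_signed(value):
    """Interpret a 32-bit register value as a two's complement number."""
    value &= MASK
    return value - (1 << XLEN) if value >> (XLEN - 1) else value


def srl(rs1, rs2):
    """Logical right shift: vacated upper bits are filled with zeros."""
    return (rs1 & MASK) >> (rs2 & (XLEN - 1))


def sra(rs1, rs2):
    """Arithmetic right shift: vacated upper bits copy the sign bit."""
    return (to_signed(rs1) >> (rs2 & (XLEN - 1))) & MASK


def slt(rs1, rs2):
    """Signed comparison."""
    return 1 if to_signed(rs1) < to_signed(rs2) else 0


def sltu(rs1, rs2):
    """Unsigned comparison."""
    return 1 if (rs1 & MASK) < (rs2 & MASK) else 0


if __name__ == "__main__":
    print(hex(srl(0x80000000, 4)))  # 0x8000000
    print(hex(sra(0x80000000, 4)))  # 0xf8000000
    print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))  # 1 0
```

The bit pattern 0xFFFFFFFF is -1 when read as signed, so slt reports it as less than 1, while sltu treats it as the largest unsigned value.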

Load instructions follow the I-format and load data from memory into a register. Store instructions follow the S-format and store a register value into memory.

instruction  name           format  opcode   funct3  description
lb           Load Byte      I       0000011  0x0     rd = M[rs1+imm][7:0]
lh           Load Half      I       0000011  0x1     rd = M[rs1+imm][15:0]
lw           Load Word      I       0000011  0x2     rd = M[rs1+imm][31:0]
lbu          Load Byte Un.  I       0000011  0x4     rd = M[rs1+imm][7:0], zero-extended
lhu          Load Half Un.  I       0000011  0x5     rd = M[rs1+imm][15:0], zero-extended
sb           Store Byte     S       0100011  0x0     M[rs1+imm][7:0] = rs2[7:0]
sh           Store Half     S       0100011  0x1     M[rs1+imm][15:0] = rs2[15:0]
sw           Store Word     S       0100011  0x2     M[rs1+imm][31:0] = rs2[31:0]
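
The signed loads (lb, lh) sign-extend the loaded value to the register width, while the unsigned variants (lbu, lhu) zero-extend it. A minimal Python sketch of this behavior, assuming 32-bit registers and little-endian memory; the helper name load and its parameters are made up for illustration:

```python
def load(memory, address, size, signed):
    """Model lb/lh/lw (signed=True) and lbu/lhu (signed=False) on RV32.

    memory is a bytes object, size is the access width in bytes.
    """
    raw = int.from_bytes(memory[address:address + size], "little")
    if signed and raw >> (8 * size - 1):
        raw -= 1 << (8 * size)      # sign bit set: value is negative
    return raw & 0xFFFFFFFF         # value as it appears in a 32-bit register


if __name__ == "__main__":
    memory = bytes([0x80, 0xFF, 0x34, 0x12])
    print(hex(load(memory, 0, 1, signed=True)))   # lb  -> 0xffffff80
    print(hex(load(memory, 0, 1, signed=False)))  # lbu -> 0x80
    print(hex(load(memory, 0, 2, signed=False)))  # lhu -> 0xff80
    print(hex(load(memory, 0, 4, signed=True)))   # lw  -> 0x1234ff80
```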

The instructions of the U format make it easier to set an address in a register.

instruction  name                  format  opcode   funct3  description
lui          Load Upper Imm.       U       0110111  -       rd = imm << 12
auipc        Add Upper Imm. to PC  U       0010111  -       rd = PC + (imm << 12)

Control flow instructions using the B-format allow for conditional branching.

instruction  name           format  opcode   funct3  description
beq          Branch ==      B       1100011  0x0     if (rs1 == rs2) pc+=imm
bne          Branch !=      B       1100011  0x1     if (rs1 != rs2) pc+=imm
blt          Branch <       B       1100011  0x4     if (rs1 < rs2) pc+=imm
bge          Branch >=      B       1100011  0x5     if (rs1 >= rs2) pc+=imm
bltu         Branch < Un.   B       1100011  0x6     if (rs1 < rs2) pc+=imm
bgeu         Branch >= Un.  B       1100011  0x7     if (rs1 >= rs2) pc+=imm
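
In the B-format, the branch offset is stored in scrambled order inside the instruction word. The following Python sketch encodes a branch, assuming the B-type layout from the ISA specification (imm[12], imm[10:5], rs2, rs1, funct3, imm[4:1], imm[11], opcode). The function name encode_b_type is made up for this sketch.

```python
BRANCH = 0b1100011  # opcode of all conditional branches


def encode_b_type(funct3, rs1, rs2, offset):
    """Pack a B-type instruction; offset is the byte distance added to pc."""
    if offset % 2 or not -4096 <= offset <= 4094:
        raise ValueError("branch offset must be even and fit in 13 signed bits")
    imm = offset & 0x1FFF
    return (
        ((imm >> 12) & 0x1) << 31     # imm[12]
        | ((imm >> 5) & 0x3F) << 25   # imm[10:5]
        | rs2 << 20
        | rs1 << 15
        | funct3 << 12
        | ((imm >> 1) & 0xF) << 8     # imm[4:1]
        | ((imm >> 11) & 0x1) << 7    # imm[11]
        | BRANCH
    )


if __name__ == "__main__":
    print(f"{encode_b_type(0x0, 10, 11, 8):08x}")  # beq x10, x11, +8 -> 00b50463
    print(f"{encode_b_type(0x1, 5, 0, -4):08x}")   # bne x5, x0, -4   -> fe029ee3
```

Bit 0 of the offset is not stored because branch targets are always at even addresses.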

Unconditional jumps modify the program counter, writing the possible return address into a register.

instruction  name                    format  opcode   funct3  description
jal          Jump and Link           J       1101111  -       rd = PC+4; PC += imm
jalr         Jump and Link Register  I       1100111  0x0     rd = PC+4; PC = rs1 + imm

The instruction ecall requests a system call. The ebreak instruction is used for debugging programs.

instruction  name               format  opcode   funct3  description
ecall        Environment Call   I       1110011  0x0     imm = 0, rd = rs1 = 0, transfer control to system
ebreak       Environment Break  I       1110011  0x0     imm = 1, rd = rs1 = 0, transfer control to debugger

The fence instruction synchronizes memory accesses between multiple processors (hardware threads) that share the same memory. It divides the program in two: the memory accesses of every instruction before the fence are completed by the executing hardware thread, and thus observed by other hardware threads, before those of any instruction after the fence.

instruction  name   format  opcode   funct3  description
fence        Fence  I       0001111  0x0     rd, rs1 reserved. Normal fence for all memory access types has imm = 0b000011111111.

In RV64, values and registers are 64 bits wide, and w-suffixed instruction variants operate on 32 bits.

The M extension adds integer multiplication and division.

instruction  name                     format  opcode   funct3  funct7  description
mul          Multiply                 R       0110011  0x0     0x01    rd = (rs1 * rs2)[31:0]
mulh         Multiply High            R       0110011  0x1     0x01    rd = (rs1 * rs2)[63:32]
mulhsu       Multiply High Sign/Uns.  R       0110011  0x2     0x01    rd = (rs1 * rs2)[63:32]
mulhu        Multiply High Unsigned   R       0110011  0x3     0x01    rd = (rs1 * rs2)[63:32]
div          Divide                   R       0110011  0x4     0x01    rd = rs1 / rs2
divu         Divide Unsigned          R       0110011  0x5     0x01    rd = rs1 / rs2
rem          Remainder                R       0110011  0x6     0x01    rd = rs1 % rs2
remu         Remainder Unsigned       R       0110011  0x7     0x01    rd = rs1 % rs2
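
The three "high" multiplications share the same description in the table and differ only in how the operands are interpreted: mulh treats both as signed, mulhu treats both as unsigned, and mulhsu treats rs1 as signed and rs2 as unsigned. A Python sketch for RV32 (32-bit registers assumed; helper names are made up):

```python
MASK32 = 0xFFFFFFFF


def signed32(value):
    """Interpret a 32-bit register value as a two's complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul(rs1, rs2):
    """Lower 32 bits of the product (same for signed and unsigned)."""
    return (signed32(rs1) * signed32(rs2)) & MASK32


def mulh(rs1, rs2):
    """Upper 32 bits of signed * signed."""
    return ((signed32(rs1) * signed32(rs2)) >> 32) & MASK32


def mulhu(rs1, rs2):
    """Upper 32 bits of unsigned * unsigned."""
    return (((rs1 & MASK32) * (rs2 & MASK32)) >> 32) & MASK32


def mulhsu(rs1, rs2):
    """Upper 32 bits of signed rs1 * unsigned rs2."""
    return ((signed32(rs1) * (rs2 & MASK32)) >> 32) & MASK32


if __name__ == "__main__":
    a, b = 0xFFFFFFFF, 2  # a is -1 if signed, 4294967295 if unsigned
    print(hex(mul(a, b)), hex(mulh(a, b)), hex(mulhu(a, b)), hex(mulhsu(a, b)))
```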

With RV64, values and registers are 64 bits wide. Extra instruction variants for 32-bit operations end with a ‘w’ for word length. They operate on the lower 32 bits of the 64-bit registers.
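
A Python sketch of the difference between add and addw on RV64. It relies on the rule from the ISA specification that w-instructions ignore the upper 32 bits of their operands and sign-extend the 32-bit result to 64 bits. The helper names are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def sext32_to_64(value):
    """Keep the lower 32 bits and sign-extend them to 64 bits."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


def add(rs1, rs2):
    """64-bit addition, wrapping around at 2^64."""
    return (rs1 + rs2) & MASK64


def addw(rs1, rs2):
    """32-bit addition on the lower register halves, result sign-extended."""
    return sext32_to_64(rs1 + rs2)


if __name__ == "__main__":
    print(hex(add(0x7FFFFFFF, 1)))   # 0x80000000
    print(hex(addw(0x7FFFFFFF, 1)))  # 0xffffffff80000000
```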

RISC-V assembler

A RISC-V assembler provides pseudo instructions and directives that make programming easier.

Labels

A label is a text that ends with a colon. Labels mark places in the program code. They are used instead of offsets or addresses. The assembler translates a label into a corresponding address.

Numeric labels are local labels that can be referenced with suffixes ‘f’ for forward and ‘b’ for backward.

Addressing might require more than one instruction. The instructions auipc and lui are used for loading the upper immediates into a register, thus allowing addressing the upper part of an address. The lower part can be addressed by an addi instruction with the register. The relocation functions %hi(symbol) and %lo(symbol) split the address of a label into its higher and lower part. The linker relocates the program and assigns addresses to symbols. The functions %pcrel_hi(symbol) and %pcrel_lo(label) work together with the auipc and addi instructions.
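
The split into an upper and a lower part has one subtlety: addi sign-extends its 12-bit immediate, so the upper part must be rounded up whenever bit 11 of the address is set. The following Python sketch models what %hi and %lo compute for a 32-bit absolute address. The function names hi20, lo12, and lui_addi are made up for illustration.

```python
def hi20(address):
    """Upper 20 bits, as %hi(symbol) yields them; rounds up if bit 11 is set."""
    return ((address + 0x800) >> 12) & 0xFFFFF


def lo12(address):
    """Lower 12 bits as the signed immediate that %lo(symbol) yields."""
    low = address & 0xFFF
    return low - 0x1000 if low & 0x800 else low


def lui_addi(address):
    """Recombine the parts the way lui followed by addi would (32-bit)."""
    upper = (hi20(address) << 12) & 0xFFFFFFFF   # lui rd, %hi(symbol)
    return (upper + lo12(address)) & 0xFFFFFFFF  # addi rd, rd, %lo(symbol)


if __name__ == "__main__":
    for address in (0x12345678, 0x12345FFF):
        print(hex(hi20(address)), lo12(address), hex(lui_addi(address)))
```

For 0x12345FFF the upper part becomes 0x12346 and the lower part -1, which together give the original address again.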

Directives

Directives tell the assembler how the text following the directive should be treated. Directives begin with a dot.

Directive  Arguments                     Description
.text                                    change to .text section
.data                                    change to .data section
.rodata                                  change to .rodata section
.bss                                     change to .bss section
.section   .text, .data, .rodata, .bss   change to section given by arguments
.equ       name, value                   define name for value
.ascii     "string"                      emit string without null terminator
.asciz     "string"                      emit string with null terminator
.string    "string"                      same as .asciz
.byte      expression [,expression]*     8-bit comma-separated values
.half      expression [,expression]*     16-bit comma-separated values
.word      expression [,expression]*     32-bit comma-separated values
.dword     expression [,expression]*     64-bit comma-separated values
.zero      integer                       emit the given number of zero bytes
.align     integer                       align to 2^integer bytes
.globl     symbol_name                   make symbol_name visible in the symbol table

Example

# define exit as 93
.equ exit, 93
# program code
.section .text
# export _start for linker
.globl  _start
_start:
        li      a7, exit
        ecall
# data: read/write section, init one word (32-bit value) with 1
.section .data
counter:
.word 1
# rodata: constant text string
.section .rodata
text_begin:
.asciz  "Text"
text_end:
# current address minus address of text_begin = length of text
.byte .-text_begin
# non initialized block with same size as the text
.section .bss 
# start next part by address aligned to multiple of 2^2 = 4
.align 2
copy_begin:
.zero text_end-text_begin

Save it as example.s, then assemble, link, and inspect the result:

$ riscv64-linux-gnu-as -o example.o example.s
$ riscv64-linux-gnu-ld -o example example.o
$ riscv64-linux-gnu-objdump -f -d -Mno-aliases,numeric example
$ riscv64-linux-gnu-objdump -F -s example

Pseudo instructions

Pseudo instructions are part of the assembly language but not of the instruction set. The assembler translates each of them into one or more base instructions.

Pseudo instruction          Base instruction(s)                 Description
la rd, symbol               auipc rd, symbol[31:12]             Load address (non-position-independent code, non-PIC)
                            addi rd, rd, symbol[11:0]
la rd, symbol               auipc rd, symbol@GOT[31:12]         Load address (position-independent code, PIC)
                            l{w|d} rd, symbol@GOT[11:0](rd)
lla rd, symbol              auipc rd, symbol[31:12]             Load local address
                            addi rd, rd, symbol[11:0]
lga rd, symbol              auipc rd, symbol@GOT[31:12]         Load global address
                            l{w|d} rd, symbol@GOT[11:0](rd)
l{b|h|w|d} rd, symbol       auipc rd, symbol[31:12]             Load global
                            l{b|h|w|d} rd, symbol[11:0](rd)
s{b|h|w|d} rs, symbol, rd   auipc rd, symbol[31:12]             Store global
                            s{b|h|w|d} rs, symbol[11:0](rd)
nop                         addi x0, x0, 0                      No operation
li rd, imm                  different instructions              Load immediate
mv rd, rs                   addi rd, rs, 0                      Copy register
not rd, rs                  xori rd, rs, -1                     1’s complement
neg rd, rs                  sub rd, x0, rs                      2’s complement
negw rd, rs                 subw rd, x0, rs                     2’s complement word

The global offset table (GOT) is stored in the executable. It holds the addresses of symbols, which allows the operating system to load libraries into different memory areas at program startup.

Pseudo instructions for extending and conditional bit setting:

Pseudo instruction     Base instruction(s)     Description
sext.{b|h|w} rd, rs    different instructions  sign extend
zext.{b|h|w} rd, rs    different instructions  zero extend
seqz rd, rs            sltiu rd, rs, 1         rd = (rs == 0)? 1:0
snez rd, rs            sltu rd, x0, rs         rd = (rs != 0)? 1:0
sltz rd, rs            slt rd, rs, x0          rd = (rs < 0)? 1:0
sgtz rd, rs            slt rd, x0, rs          rd = (rs > 0)? 1:0

Pseudo instructions for conditional branching:

Pseudo instruction  Base instruction(s)  Description
beqz rs, imm        beq rs, x0, imm      if (rs == 0) PC+=imm
bnez rs, imm        bne rs, x0, imm      if (rs != 0) PC+=imm
blez rs, imm        bge x0, rs, imm      if (rs <= 0) PC+=imm
bgez rs, imm        bge rs, x0, imm      if (rs >= 0) PC+=imm
bltz rs, imm        blt rs, x0, imm      if (rs < 0) PC+=imm
bgtz rs, imm        blt x0, rs, imm      if (rs > 0) PC+=imm
bgt rs, rt, imm     blt rt, rs, imm      if (rs > rt) PC+=imm
ble rs, rt, imm     bge rt, rs, imm      if (rs <= rt) PC+=imm
bgtu rs, rt, imm    bltu rt, rs, imm     if (rs > rt) PC+=imm, unsign.
bleu rs, rt, imm    bgeu rt, rs, imm     if (rs <= rt) PC+=imm, unsign.

Pseudo instructions for unconditional jumping:

Pseudo instruction  Base instruction(s)      Description
j imm               jal x0, imm              PC += imm
jal imm             jal x1, imm              x1 = PC+4; PC += imm
jr rs               jalr x0, rs, 0           PC = rs
jalr rs             jalr x1, rs, 0           x1 = PC+4; PC = rs
ret                 jalr x0, x1, 0           PC = x1
call imm            auipc x6, imm[31:12]     x1 = address of the instruction after the jalr; PC = imm
                    jalr x1, x6, imm[11:0]
tail imm            auipc x6, imm[31:12]     PC = imm
                    jalr x0, x6, imm[11:0]

Application Binary Interface

The application binary interface (ABI) is a convention on how registers should be used in a general context. One part of the convention is the responsibility for saving register values when a jump instruction calls a function. On a call, the address of the instruction that follows the jump instruction in memory is stored in a register as the return address, and the program counter is set to the beginning of the function. The convention also states which registers can be assumed to be changed or unchanged when a function returns.

Register  ABI alias  Description                     Saver
x0        zero       zero constant                   -
x1        ra         return address                  caller
x2        sp         stack pointer                   callee / function
x3        gp         global pointer                  - / should not be used from user
x4        tp         thread pointer                  - / should not be used from user
x5-x7     t0-t2      temporaries                     caller
x8        s0 / fp    saved / Frame pointer           callee / function
x9        s1         saved register                  callee / function
x10-x11   a0-a1      function args. / return values  caller
x12-x17   a2-a7      function arguments              caller
x18-x27   s2-s11     saved registers                 callee / function
x28-x31   t3-t6      temporaries                     caller
pc        -          program counter                 -

The global pointer and the thread pointer should not be used except by the operating system.

Stack

The stack is a memory area that is accessed via the stack pointer (register sp), which is already initialized when a user application starts. The stack grows from high addresses to low addresses in memory.
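
The downward growth can be sketched with a small Python model. It is an illustration under assumptions, not a description of a real memory layout: the class name Stack, the start address 0x1000, and the 8-byte slot size (one RV64 register) are all made up for this sketch. The comments show the instruction each step would correspond to.

```python
class Stack:
    """Toy model of a downward-growing stack addressed through sp."""

    def __init__(self, top=0x1000, word_size=8):
        self.memory = {}            # address -> stored value
        self.sp = top               # stack pointer starts at the high end
        self.word_size = word_size

    def push(self, value):
        self.sp -= self.word_size         # addi sp, sp, -8
        self.memory[self.sp] = value      # sd   value, 0(sp)

    def pop(self):
        value = self.memory.pop(self.sp)  # ld   value, 0(sp)
        self.sp += self.word_size         # addi sp, sp, 8
        return value


if __name__ == "__main__":
    stack = Stack()
    stack.push(1)
    stack.push(2)
    print(hex(stack.sp))             # 0xff0, 16 bytes below the start
    print(stack.pop(), stack.pop())  # 2 1
```

Values come back in reverse order of storage, which is why a function restores saved registers in the opposite order in which it saved them.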