rv8

RISC-V simulator for x86-64

rv8 is a RISC-V simulation suite comprising a high performance x86-64 binary translator, a user mode simulator, a full system emulator, an ELF binary analysis tool and ISA metadata:


About

The rv8 simulator suite contains libraries and command line tools for creating instruction opcode maps, C headers and source containing instruction set metadata, instruction decoders, a JIT assembler, LaTeX documentation, a metadata based RISC-V disassembler, a histogram tool for generating statistics on RISC-V ELF executables, a RISC-V proxy syscall simulator, a RISC-V full system emulator that implements the RISC-V 1.9.1 privileged specification and an x86-64 binary translator.

The current version is available at: https://github.com/rv8-io/rv8

rv8 binary translation

RISC-V to x86-64 binary translator

The rv8 binary translation engine works by interpreting code while profiling it for hot paths. Hot paths are translated on the fly to native code. The translation engine maintains a call stack to allow runtime inlining of hot functions. A jump target cache is used to accelerate returns and indirect calls through function pointers. The translator supports hybrid binary translation and interpretation to handle instructions that do not have native translations. Currently ‘IM’ code is translated and ‘AFD’ is interpreted. The translator supports RVC compressed code.
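The interpret-then-translate loop described above depends on deciding when a guest address has become hot. The following Python sketch shows one simple way to do that with execution counters. The class name, the method name and the threshold of 3 are all hypothetical; this document does not state how rv8 detects hot paths internally.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value; the real threshold is not stated here


class HotPathProfiler:
    """Count executions of each guest address and flag each hot one once."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()
        self.translated = set()

    def record(self, pc):
        """Return True exactly once, when pc first reaches the threshold."""
        self.counts[pc] += 1
        if pc not in self.translated and self.counts[pc] >= self.threshold:
            self.translated.add(pc)
            return True
        return False


profiler = HotPathProfiler()
# Interpret the same guest address five times; only the third visit
# reports that translation should begin.
events = [profiler.record(0x10078) for _ in range(5)]
```

In a real translator the `True` result would trigger tracing and code generation, and later visits would run the native code instead of the interpreter.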

The rv8 binary translator supports a number of simple optimisations:

RISC-V full system emulator

The rv8 suite includes a full system emulator that implements the RISC-V privileged ISA with support for interrupts, MMIO (memory mapped input output) devices and a soft MMU (memory management unit) with separate instruction and data TLBs (translation lookaside buffers). The full system emulator has a simple integrated debugger that allows setting breakpoints, single stepping and disassembling instructions as they are executed.

The rv8 full system emulator has the following features:

RISC-V user mode simulator

The rv8 user mode simulator is a single address space implementation of the RISC-V ISA that implements a subset of the RISC-V Linux syscall ABI (application binary interface) and delegates system calls to the underlying native host operating system. The user mode simulator can run RISC-V Linux binaries on non-Linux operating systems via system call emulation. The current user mode simulator implements a small number of Linux system calls to allow running RISC-V Linux ELF static binaries.
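System call delegation can be illustrated with a small Python sketch that services two RISC-V Linux system calls, write (64) and exit (93), using host facilities. The function name, the register dictionary and the flat `memory` buffer are assumptions made for illustration, not the rv8 implementation.

```python
import os

SYS_WRITE = 64   # RISC-V Linux system call numbers
SYS_EXIT = 93
LINUX_ENOSYS = 38


def proxy_syscall(regs, memory):
    """Service one ecall by delegating to the host operating system.

    regs   -- dict holding the RISC-V argument registers a0..a7
    memory -- bytes-like guest memory, indexed directly by guest address
    Returns ("continue", result) or ("exit", status).
    """
    number = regs["a7"]
    if number == SYS_WRITE:
        fd, addr, length = regs["a0"], regs["a1"], regs["a2"]
        written = os.write(fd, bytes(memory[addr:addr + length]))
        regs["a0"] = written            # result is returned in a0
        return ("continue", written)
    if number == SYS_EXIT:
        return ("exit", regs["a0"])
    regs["a0"] = -LINUX_ENOSYS          # unimplemented system call
    return ("continue", -LINUX_ENOSYS)
```

Because the guest and the simulator share one address space, the buffer pointer in a1 can be used to read guest memory directly, which is what keeps this kind of proxying simple.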

The rv8 user mode simulator has the following features:

RISC-V instruction set metadata

The rv-bin tool contains a meta-data driven disassembler and a histogram tool for analysing static register usage and static instruction usage.

The rv-meta tool is able to generate opcode maps, instruction decoders, source, headers and instruction set listing LaTeX from ISA metadata. The following is an example of PDF output:

Figure: RISC-V Instruction Set Listing

RISC-V instruction set metadata is available here. The linked page shows an example RISC-V Instruction Set Reference generated by the rv-meta tool. rv8 has a simple extensible generator framework that allows reflection on the instruction set metadata to generate a number of different output formats.


Installation

rv8 supports the following target architecture and host operating system combinations:

Please read the RISC-V toolchain installation instructions in the riscv-gnu-toolchain repository. To experiment with the RISC-V toolchain online, try the RISC-V Compiler Explorer.

Building riscv-gnu-toolchain

$ sudo apt-get install autoconf automake autotools-dev curl \
  libmpc-dev libmpfr-dev libgmp-dev gawk build-essential \
  bison flex texinfo gperf libtool patchutils bc zlib1g-dev
$ git clone https://github.com/riscv/riscv-gnu-toolchain.git
$ cd riscv-gnu-toolchain
$ git submodule update --init --recursive
$ ./configure --prefix=/opt/riscv/toolchain
$ make

rv8 has minimal external dependencies besides a C++14 compiler, the C/C++ standard libraries and the asmjit submodule.

Building rv8

$ git clone https://github.com/rv8-io/rv8.git
$ cd rv8
$ git submodule update --init --recursive
$ make
$ sudo make install

Running rv8

The riscv64-unknown-elf newlib toolchain is required for building the rv8 test cases and this build step depends on the RISCV environment variable.

$ cd rv8
$ export RISCV=/opt/riscv/toolchain
$ make test-build
$ make test-sim
$ make test-sys
$ rv-jit build/riscv64-unknown-elf/bin/test-dhrystone

Optimisations

The rv8 binary translator performs JIT (Just In Time) translation of RISC-V code to x86-64 code. This is a challenging problem for many reasons, with the principal challenge being that RISC-V has 31 integer registers while x86-64 has only 16 integer registers.

Register allocation

rv8 solves the register set size problem by using static register allocation and spilling the remaining registers to memory (L1 cache); a future version may use a dynamic register allocator. A significant amount of performance is lost because code compiled for RISC-V is register allocated to take advantage of the larger number of available registers, with less frequent stack spills. It is not possible for the translator to rearrange memory and registers for optimal stack spills, as memory accesses must be translated precisely. The additional registers are translated as x86-64 memory operands (which produce load and store micro-ops) or, in some circumstances, explicit mov instructions.

RV   x86   RV   x86   RV   x86   RV   x86
ra   rdx   t1   rdi   a2   r10   a5   r13
sp   rbx   a0   r8    a3   r11   a6   r14
t0   rsi   a1   r9    a4   r12   a7   r15

The remaining unallocated registers are stored in a memory spill area accessed using the rbp register. e.g. qword [rbp+0xF8] would be used to access t4.
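The static allocation can be expressed as a lookup from a RISC-V register name to an x86-64 operand. The Python sketch below is illustrative only. It assumes that a7 maps to r15 (continuing the r8 to r14 sequence) and that the spill area uses 8-byte slots in register index order. The slot base is inferred from the single example in the text (t4 at rbp+0xF8) and is not taken from the rv8 source.

```python
# Static mapping of RISC-V registers to x86-64 host registers.
HOST_REG = {
    "ra": "rdx", "sp": "rbx", "t0": "rsi", "t1": "rdi",
    "a0": "r8", "a1": "r9", "a2": "r10", "a3": "r11",
    "a4": "r12", "a5": "r13", "a6": "r14", "a7": "r15",
}

# RISC-V integer register ABI names in index order x0..x31.
ABI_NAMES = (
    "zero ra sp gp tp t0 t1 t2 s0 s1 a0 a1 a2 a3 a4 a5 a6 a7 "
    "s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 t3 t4 t5 t6"
).split()

# Assumption: 8-byte slots in index order, with the base chosen so that
# t4 (x29) lands at rbp+0xF8 as in the example above.
SPILL_BASE = 0xF8 - 8 * ABI_NAMES.index("t4")


def host_operand(reg):
    """Return the x86-64 operand used to access a RISC-V register."""
    if reg in HOST_REG:
        return HOST_REG[reg]
    offset = SPILL_BASE + 8 * ABI_NAMES.index(reg)
    return "qword [rbp+0x%X]" % offset
```

For example, `host_operand("a0")` yields a host register while `host_operand("t4")` yields a memory operand in the spill area. A real translator would also special-case the zero register, which this sketch does not do.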

Translator temporaries

The rv8 translator needs several host registers to point to translator internal structures and to serve as temporary registers for the emulation of many instructions. For example, a store instruction requires two temporary registers if both register operands are in the spill area. The translator uses the following x86-64 host registers as temporaries, leaving 12 registers available for mapping to RISC-V registers:

CISC vs RISC operands

The rv8 translator makes use of CISC memory operands to access registers residing in the memory backed register spill area, which resides in L1 cache. The complex memory operands end up being cracked back into micro-ops in the CISC pipeline; however, the use of complex memory operands helps increase instruction density, which increases performance due to better use of I$ (instruction cache).

There are many combinations of instruction expansions depending on whether a register is mapped to a live register, is memory backed and whether there are two or three operands. A three operand RISC-V instruction is translated into a move and a destructive two operand x86-64 instruction. Temporary registers are used if both operands are memory backed. The principle is to maintain the densest possible mapping to the x86-64 ISA.

Memory operands are used to access registers in the spill area:

Figure: operands
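The operand combinations described above can be sketched for a single instruction. The following Python function produces x86-64 assembly text for `add rd, rs1, rs2`, given the host operand already chosen for each register. The choice of rax as the temporary and the exact instruction selection are assumptions for illustration, and the zero register is not handled.

```python
SCRATCH = "rax"  # hypothetical choice of translator temporary


def is_mem(operand):
    """True when the operand is a spill area memory reference."""
    return operand.startswith("qword [")


def translate_add(rd, rs1, rs2):
    """Translate `add rd, rs1, rs2` into a list of x86-64 instructions."""
    if rd == rs2 and rd != rs1:
        rs1, rs2 = rs2, rs1  # add is commutative, so make rd the first source
    if rd == rs1:
        # The destructive two operand form maps directly, unless both
        # operands are memory backed (x86-64 allows one memory operand).
        if is_mem(rd) and is_mem(rs2):
            return ["mov %s, %s" % (SCRATCH, rs2),
                    "add %s, %s" % (rd, SCRATCH)]
        return ["add %s, %s" % (rd, rs2)]
    if is_mem(rd):
        # Build the result in the temporary, then store it to the spill slot.
        return ["mov %s, %s" % (SCRATCH, rs1),
                "add %s, %s" % (SCRATCH, rs2),
                "mov %s, %s" % (rd, SCRATCH)]
    # rd is a host register distinct from both sources: move, then add.
    return ["mov %s, %s" % (rd, rs1), "add %s, %s" % (rd, rs2)]
```

The densest case is a single `add` with a register or memory source; the temporary is only needed when the destination is memory backed and cannot be combined with the sources in one instruction.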

Indirect call acceleration

Indirect calls through function pointers cannot be statically translated because the address of their target is not known at the time of translation. rv8 employs a trace cache, which is a hash table mapping guest program addresses to native code addresses. A full trace cache lookup is relatively slow because it requires saving caller-save registers and calling into C++ code. To accelerate indirect calls through function pointers, a small assembly stub looks up the target address in a sparse 1024 entry direct mapped L1 translation cache. On a miss, it falls back to a slow path that saves registers and calls into the translator code to populate the L1 translation cache so that the next indirect call can be accelerated.
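The fast path and slow path can be modelled in a few lines of Python. This is a behavioural sketch under stated assumptions: the class and method names are hypothetical, a dictionary stands in for the full trace cache, and a counter stands in for the cost of saving registers and calling into the translator.

```python
L1_ENTRIES = 1024


class TranslationCaches:
    """Two-level lookup: direct mapped L1 in front of a full trace cache."""

    def __init__(self):
        self.trace_cache = {}              # guest address -> native address
        self.l1 = [None] * L1_ENTRIES      # (guest, native) pairs or None
        self.slow_path_calls = 0

    def add_translation(self, guest, native):
        self.trace_cache[guest] = native

    def lookup(self, guest):
        slot = (guest >> 1) % L1_ENTRIES
        entry = self.l1[slot]
        if entry is not None and entry[0] == guest:
            return entry[1]                # fast path: L1 hit
        # Slow path: consult the full trace cache and refill the L1 slot.
        self.slow_path_calls += 1
        native = self.trace_cache.get(guest)
        if native is not None:
            self.l1[slot] = (guest, native)
        return native                      # None means not yet translated
```

The first lookup of a translated address takes the slow path and fills the slot; repeated indirect calls to the same target then hit in L1 without touching the hash table.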

L1 translation cache

The direct mapped L1 translation cache is indexed by bits[10:1] of the guest address, which gives 1024 entries. Bit zero can be ignored because RISC-V instructions must start on a 2-byte boundary.
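The index computation is a shift and a mask. The Python sketch below uses a hypothetical function name and shows two consequences of the scheme: adjacent 2-byte aligned addresses use adjacent slots, and addresses 2048 bytes apart share a slot.

```python
L1_INDEX_BITS = 10  # bits[10:1] of the guest address select one of 1024 slots


def l1_index(guest_address):
    """Drop bit zero, then keep the next ten bits as the cache index."""
    return (guest_address >> 1) & ((1 << L1_INDEX_BITS) - 1)


first = l1_index(0x10078)
neighbour = l1_index(0x1007A)          # next 2-byte aligned address
alias = l1_index(0x10078 + 2048)       # same slot as `first`
```

Because the cache is direct mapped, two hot targets that alias to the same slot evict each other, so each entry must also store the guest address for comparison.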

Inline caching

Returns also make use of the L1 translation cache; however, a procedure call made inside a hot trace can be inlined. The translator maintains a call stack to keep track of return addresses. Upon reaching an inlined procedure RET (jalr zero, ra) instruction, the link register (ra in RISC-V, rdx in the x86 translation) is compared against the caller's known return address and, if it matches, control flow continues along the return path. If the function is not inlined, the regular L1 translation cache is used to look up the address of the translated code.

An inlined subroutine call needs to test the return address:

Figure: inline caching
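The return address test can be modelled as a comparison against the top of the translator's call stack. This Python sketch is a behavioural model with hypothetical names; in the generated code the same test would be a compare of the link register against a known constant, followed by a conditional branch to the lookup path.

```python
def inlined_return(ra, call_stack, lookup_translation):
    """Decide where control goes at an inlined `jalr zero, ra`.

    ra                 -- value of the guest link register at the return
    call_stack         -- return addresses recorded at inlined call sites
    lookup_translation -- fallback lookup, e.g. the L1 translation cache
    """
    if call_stack and call_stack[-1] == ra:
        call_stack.pop()
        return ("continue-inline", ra)     # fall through along the trace
    return ("lookup", lookup_translation(ra))
```

The check is needed because the callee may have changed ra, for example by returning somewhere other than the call site, in which case the inlined continuation would be wrong.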

Branch tail dynamic linking

The translator performs lazy translation of the source program during tracing. When it reaches a branch, it can only link both sides of the branch if a translation already exists for the not-taken side. To accelerate branch tail exits, the translator emits a relative branch to a trampoline that returns to the tracer main loop, and the tracer adds the branch to a table of branch fixup addresses indexed by target guest address. If the branch target is hot, then once it has been translated, all relative branches that point to tail exit trampolines are relinked to branch directly to the translated native code.
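The fixup table can be sketched as a dictionary from guest target address to the list of branch sites waiting on it. The Python model below uses hypothetical names and symbolic branch site identifiers; a real translator would patch relative branch displacements in the code buffer instead.

```python
from collections import defaultdict

TRAMPOLINE = "trampoline"  # stands in for the tail exit back to the tracer


class BranchLinker:
    """Track branches to untranslated targets and relink them later."""

    def __init__(self):
        self.translations = {}            # guest target -> native address
        self.fixups = defaultdict(list)   # guest target -> waiting branch sites
        self.branch_targets = {}          # branch site -> current destination

    def emit_branch(self, site, guest_target):
        native = self.translations.get(guest_target)
        if native is not None:
            self.branch_targets[site] = native       # link directly
        else:
            self.branch_targets[site] = TRAMPOLINE   # exit to the tracer
            self.fixups[guest_target].append(site)

    def add_translation(self, guest_target, native):
        self.translations[guest_target] = native
        # Relink every branch that was waiting on this target.
        for site in self.fixups.pop(guest_target, []):
            self.branch_targets[site] = native
```

Once a target has been translated, later branches to it are linked directly at emit time and never enter the fixup table.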

Macro-op fusion

The rv8 translator implements an optimisation known as macro-op fusion whereby specific patterns of adjacent instructions are translated into a smaller sequence of host instructions. The macro-op fusion pattern matcher has potential to increase performance further with the addition of common patterns. The following is a list of macro-op fusion patterns that are currently implemented in rv8:

A technique known as deoptimisation can be employed to allow elision of temporary registers in macro-op fusion patterns assuming the translator sees the register killed within its translation window. Deoptimisation requires that the optimised translation has an accompanying deoptimisation sequence to fill in elided register values, and this is played back in the case of a fault (device or debug interrupt) so that the visible machine state precisely matches that which the ISA dictates. rv8 does not presently implement deoptimisation, however it may be necessary to allow more sophisticated optimisations.

Sign extension versus zero extension

In addition to the register allocation problem, rv8 has to make sure that 32-bit operations on registers are sign extended rather than zero extended. On x86-64, a 32-bit operation zero extends its result into bits 63:32 of the destination register, whereas RISC-V sign extends bit 31 into bits 63:32. One potential optimisation is lazy sign extension. It may be possible in a future version of the JIT translation engine to elide redundant sign extension operations; however, the register state must precisely match the semantics of the ISA before executing an instruction that may cause a fault, e.g. loads and stores.

Example of sign-extended vs zero-extended 32-bit arithmetic on RISC-V and x86-64:

Figure: sign-extension vs zero-extension
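The difference can be checked numerically. The Python sketch below models a RISC-V `addw` and an x86-64 32-bit `add` on 64-bit register values; the function names are illustrative. It shows that an extra sign extension step (what MOVSXD does on x86-64) is needed to reproduce the RISC-V result.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend32(value):
    """Copy bit 31 of the low word into bits 63:32."""
    value &= MASK32
    if value & 0x80000000:
        return (value | ~MASK32) & MASK64
    return value


def riscv_addw(a, b):
    """RISC-V addw: 32-bit add, result sign extended to 64 bits."""
    return sign_extend32(a + b)


def x86_add32(a, b):
    """x86-64 32-bit add: result zero extended to 64 bits."""
    return (a + b) & MASK32


riscv_result = riscv_addw(0x7FFFFFFF, 1)   # upper 32 bits are all ones
x86_result = x86_add32(0x7FFFFFFF, 1)      # upper 32 bits are all zeros
```

The two results agree whenever bit 31 of the sum is clear, which is why eliding the extension is tempting but only safe when the translator can prove the upper bits are never observed.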

Bit manipulation intrinsics

The benchmarks below contain digest algorithms and ciphers which can take advantage of bit manipulation instructions such as rotate and bswap. Present day compilers detect rotate and byte swap bitwise logical operations by matching intermediate representation patterns that can be lowered directly to bit manipulation instructions such as ROR, ROL and BSWAP on x86-64. This approach has the benefit of accelerating code that does not use inline assembly or compiler builtin functions. RISC-V currently lacks bit manipulation instructions; however, there are proposals to add them in the B extension. The following is a typical byte swap pattern.
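As an illustration of the kind of shift and mask pattern meant here, the following Python function swaps the bytes of a 32-bit value. It is a generic example written for this section, not the original listing, and the function name is chosen for illustration.

```python
def bswap32(x):
    """Reverse the byte order of a 32-bit value using shifts and masks."""
    x &= 0xFFFFFFFF
    return (((x >> 24) & 0x000000FF) |
            ((x >> 8) & 0x0000FF00) |
            ((x << 8) & 0x00FF0000) |
            ((x << 24) & 0xFF000000))


swapped = bswap32(0x11223344)
```

Written in C, an expression of this shape is what a compiler's pattern matcher can lower to a single BSWAP on x86-64, while on a target without such an instruction it expands to the individual shifts, masks and ORs.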

rv8 implements rotate macro-op fusion, which can translate two shift instructions and one OR instruction with the correct offsets into one shift and one rotate. The rotate macro-op fusion needs to create the residual temporary register side effects so that the register file contents are precisely matched, as it can’t easily prove the residual temporary register is not later used. Deoptimisation would be required to elide the temporary register.
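A pattern matcher for this fusion can be sketched in Python. The instruction tuples, the `rol` pseudo-operation and the exact operand constraints below are assumptions for illustration and only cover the 64-bit left rotate form. The small evaluator is included so that the guest and fused sequences can be compared, including the residual temporary.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def fuse_rotate(insns):
    """Fuse `slli t, s, n; srli d, s, 64-n; or d, d, t` into shift + rotate.

    Instructions are (op, rd, rs1, rs2_or_imm) tuples. Returns the fused
    sequence, or None when the instructions do not match the pattern.
    """
    if len(insns) != 3:
        return None
    (op1, t, s1, n), (op2, d, s2, m), (op3, rd, ra, rb) = insns
    if (op1, op2, op3) != ("slli", "srli", "or"):
        return None
    if s1 != s2 or t in (s1, d) or n + m != 64 or not 0 < n < 64:
        return None
    if rd != d or {ra, rb} != {d, t}:
        return None
    # The shift is kept so the temporary t still holds s << n afterwards.
    return [("slli", t, s1, n), ("rol", d, s1, n)]


def run(insns, regs):
    """Tiny evaluator for the operations used above (64-bit registers)."""
    regs = dict(regs)
    for op, rd, rs1, x in insns:
        a = regs[rs1]
        if op == "slli":
            regs[rd] = (a << x) & MASK64
        elif op == "srli":
            regs[rd] = a >> x
        elif op == "or":
            regs[rd] = a | regs[x]
        elif op == "rol":
            regs[rd] = ((a << x) | (a >> (64 - x))) & MASK64
    return regs


guest = [("slli", "a5", "a0", 13), ("srli", "a0", "a0", 51), ("or", "a0", "a0", "a5")]
fused = fuse_rotate(guest)
```

Running both sequences from the same starting registers gives identical final register files, which is the property the translator must preserve when it cannot prove the temporary is dead.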

Measurement

A future goal is to quantify the factors that contribute to the performance differences between native x86-64 code and translated RISC-V code, so future benchmarks should measure:


Benchmarks

The following section contains benchmark runtime and instructions per second results comparing the QEMU and rv8 JIT engines against native x86. This section also contains runtime neutral results comparing total retired RISC-V instructions to x86 micro-ops. The benchmark programs are compiled for aarch64, arm32, riscv64, riscv32, x86-64 and x86-32. See the Benchmarks Results page for the complete result set including optimisation level comparisons, macro-op fusion performance, executable file sizes, dynamic register and instruction usage charts.

Benchmark source

The following sources have been used to run the benchmarks:

Benchmark metrics

The following benchmark metrics have been plotted and tabulated:

Benchmark details

The rv8-bench benchmark suite contains the following test programs:

Benchmark   Type          Description
aes         crypto        encrypt, decrypt and compare 30MiB of data
bigint      numeric       compute 23 ^ 111121 and count base 10 digits
dhrystone   synthetic     synthetic integer workload
miniz       compression   compress, decompress and compare 8MiB of data
norx        crypto        encrypt, decrypt and compare 30MiB of data
primes      numeric       calculate largest prime number below 33333333
qsort       sorting       sort array containing 50 million items
sha512      digest        calculate SHA-512 hash of 64MiB of data

Compiler details

The following compiler architectures, versions, compile options and runtime libraries are used to run the benchmarks:

Architecture   Compiler    C Library   Compile options
x86-32         GCC 7.1.0   musl libc   '-O3', '-O2' and '-Os'
x86-64         GCC 7.1.0   musl libc   '-O3', '-O2' and '-Os'
riscv32        GCC 7.1.0   musl libc   '-O3', '-O2' and '-Os'
riscv64        GCC 7.1.0   musl libc   '-O3', '-O2' and '-Os'
arm32          GCC 7.2.0   musl libc   '-O3', '-O2' and '-Os'
aarch64        GCC 7.1.0   musl libc   '-O3', '-O2' and '-Os'

Measurement details

Runtimes

Runtime results comparing qemu, rv8 and native x86:

Figure 1: Benchmark runtimes -O3 64-bit

Runtime 64-bit -O3 (seconds)

program qemu-aarch64 qemu-riscv64 rv8-riscv64 native-x86-64
aes 1.31 2.16 1.49 0.32
bigint 1.38 1.08 0.71 0.38
dhrystone 0.98 0.57 0.20 0.10
miniz 2.66 2.21 1.53 0.77
norx 0.60 1.17 0.99 0.22
primes 2.09 1.26 0.65 0.60
qsort 7.38 4.76 1.21 0.64
sha512 0.64 1.24 0.81 0.24
(Sum) 17.04 14.45 7.59 3.27

Performance Ratio 64-bit -O3 (smaller is better)

program qemu-aarch64 qemu-riscv64 rv8-riscv64 native-x86-64
aes 4.12 6.76 4.68 1.00
bigint 3.62 2.83 1.85 1.00
dhrystone 9.96 5.87 2.03 1.00
miniz 3.46 2.86 1.99 1.00
norx 2.73 5.33 4.51 1.00
primes 3.49 2.11 1.09 1.00
qsort 11.55 7.46 1.90 1.00
sha512 2.66 5.13 3.36 1.00
(Geomean) 4.44 4.39 2.40 1.00
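The (Geomean) row is a geometric mean of the per-program ratios in each column. As a quick check, the Python sketch below recomputes it for the rv8-riscv64 column from the ratios listed in the table above; the variable and function names are illustrative.

```python
import math


def geomean(values):
    """Geometric mean: exponential of the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# rv8-riscv64 performance ratios, 64-bit -O3, in table order.
rv8_ratios = [4.68, 1.85, 2.03, 1.99, 4.51, 1.09, 1.90, 3.36]
rv8_geomean = geomean(rv8_ratios)
```

A geometric mean is the usual choice for summarising ratios because it treats a benchmark that is twice as slow and one that is twice as fast as cancelling out, which an arithmetic mean does not. Note that the per-program ratios themselves cannot be reproduced exactly from the runtime table, since the runtimes shown are rounded to two decimal places.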

Figure 2: Benchmark runtimes -O2 64-bit

Runtime 64-bit -O2 (seconds)

program qemu-aarch64 qemu-riscv64 rv8-riscv64 native-x86-64
aes 1.32 2.18 1.49 0.32
bigint 1.34 1.03 1.44 0.38
dhrystone 1.77 1.06 0.23 0.12
miniz 2.72 2.22 1.52 0.77
norx 0.66 1.16 1.08 0.22
primes 2.11 1.25 0.66 0.59
qsort 7.35 4.74 1.19 0.62
sha512 0.68 1.32 0.96 0.24
(Sum) 17.95 14.96 8.57 3.26

Performance Ratio 64-bit -O2 (smaller is better)

program qemu-aarch64 qemu-riscv64 rv8-riscv64 native-x86-64
aes 4.13 6.84 4.68 1.00
bigint 3.51 2.70 3.76 1.00
dhrystone 14.73 8.81 1.95 1.00
miniz 3.52 2.87 1.96 1.00
norx 2.94 5.20 4.81 1.00
primes 3.58 2.13 1.12 1.00
qsort 11.79 7.60 1.91 1.00
sha512 2.80 5.45 3.96 1.00
(Geomean) 4.75 4.64 2.69 1.00

Figure 3: Benchmark runtimes -Os 64-bit

Runtime 64-bit -Os (seconds)

program qemu-aarch64 qemu-riscv64 rv8-riscv64 native-x86-64
aes 1.22 1.91 1.26 0.37
bigint 1.60 1.40 2.85 0.38
dhrystone 5.42 2.59 1.28 0.39
miniz 2.74 2.24 1.73 0.83
norx 1.58 1.53 0.96 0.24
primes 1.97 1.23 0.74 0.59
qsort 7.99 5.27 0.90 0.66
sha512 0.64 1.14 0.67 0.25
(Sum) 23.16 17.31 10.39 3.71

Performance Ratio 64-bit -Os (smaller is better)

program qemu-aarch64 qemu-riscv64 rv8-riscv64 native-x86-64
aes 3.29 5.16 3.39 1.00
bigint 4.22 3.70 7.53 1.00
dhrystone 13.97 6.66 3.30 1.00
miniz 3.30 2.70 2.09 1.00
norx 6.59 6.36 4.00 1.00
primes 3.31 2.07 1.25 1.00
qsort 12.20 8.05 1.37 1.00
sha512 2.56 4.58 2.68 1.00
(Geomean) 5.07 4.49 2.75 1.00

Figure 4: Benchmark runtimes -O3 32-bit

Runtime 32-bit -O3 (seconds)

program qemu-arm32 qemu-riscv32 rv8-riscv32 native-x86-32
aes 1.70 1.89 1.47 0.48
bigint 2.98 1.37 1.41 0.88
dhrystone 1.17 1.11 0.39 0.28
miniz 2.99 2.17 1.41 0.88
norx 0.77 0.77 0.78 0.26
primes 4.20 2.34 1.89 1.51
qsort 8.39 4.56 1.15 0.70
sha512 3.82 2.91 1.92 0.63
(Sum) 26.02 17.12 10.42 5.62

Performance Ratio 32-bit -O3 (smaller is better)

program qemu-arm32 qemu-riscv32 rv8-riscv32 native-x86-32
aes 3.56 3.97 3.09 1.00
bigint 3.39 1.56 1.61 1.00
dhrystone 4.13 3.91 1.37 1.00
miniz 3.40 2.47 1.60 1.00
norx 2.98 2.96 3.00 1.00
primes 2.79 1.55 1.25 1.00
qsort 12.04 6.54 1.65 1.00
sha512 6.04 4.60 3.04 1.00
(Geomean) 4.23 3.09 1.95 1.00

Figure 5: Benchmark runtimes -O2 32-bit

Runtime 32-bit -O2 (seconds)

program qemu-arm32 qemu-riscv32 rv8-riscv32 native-x86-32
aes 1.73 1.88 1.41 0.48
bigint 2.73 1.46 1.76 0.85
dhrystone 2.19 1.80 0.41 0.36
miniz 2.98 2.16 1.36 0.88
norx 0.84 0.78 0.83 0.27
primes 4.32 2.31 1.91 1.54
qsort 10.53 4.55 1.16 0.68
sha512 3.78 3.95 2.19 0.57
(Sum) 29.10 18.89 11.03 5.63

Performance Ratio 32-bit -O2 (smaller is better)

program qemu-arm32 qemu-riscv32 rv8-riscv32 native-x86-32
aes 3.59 3.92 2.93 1.00
bigint 3.19 1.71 2.06 1.00
dhrystone 6.11 5.02 1.13 1.00
miniz 3.39 2.46 1.54 1.00
norx 3.10 2.87 3.04 1.00
primes 2.81 1.50 1.24 1.00
qsort 15.44 6.67 1.70 1.00
sha512 6.64 6.92 3.84 1.00
(Geomean) 4.63 3.37 2.00 1.00

Figure 6: Benchmark runtimes -Os 32-bit

Runtime 32-bit -Os (seconds)

program qemu-arm32 qemu-riscv32 rv8-riscv32 native-x86-32
aes 1.62 1.57 1.13 0.50
bigint 3.62 1.80 3.21 1.02
dhrystone 4.71 2.31 1.43 0.58
miniz 3.06 2.20 1.56 1.26
norx 1.38 1.18 1.00 0.32
primes 4.46 2.20 2.74 1.38
qsort 8.70 5.01 0.81 0.77
sha512 3.52 2.69 2.20 0.79
(Sum) 31.07 18.96 14.08 6.62

Performance Ratio 32-bit -Os (smaller is better)

program qemu-arm32 qemu-riscv32 rv8-riscv32 native-x86-32
aes 3.24 3.15 2.26 1.00
bigint 3.55 1.76 3.14 1.00
dhrystone 8.14 4.00 2.48 1.00
miniz 2.43 1.74 1.24 1.00
norx 4.32 3.70 3.12 1.00
primes 3.22 1.59 1.98 1.00
qsort 11.24 6.47 1.05 1.00
sha512 4.47 3.41 2.78 1.00
(Geomean) 4.47 2.90 2.11 1.00

Instructions Per Second

Instructions per second in millions comparing qemu, rv8 and native x86:

Figure 7: Millions of Instructions Per Second -O3 64-bit

Instructions per second (MIPS) qemu, rv8 and native 64-bit -O3

program qemu-riscv64-mips rv8-riscv64-mips native-x86-mips
aes 2414 3395 11035
bigint 3738 5712 10557
dhrystone 1843 5274 8369
miniz 2625 3622 5530
norx 2223 2167 9112
primes 2438 4421 6100
qsort 644 2518 5780
sha512 2982 4556 12177
(Geomean) 2149 3769 8232

Figure 8: Millions of Instructions Per Second -Os 64-bit

Instructions per second (MIPS) qemu, rv8 and native 64-bit -Os

program qemu-riscv64-mips rv8-riscv64-mips native-x86-mips
aes 2655 3879 10072
bigint 3973 1955 12873
dhrystone 1250 2462 9073
miniz 2650 3427 5052
norx 1817 2439 8852
primes 2226 3555 6101
qsort 572 3340 6063
sha512 3269 5567 12206
(Geomean) 2008 3175 8355

Figure 9: Millions of Instructions Per Second -O3 32-bit

Instructions per second (MIPS) qemu, rv8 and native 32-bit -O3

program qemu-riscv32-mips rv8-riscv32-mips native-x86-mips
aes 2442 2851 9634
bigint 3964 3835 9780
dhrystone 1998 5667 3747
miniz 2195 3379 4988
norx 2824 2554 9146
primes 3039 3652 6368
qsort 671 2658 6259
sha512 2773 3671 11074
(Geomean) 2259 3428 7186

Figure 10: Millions of Instructions Per Second -Os 32-bit

Instructions per second (MIPS) qemu, rv8 and native 32-bit -Os

program qemu-riscv32-mips rv8-riscv32-mips native-x86-mips
aes 2832 3565 9472
bigint 3856 2166 9817
dhrystone 1435 2345 8479
miniz 2177 3062 4171
norx 1965 1970 8129
primes 2928 2355 7105
qsort 576 3528 5892
sha512 2901 3131 8396
(Geomean) 2063 2702 7441

Logging

rv-sim and rv-sys can log instructions (--log-instructions) and register values (--log-operands). rv-sys can also log page table walks (--log-pagewalks).

Sample output from rv-sim with the --log-instructions option

$ rv-sim -l build/riscv64-unknown-elf/bin/hello-world-pcrel
0000000000000000000 core-0   :0000000000010078 (4501    ) mv          a0, zero           
0000000000000000001 core-0   :000000000001007a (00000597) auipc       a1, pc + 0         
0000000000000000002 core-0   :000000000001007e (02658593) addi        a1, a1, 38         
0000000000000000003 core-0   :0000000000010082 (4631    ) addi        a2, zero, 12       
0000000000000000004 core-0   :0000000000010084 (4681    ) mv          a3, zero           
0000000000000000005 core-0   :0000000000010086 (04000893) addi        a7, zero, 64       
Hello World
0000000000000000006 core-0   :000000000001008a (00000073) ecall                          
0000000000000000007 core-0   :000000000001008e (4501    ) mv          a0, zero           
0000000000000000008 core-0   :0000000000010090 (4581    ) mv          a1, zero           
0000000000000000009 core-0   :0000000000010092 (4601    ) mv          a2, zero           
0000000000000000010 core-0   :0000000000010094 (4681    ) mv          a3, zero           
0000000000000000011 core-0   :0000000000010096 (05d00893) addi        a7, zero, 93       
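For post-processing, lines in the format shown above can be split into fields with a regular expression. The Python sketch below is based only on the sample output in this section: the field names are chosen for illustration, and the pattern is an assumption about the layout rather than a documented format.

```python
import re

LOG_LINE = re.compile(
    r"^(?P<instret>\d+)\s+(?P<core>core-\d+)\s+:(?P<pc>[0-9a-f]+)\s+"
    r"\((?P<encoding>[0-9a-f]+)\s*\)\s+(?P<mnemonic>\S+)\s*(?P<operands>.*?)\s*$"
)


def parse_log_line(line):
    """Split one instruction log line into fields, or return None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # e.g. program output such as "Hello World"
    return {
        "instret": int(match.group("instret")),
        "core": match.group("core"),
        "pc": int(match.group("pc"), 16),
        # Four hex digits indicate a 16-bit compressed (RVC) encoding.
        "compressed": len(match.group("encoding")) == 4,
        "mnemonic": match.group("mnemonic"),
        "operands": match.group("operands"),
    }
```

Returning None for non-matching lines lets a caller skip program output that is interleaved with the instruction log.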

Tracing

The rv-jit program supports the ability to log RISC-V instructions along with the dynamically translated x86-64 assembly and machine code (--log-jit-trace). This mode is useful for JIT translation debugging and optimisation analysis.

Sample output from rv-jit with the --log-jit-trace option

	# 0x0000000000103d70	addi        a0, zero, 1
		mov r8, 1                               ; 41B801000000
		L3:
	# 0x0000000000103d74	slli        a0, a0, 28
		shl r8, 1C                              ; 49C1E01C
		L4:
	# 0x0000000000103d78	addi        a1, zero, -1
		mov r9, FFFFFFFFFFFFFFFF                ; 49C7C1FFFFFFFF
		L5:
	# 0x0000000000103d7c	sb          a1, 0(a0)
		rex mov byte [r8], r9b                  ; 458808
		L6:
	# 0x0000000000103d80	lbu         a2, 0(a0)
		movzx r10d, byte [r8]                   ; 450FB610
		L7:

Histograms

The rv-sim and rv-sys programs can record and print histograms. Program counter frequency (--pc-usage-histogram), dynamic instruction frequency (--instruction-usage-histogram) and dynamic register usage (--register-usage-histogram) are supported.

The rv-bin program via the histogram subcommand has the ability to print static instruction frequency and static register usage.

Sample output from rv-sim with the --register-usage-histogram and --instruction-usage-histogram options

$ rv-sim --register-usage-histogram \
         --instruction-usage-histogram \
         build/riscv64-unknown-elf/bin/test-aes 

integer register file
~~~~~~~~~~~~~~~~~~~~~
ra       :0x00000000000197a4
sp       :0x000000007fffff68 gp       :0x0000000000020b18
tp       :0xe87f5200d8d3e2fd t0       :0x0000000000011808
t1       :0x0000000000040000 t2       :0x000000000000035c
s0       :0x0000000000000000 s1       :0x0e9e1c7894d54be1
a0       :0x0000000000000000 a1       :0x0000000000000000
a2       :0x0000000000000000 a3       :0x0000000000000000
a4       :0x0000000000000000 a5       :0x0000000000000000
a6       :0x0000000006021440 a7       :0x000000000000005d
s2       :0x94a7319b493ab93c s3       :0xb643788d224af6fb
s4       :0x7269d597ce6e1df9 s5       :0x5f9113739f9b0d72
s6       :0x404c0e868734cf0c s7       :0x2aea8c1ef338fa59
s8       :0xff41772b6673e771 s9       :0x7a5a4806d4a6fe41
s10      :0x0a595462adddb5a8 s11      :0xda895865c86e2f07
t3       :0x0000000000000001 t4       :0x0000000000000000
t5       :0x0000000004000000 t6       :0x0000000000000000

control and status registers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
instret  :        5148514055 time     :0x000290f90f8a9e3b
pc       :0x000000000001944e fcsr     :0x00000000

register usage histogram
~~~~~~~~~~~~~~~~~~~~~~~~
    1. a5         15.11% [1769998250] #######################################
    2. a2         13.35% [1564476600] ##################################
    3. a7         10.03% [1174405210] #########################
    4. a4          9.15% [1071646586] #######################
    5. a6          7.50% [878706827] ###################
    6. t4          5.12% [599785551] #############
    7. t6          4.91% [574619648] ############
    8. t1          4.65% [545259607] ############
    9. a1          2.26% [264241924] #####
   10. s1          2.15% [251658621] #####
   11. s0          2.02% [236978689] #####
   12. s8          2.01% [234881068] #####
   13. t5          1.79% [209715245] ####
   14. t0          1.68% [197132294] ####
   15. s2          1.63% [190840979] ####
   16. a0          1.61% [188744347] ####
   17. s4          1.50% [176160835] ###
   18. s5          1.50% [176160833] ###
   19. s3          1.47% [171966617] ###
   20. sp          1.43% [167773141] ###
   21. s6          1.43% [167772235] ###
   22. t3          1.43% [167772225] ###
   23. s7          1.43% [167772223] ###
   24. t2          1.36% [159383552] ###
   25. zero        0.86% [100664377] ##
   26. s9          0.75% [88080392 ] #
   27. a3          0.68% [79693333 ] #
   28. s11         0.68% [79691785 ] #
   29. s10         0.43% [50331663 ] #
   30. ra          0.07% [8388996  ] 
   31. gp          0.00% [90       ] 

instruction usage histogram
~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1. lw         16.37% [843055543] #######################################
    2. xor        13.85% [713031972] ################################
    3. add        13.69% [704643870] ################################
    4. slli       13.20% [679477645] ###############################
    5. srliw      10.75% [553648296] #########################
    6. andi        9.94% [511705332] #######################
    7. srli        3.05% [157286473] #######
    8. lbu         2.93% [150995155] ######
    9. addi        2.69% [138412722] ######
   10. addiw       2.44% [125829177] #####
   11. sd          1.79% [92275184 ] ####
   12. sb          1.47% [75497501 ] ###
   13. jal         1.14% [58720476 ] ##
   14. beq         1.06% [54526059 ] ##
   15. ld          0.98% [50332182 ] ##
   16. and         0.98% [50331715 ] ##
   17. slliw       0.98% [50331682 ] ##
   18. bne         0.90% [46137534 ] ##
   19. or          0.65% [33554442 ] #
   20. auipc       0.41% [20971539 ] 
   21. lui         0.24% [12583002 ] 
   22. mulw        0.16% [8388608  ] 
   23. lwu         0.16% [8388608  ] 
   24. jalr        0.08% [4194443  ] 
   25. sraiw       0.08% [4194314  ] 
   26. sw          0.00% [213      ] 
   27. bltu        0.00% [78       ] 
   28. bge         0.00% [39       ] 
   29. blt         0.00% [33       ] 
   30. bgeu        0.00% [33       ] 
   31. sub         0.00% [29       ] 
...

Linux

This section describes how to build and boot a Linux image in the full system emulator.

Please read the RISC-V toolchain installation instructions in the riscv-gnu-toolchain repository. The riscv64-unknown-elf newlib toolchain is required for building the rv8 test cases and the riscv64-unknown-linux-gnu glibc toolchain is required for building busybox which is used to create the Linux image that runs in the full system emulator.

The toolchain script will download any required dependencies.

Building riscv-gnu-toolchain for linux

$ cd riscv-gnu-toolchain
$ make linux

The linux build script will download any required dependencies.

Building linux, busybox and bbl-lite

$ cd rv8
$ export RISCV=/opt/riscv/toolchain
$ export PATH=${PATH}:${RISCV}/bin
$ make linux

To start Linux, we execute bbl (the Berkeley Boot Loader), which performs early machine setup and then passes control to an embedded Linux kernel. After kernel initialisation, busybox is executed from the initramfs as pid 1 (init). The Linux image and the initramfs are combined and linked into bbl as the boot payload.

Running the full system emulator

$ rv-sys build/riscv64-unknown-elf/bin/bbl
              vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
                  vvvvvvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrr       vvvvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrrrrr      vvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrrrrrrr    vvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrrrrrrr    vvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrrrrrrr    vvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrrrrr      vvvvvvvvvvvvvvvvvvvvvv  
rrrrrrrrrrrrr       vvvvvvvvvvvvvvvvvvvvvv    
rr                vvvvvvvvvvvvvvvvvvvvvv      
rr            vvvvvvvvvvvvvvvvvvvvvvvv      rr
rrrr      vvvvvvvvvvvvvvvvvvvvvvvvvv      rrrr
rrrrrr      vvvvvvvvvvvvvvvvvvvvvv      rrrrrr
rrrrrrrr      vvvvvvvvvvvvvvvvvv      rrrrrrrr
rrrrrrrrrr      vvvvvvvvvvvvvv      rrrrrrrrrr
rrrrrrrrrrrr      vvvvvvvvvv      rrrrrrrrrrrr
rrrrrrrrrrrrrr      vvvvvv      rrrrrrrrrrrrrr
rrrrrrrrrrrrrrrr      vv      rrrrrrrrrrrrrrrr
rrrrrrrrrrrrrrrrrr          rrrrrrrrrrrrrrrrrr
rrrrrrrrrrrrrrrrrrrr      rrrrrrrrrrrrrrrrrrrr
rrrrrrrrrrrrrrrrrrrrrr  rrrrrrrrrrrrrrrrrrrrrr

       INSTRUCTION SETS WANT TO BE FREE
[    0.000000] Linux version 4.6.2-00044-g250754b-dirty (mclark@minty) (gcc version 7.1.1 20170509 (GCC) ) #2 Sat Jul 1 22:42:14 NZST 2017
[    0.000000] bootconsole [early0] enabled
[    0.000000] Available physical memory: 1020MB
[    0.000000] Initial ramdisk at: 0xffffffff800100b0 (261728 bytes)
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000080400000-0x00000000bfffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080400000-0x00000000bfffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080400000-0x00000000bfffffff]
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 257550
[    0.000000] Kernel command line: earlyprintk=sbi-console rdinit=/sbin/init 
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.000000] Sorting __ex_table...
[    0.000000] Memory: 1025684K/1044480K available (1701K kernel code, 125K rwdata, 488K rodata, 324K init, 230K bss, 18796K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS:0 nr_irqs:0 0
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 191126044627 ns
[    0.000000] Calibrating delay loop (skipped), value calculated using timer frequency.. 20.00 BogoMIPS (lpj=100000)
[    0.000000] pid_max: default: 32768 minimum: 301
[    0.010000] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.010000] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.010000] devtmpfs: initialized
[    0.010000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.010000] NET: Registered protocol family 16
[    0.010000] clocksource: Switched to clocksource riscv_clocksource
[    0.010000] NET: Registered protocol family 2
[    0.010000] TCP established hash table entries: 8192 (order: 4, 65536 bytes)
[    0.010000] TCP bind hash table entries: 8192 (order: 4, 65536 bytes)
[    0.010000] TCP: Hash tables configured (established 8192 bind 8192)
[    0.010000] UDP hash table entries: 512 (order: 2, 16384 bytes)
[    0.010000] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[    0.010000] NET: Registered protocol family 1
[    0.010000] Unpacking initramfs...
[    0.010000] console [sbi_console0] enabled
[    0.010000] console [sbi_console0] enabled
[    0.010000] bootconsole [early0] disabled
[    0.010000] bootconsole [early0] disabled
[    0.010000] futex hash table entries: 256 (order: 0, 6144 bytes)
[    0.010000] workingset: timestamp_bits=61 max_order=18 bucket_order=0
[    0.010000] 9p: Installing v9fs 9p2000 file system support
[    0.010000] io scheduler noop registered
[    0.010000] io scheduler cfq registered (default)
[    0.010000] 9pnet: Installing 9P2000 support
[    0.010000] Freeing unused kernel memory: 324K (ffffffff80000000 - ffffffff80051000)
[    0.010000] This architecture does not have kernel memory protection.


BusyBox v1.26.1 (2017-06-30 16:46:44 NZST) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/ # 

References