Coral NPU ISA Reference

ISA String

rv32imf_zve32x_zicsr_zifencei_zbb

Extensions

Extension	Version	Description
`rv32i`	2.1	Base 32-bit integer: ALU, load/store, branches, jumps
`m`	2.0	Integer multiply/divide: `mul`, `mulh`, `div`, `rem`
`f`	2.2	Single-precision float: `fadd.s`, `fmul.s`, `fdiv.s`, `fsqrt.s`, `fmadd.s`
`zve32x`	1.0	128-bit SIMD vector (integer elements only): 4×i32, 8×i16, 16×i8 per register
`zicsr`	2.0	Control/status registers: `csrrw`, `csrrs`, `csrrc`
`zifencei`	2.0	Instruction-fetch fence: `fence.i`
`zbb`	1.0	Bit manipulation: `clz`, `ctz`, `cpop`, `min`, `max`, `orc.b`, `rev8`, `rol`, `ror`
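
To illustrate how the ISA string maps onto the rows of the table above, here is a minimal sketch that splits it into the base ISA, the single-letter extensions, and the underscore-separated multi-letter extensions. The helper name `parse_isa_string` is hypothetical, and the parser assumes the simple form used on this page (no version suffixes).

```python
def parse_isa_string(isa: str) -> dict:
    """Split a RISC-V ISA string into base, single-letter and multi-letter parts.

    Simplified sketch: assumes an rv32/rv64 prefix followed by the base
    letter, no version numbers, and underscore-separated multi-letter names.
    """
    head, *multi = isa.lower().split("_")
    if not head.startswith(("rv32", "rv64")):
        raise ValueError(f"unsupported ISA string: {isa!r}")
    base = head[:5]           # e.g. "rv32i"
    single = list(head[5:])   # e.g. ["m", "f"]
    return {"base": base, "single": single, "multi": multi}


info = parse_isa_string("rv32imf_zve32x_zicsr_zifencei_zbb")
print(info["base"])    # rv32i
print(info["single"])  # ['m', 'f']
print(info["multi"])   # ['zve32x', 'zicsr', 'zifencei', 'zbb']
```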

ABI

Parameter	Value
ABI	`ilp32`
Code model	`medany`
Endianness	Little-endian
Pointer size	32-bit
Integer size	32-bit
Long size	32-bit
Float	Hardware single-precision (32-bit)
Double	Not supported (no `d` extension)

Register File

Scalar Registers

Register	ABI Name	Usage
x0	zero	Hardwired zero
x1	ra	Return address
x2	sp	Stack pointer
x3	gp	Global pointer
x4	tp	Thread pointer (unused — no OS)
x5-x7	t0-t2	Temporaries
x8	s0/fp	Saved register / frame pointer
x9	s1	Saved register
x10-x11	a0-a1	Function arguments / return values
x12-x17	a2-a7	Function arguments
x18-x27	s2-s11	Saved registers
x28-x31	t3-t6	Temporaries

Vector Registers

Register	Width	Data Types
v0..v63	256-bit	8×i32, 16×i16, 32×i8, 8×f32
acc[8][8]	8×8×32-bit	Accumulator array for outer-product MAC

The C extension encoding space is reclaimed to provide 6-bit vector register indices (64 registers) instead of the standard 5-bit (32 registers).
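
The lane counts and register count above follow from simple arithmetic. The sketch below reproduces them from the 256-bit register width and the 6-bit index. The constant and function names are illustrative only and are not part of any Coral NPU API.

```python
VECTOR_REG_BITS = 256   # width of v0..v63 as listed in the table above
REG_INDEX_BITS = 6      # register index width after reclaiming the C space


def lanes(element_bits: int, reg_bits: int = VECTOR_REG_BITS) -> int:
    """Number of elements of the given width that fit in one vector register."""
    if reg_bits % element_bits:
        raise ValueError("element width must divide the register width")
    return reg_bits // element_bits


num_registers = 1 << REG_INDEX_BITS
print(num_registers)                        # 64
print({w: lanes(w) for w in (32, 16, 8)})   # {32: 8, 16: 16, 8: 32}
```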

Pipeline

4-stage in-order scalar pipeline:

Fetch → Decode → Execute → Writeback

Fetch: Static branch prediction (backward=taken, forward=not-taken). 1-cycle misprediction penalty.
Decode: 4-way dispatch — scalar ops to execute unit, vector ops to command FIFO.
Execute: ALU, FPU, load/store.
Writeback: Result commit.

The vector backend is decoupled via a FIFO and executes asynchronously.
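
The static prediction rule in the Fetch stage can be modeled in a few lines. This is a behavioral sketch only: the function names and the example addresses are hypothetical, and the model covers just the backward/forward rule and the 1-cycle penalty stated above.

```python
MISPREDICT_PENALTY_CYCLES = 1  # from the Fetch stage description


def predict_taken(pc: int, target: int) -> bool:
    """Static rule: backward branches are predicted taken, forward not taken."""
    return target < pc


def branch_penalty(pc: int, target: int, actually_taken: bool) -> int:
    """Penalty cycles for one dynamic branch under the static rule."""
    if predict_taken(pc, target) == actually_taken:
        return 0
    return MISPREDICT_PENALTY_CYCLES


def loop_branch_penalty(iterations: int, pc: int = 0x120, target: int = 0x100) -> int:
    """Total penalty for a loop-closing backward branch.

    The branch is taken on every iteration except the last, so only the
    final fall-through is mispredicted.
    """
    outcomes = [True] * (iterations - 1) + [False]
    return sum(branch_penalty(pc, target, taken) for taken in outcomes)


print(loop_branch_penalty(100))  # 1
```

Under this rule a counted loop pays the penalty once, on exit, regardless of trip count.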

Stripmining

At dispatch, a single vector instruction expands into 4 issue events, one per consecutive vector register:

vadd v0 → vadd v0 : vadd v1 : vadd v2 : vadd v3

This provides 4× throughput per dispatch slot.
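
The expansion shown above can be sketched as a small helper. The name `stripmine` is hypothetical, and the bounds check against v63 is an assumption of this sketch: the page does not say how the hardware treats a group that would run past the last register, or whether the base register must be aligned.

```python
STRIPMINE_FACTOR = 4   # issue events per dispatched vector instruction
NUM_VECTOR_REGS = 64   # v0..v63


def stripmine(opcode: str, base_reg: int, factor: int = STRIPMINE_FACTOR) -> list[str]:
    """Expand one dispatched instruction into issue events on consecutive registers."""
    if not 0 <= base_reg or base_reg + factor > NUM_VECTOR_REGS:
        # Assumed behavior for this model only; real hardware may differ.
        raise ValueError("register group falls outside v0..v63")
    return [f"{opcode} v{base_reg + i}" for i in range(factor)]


print(" : ".join(stripmine("vadd", 0)))  # vadd v0 : vadd v1 : vadd v2 : vadd v3
```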

MAC Operation

Outer-product multiply-accumulate:

8 parallel VDOT units
Each VDOT: 4× int8 multiply → int32 accumulate
Total: 256 MACs/cycle (8×8 accumulators × 4 int8 products each)
Accumulator: 8×8 × 32-bit result matrix
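
One reading that reconciles these figures is that each of the 8 VDOT units updates one row of 8 accumulators, and each accumulator receives a 4-element int8 dot product per cycle. The reference model below follows that reading. The operand layout (`narrow` rows, `wide` columns) and all names are assumptions made for illustration, not a description of the actual datapath.

```python
UNITS = 8   # parallel VDOT units, one per accumulator row
LANES = 8   # assumed 32-bit lanes per unit, one per accumulator column
DEPTH = 4   # int8 products reduced into each int32 accumulator


def outer_product_mac(acc, narrow, wide):
    """Perform acc[i][j] += dot(narrow[i], wide[j]) and return the MAC count.

    Python integers do not wrap, so int32 overflow is not modeled here.
    """
    macs = 0
    for i in range(UNITS):
        for j in range(LANES):
            for k in range(DEPTH):
                acc[i][j] += narrow[i][k] * wide[j][k]
                macs += 1
    return macs


acc = [[0] * LANES for _ in range(UNITS)]
narrow = [[i + 1] * DEPTH for i in range(UNITS)]  # row i holds the value i + 1
wide = [[j + 1] * DEPTH for j in range(LANES)]    # column j holds the value j + 1
macs = outer_product_mac(acc, narrow, wide)
print(macs)       # 256
print(acc[7][7])  # 256, i.e. 4 products of 8 * 8
```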

CSR Registers

CSR	Address	Purpose
`mstatus`	`0x300`	Machine status (FP/Vector enable bits)
`mcycle`	`0xB00`	Cycle counter (lower 32 bits)
`mcycleh`	`0xB80`	Cycle counter (upper 32 bits)
`minstret`	`0xB02`	Instructions retired counter
`mtvec`	`0x305`	Trap vector base address
Custom	Vendor-specific	Debug, performance counters
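
Because `mcycle` and `mcycleh` are read separately, a 64-bit cycle count can tear if the low half rolls over between the two reads. The usual remedy is to read high, low, then high again and retry if the high half changed. The sketch below models that loop against a fake CSR file; `FakeCsrFile` and `read_cycles64` are hypothetical names, and on the target each `read` would be a `csrr` instruction.

```python
MCYCLE = 0xB00    # cycle counter, lower 32 bits
MCYCLEH = 0xB80   # cycle counter, upper 32 bits


class FakeCsrFile:
    """Stand-in for the hardware CSRs; the counter advances on every read."""

    def __init__(self, start: int, step: int = 1):
        self.cycles = start
        self.step = step

    def read(self, addr: int) -> int:
        value = self.cycles
        self.cycles += self.step
        if addr == MCYCLE:
            return value & 0xFFFFFFFF
        if addr == MCYCLEH:
            return (value >> 32) & 0xFFFFFFFF
        raise KeyError(hex(addr))


def read_cycles64(csr) -> int:
    """Read a consistent 64-bit cycle count from two 32-bit CSRs."""
    while True:
        hi = csr.read(MCYCLEH)
        lo = csr.read(MCYCLE)
        if csr.read(MCYCLEH) == hi:   # no rollover between the reads
            return (hi << 32) | lo


# Start one tick before the low half wraps to force a retry.
print(hex(read_cycles64(FakeCsrFile(0xFFFFFFFF))))  # 0x100000003
```

A single high-then-low read in the same scenario would return 0, which is off by roughly 2^32 cycles.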

Compiler Flags

# Clang cross-compilation flags
-target riscv32-unknown-elf
-march=rv32imf_zve32x_zicsr_zifencei_zbb
-mabi=ilp32
-mcmodel=medany
-O3
-nostdlib
-fno-exceptions
-fno-rtti

Unsupported Features

No `d` extension (no double-precision float)
No `a` extension (no atomic instructions; the core is single-threaded)
No `c` extension (encoding space reclaimed for vector registers)
No interrupts in run-to-completion mode
No virtual memory (bare-metal, physical addresses only)
No f16 hardware (f16 values must be promoted to f32 in software)
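
As a sketch of what software f16 promotion involves, the reference model below widens IEEE 754 binary16 bit patterns to binary32 using only integer operations, which is the kind of routine a target without f16 hardware would need. The function names are hypothetical; the `struct` call is used only to view the resulting bits as a Python float.

```python
import struct


def f16_bits_to_f32_bits(h: int) -> int:
    """Widen IEEE 754 binary16 bits to binary32 bits using integer ops only."""
    sign = (h >> 15) & 0x1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0x1F:            # infinity or NaN: all-ones exponent
        exp32, frac32 = 0xFF, frac << 13
    elif exp != 0:             # normal: rebias exponent from 15 to 127
        exp32, frac32 = exp + 112, frac << 13
    elif frac == 0:            # signed zero
        exp32, frac32 = 0, 0
    else:                      # subnormal: becomes a normal f32
        shift = 0
        while not frac & 0x400:
            frac <<= 1
            shift += 1
        exp32, frac32 = 113 - shift, (frac & 0x3FF) << 13
    return (sign << 31) | (exp32 << 23) | frac32


def f16_bits_to_float(h: int) -> float:
    """Reinterpret the widened bits as a float for inspection on the host."""
    return struct.unpack("<f", struct.pack("<I", f16_bits_to_f32_bits(h)))[0]


print(f16_bits_to_float(0x3C00))  # 1.0
print(f16_bits_to_float(0xC000))  # -2.0
```

The conversion is exact in this direction, since every binary16 value is representable in binary32.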