Coral NPU ISA Reference
Extensions
| Extension | Version | Description |
|---|---|---|
| `i` | 2.1 | Base 32-bit integer: ALU, load/store, branches, jumps |
| `m` | 2.0 | Integer multiply/divide |
| `f` | 2.2 | Single-precision float |
| `zve32x` | 1.0 | 128-bit SIMD vector: 4×f32, 8×i16, 16×i8 per register |
| `zicsr` | 2.0 | Control/status registers |
| `zifencei` | 2.0 | Instruction-fetch fence |
| `zbb` | 1.0 | Bit manipulation |
ABI
| Parameter | Value |
|---|---|
| ABI | `ilp32` |
| Code model | `medany` |
| Endianness | Little-endian |
| Pointer size | 32-bit |
| Integer size | 32-bit |
| Long size | 32-bit |
| Float | Hardware single-precision (32-bit) |
| Double | Not supported (no `d` extension) |
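
As a rough illustration of the endianness and size rows above, the following host-side Python sketch shows the byte order a 32-bit store would produce in memory. The helper names `store_word_le` and `load_word_le` are hypothetical and exist only for this example.

```python
import struct

# Type sizes in bytes, copied from the ABI table above.
ABI_SIZES = {"pointer": 4, "int": 4, "long": 4, "float": 4}

def store_word_le(value):
    """Return the 4 bytes a 32-bit store writes, lowest address first."""
    return struct.pack("<I", value & 0xFFFFFFFF)

def load_word_le(raw):
    """Reassemble an unsigned 32-bit word from 4 little-endian bytes."""
    return struct.unpack("<I", raw)[0]

# The least significant byte (0x44) lands at the lowest address.
print(store_word_le(0x11223344).hex())  # 44332211
```

The same layout applies to pointers, `int`, and `long`, since all three are 32 bits wide under this ABI.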
Register File
Scalar Registers
| Register | ABI Name | Usage |
|---|---|---|
| x0 | zero | Hardwired zero |
| x1 | ra | Return address |
| x2 | sp | Stack pointer |
| x3 | gp | Global pointer |
| x4 | tp | Thread pointer (unused; no OS) |
| x5-x7 | t0-t2 | Temporaries |
| x8 | s0/fp | Saved register / frame pointer |
| x9 | s1 | Saved register |
| x10-x11 | a0-a1 | Function arguments / return values |
| x12-x17 | a2-a7 | Function arguments |
| x18-x27 | s2-s11 | Saved registers |
| x28-x31 | t3-t6 | Temporaries |
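
The mapping in the table is regular enough to compute. The sketch below is one way to do that on a host, for example in a disassembler or trace viewer; `abi_name` and `is_saved` are hypothetical helper names, not part of any Coral NPU tooling.

```python
def abi_name(index):
    """Map a scalar register number (0-31) to its ABI name per the table."""
    if not 0 <= index <= 31:
        raise ValueError(f"no such scalar register: x{index}")
    fixed = {0: "zero", 1: "ra", 2: "sp", 3: "gp", 4: "tp", 8: "s0", 9: "s1"}
    if index in fixed:
        return fixed[index]
    if 5 <= index <= 7:
        return f"t{index - 5}"    # x5-x7   -> t0-t2
    if 10 <= index <= 17:
        return f"a{index - 10}"   # x10-x17 -> a0-a7
    if 18 <= index <= 27:
        return f"s{index - 16}"   # x18-x27 -> s2-s11
    return f"t{index - 25}"       # x28-x31 -> t3-t6

def is_saved(index):
    """True for the rows the table labels as saved registers (s0-s11)."""
    return index in (8, 9) or 18 <= index <= 27

print([abi_name(i) for i in (0, 8, 17, 27, 31)])
```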
Pipeline
4-stage in-order scalar pipeline:
Fetch → Decode → Execute → Writeback
- Fetch: Static branch prediction (backward=taken, forward=not-taken). 1-cycle misprediction penalty.
- Decode: 4-way dispatch. Scalar ops go to the execute unit, vector ops to the command FIFO.
- Execute: ALU, FPU, load/store.
- Writeback: Result commit.
The vector backend is decoupled via a FIFO and executes asynchronously.
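
The static prediction rule in the Fetch stage can be modeled in a few lines. This is a behavioral sketch only: it assumes "backward" means a target address below the branch address, and the function names are hypothetical.

```python
MISPREDICT_PENALTY = 1  # cycles, from the Fetch stage description

def predict_taken(pc, target):
    """Static rule: backward branches are predicted taken, forward not-taken."""
    return target < pc

def branch_penalty(pc, target, taken):
    """Cycles lost on one branch, given its actual outcome."""
    return 0 if predict_taken(pc, target) == taken else MISPREDICT_PENALTY

def loop_penalty(iterations, pc=0x100, target=0x80):
    """Total penalty for a loop closed by one backward branch.

    The branch is taken on every iteration except the last, so only the
    final fall-through is mispredicted.
    """
    outcomes = [True] * (iterations - 1) + [False]
    return sum(branch_penalty(pc, target, taken) for taken in outcomes)

print(loop_penalty(10))  # 1
```

Under this model a counted loop with its branch at the bottom costs a single penalty cycle regardless of trip count, which is why the backward=taken rule suits loop-heavy kernels.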
Stripmining
A single vector instruction in dispatch expands to 4 issue events:
vadd v0 → vadd v0 : vadd v1 : vadd v2 : vadd v3
This provides 4× throughput per dispatch slot.
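
The expansion can be sketched as a simple string transformation. The example below mirrors the abbreviated single-operand notation used above (`vadd v0`); real instructions carry more operands, and `stripmine` is a hypothetical name for illustration.

```python
import re

STRIPMINE_FACTOR = 4  # issue events per dispatched vector instruction

def stripmine(instr):
    """Expand e.g. 'vadd v0' into issue events on consecutive registers."""
    match = re.fullmatch(r"\s*(\w+)\s+v(\d+)\s*", instr)
    if match is None:
        raise ValueError(f"unrecognized vector instruction: {instr!r}")
    op, base = match.group(1), int(match.group(2))
    return [f"{op} v{base + i}" for i in range(STRIPMINE_FACTOR)]

print(" : ".join(stripmine("vadd v0")))
# vadd v0 : vadd v1 : vadd v2 : vadd v3
```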
MAC Operation
Outer-product multiply-accumulate:
- 8 parallel VDOT units
- Each VDOT: 4× int8 multiply → int32 accumulate
- Total: 256 MACs/cycle (8 × 8 accumulator entries × 4 int8 products each)
- Accumulator: 8×8 × 32-bit result matrix
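
A behavioral model helps check the arithmetic. The sketch below assumes one reading of the figures above: each of the 8×8 accumulator entries receives a 4-wide int8 dot product per step, giving 8 × 8 × 4 = 256 multiply-accumulates. The operand layout (`wide` as 8 rows and `narrow` as 8 columns of four int8 values) and all names are assumptions made for illustration, not the documented hardware interface.

```python
ROWS = COLS = 8   # accumulator dimensions
LANES = 4         # int8 products per VDOT

def wrap_i32(x):
    """Wrap a Python int to signed 32-bit, as a hardware accumulator would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def vdot(a, b):
    """4-wide int8 dot product."""
    return sum(x * y for x, y in zip(a, b))

def mac_step(acc, wide, narrow):
    """Accumulate the outer product of dot products into acc (in place).

    Returns the number of multiply-accumulates performed in this step.
    """
    for i in range(ROWS):
        for j in range(COLS):
            acc[i][j] = wrap_i32(acc[i][j] + vdot(wide[i], narrow[j]))
    return ROWS * COLS * LANES

acc = [[0] * COLS for _ in range(ROWS)]
wide = [[1, 2, 3, 4] for _ in range(ROWS)]
narrow = [[j] * LANES for j in range(COLS)]
print(mac_step(acc, wide, narrow))  # 256
```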
CSR Registers
| CSR | Address | Purpose |
|---|---|---|
| | | Machine status (FP/Vector enable bits) |
| | | Cycle counter (lower 32 bits) |
| | | Cycle counter (upper 32 bits) |
| | | Instructions retired counter |
| | | Trap vector base address |
| Custom | Vendor-specific | Debug, performance counters |
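
Because the cycle counter is split into two 32-bit halves, software that needs a 64-bit value has to guard against the lower half rolling over between the two reads. The usual approach on 32-bit RISC-V is to read high, low, then high again and retry if the high half changed. The sketch below models that loop on a host; `read_hi`, `read_lo`, and `FakeCounter` are hypothetical stand-ins for the real CSR reads.

```python
def read_cycle64(read_hi, read_lo):
    """Combine two 32-bit counter halves into a consistent 64-bit value."""
    while True:
        hi = read_hi()
        lo = read_lo()
        if read_hi() == hi:        # no rollover between the reads
            return (hi << 32) | lo

class FakeCounter:
    """Test stand-in: a 64-bit counter that advances by one on every read."""
    def __init__(self, start):
        self.value = start

    def _tick(self):
        current = self.value
        self.value += 1
        return current

    def read_lo(self):
        return self._tick() & 0xFFFFFFFF

    def read_hi(self):
        return self._tick() >> 32

# Start one tick before the lower half rolls over.
counter = FakeCounter(0x1_FFFF_FFFF)
print(hex(read_cycle64(counter.read_hi, counter.read_lo)))  # 0x200000003
```

A naive high-then-low read in the same scenario would return `0x100000000`, which is off by almost 2^32 cycles.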
Compiler Flags
```
# Clang cross-compilation flags
-target riscv32-unknown-elf
-march=rv32imf_zve32x_zicsr_zifencei_zbb
-mabi=ilp32
-mcmodel=medany
-O3
-nostdlib
-fno-exceptions
-fno-rtti
```
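
The `-march` string encodes the same extension set listed in the Extensions table. As a sanity check in a build script, it can be split back into its parts. This is a minimal sketch that assumes the multi-letter extensions are underscore-separated, as they are in the string above; it is not a general RISC-V ISA string parser, and `parse_march` is a hypothetical name.

```python
def parse_march(march):
    """Split an rv32 -march string into a list of extension names."""
    march = march.lower()
    if not march.startswith("rv32"):
        raise ValueError(f"expected an rv32 ISA string, got {march!r}")
    parts = march[len("rv32"):].split("_")
    single_letter = list(parts[0])   # e.g. "imf" -> ["i", "m", "f"]
    multi_letter = parts[1:]         # e.g. ["zve32x", "zicsr", ...]
    return single_letter + multi_letter

exts = parse_march("rv32imf_zve32x_zicsr_zifencei_zbb")
print(exts)
# ['i', 'm', 'f', 'zve32x', 'zicsr', 'zifencei', 'zbb']

# The unsupported d, a, and c extensions must not appear.
assert not {"d", "a", "c"} & set(exts)
```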
Unsupported Features
- No `d` extension (no double-precision float)
- No `a` extension (no atomic instructions; single-threaded)
- No `c` extension (encoding space reclaimed for vector registers)
- No interrupts in run-to-completion mode
- No virtual memory (bare-metal, physical addresses only)
- No f16 hardware (f16 values must be promoted to f32 in software)
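
To make the last point concrete, the following sketch shows what a software f16-to-f32 promotion involves at the bit level: re-biasing the exponent, widening the fraction, and handling zeros, subnormals, infinities, and NaNs. It is a host-side Python model under the standard IEEE 754 binary16 and binary32 layouts, not code from the Coral NPU runtime, and the function names are hypothetical.

```python
import struct

def f16_bits_to_f32_bits(h):
    """Convert IEEE 754 binary16 bits to the equivalent binary32 bits."""
    sign = (h >> 15) & 0x1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0:
        if frac == 0:
            return sign << 31                      # signed zero
        # Subnormal half: shift until the implicit leading one appears.
        shift = 0
        while not (frac & 0x400):
            frac <<= 1
            shift += 1
        frac &= 0x3FF                              # drop the leading one
        exp32 = 127 - 15 + 1 - shift
        return (sign << 31) | (exp32 << 23) | (frac << 13)
    if exp == 0x1F:
        return (sign << 31) | (0xFF << 23) | (frac << 13)   # inf or NaN
    # Normal number: re-bias exponent from 15 to 127, widen the fraction.
    return (sign << 31) | ((exp - 15 + 127) << 23) | (frac << 13)

def f16_to_float(h):
    """Interpret the promoted bits as a Python float."""
    return struct.unpack("<f", struct.pack("<I", f16_bits_to_f32_bits(h)))[0]

print(f16_to_float(0x3C00), f16_to_float(0xC000), f16_to_float(0x7BFF))
# 1.0 -2.0 65504.0
```

Every binary16 value is exactly representable in binary32, so the promotion itself is lossless; only the conversion back to f16 requires rounding.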