# Memory Map Reference

## Address Space

    0x00000000 ┌─────────────────────┐
               │ ITCM                │ 8 KB (0x2000 bytes)
               │   .text             │ Machine code
               │   .rodata           │ Read-only data (constants)
               │   .init_array       │ C++ static constructors
    0x00002000 ├─────────────────────┤
               │ (unmapped gap)      │
    0x00010000 ├─────────────────────┤
               │ DTCM                │ 32 KB (0x8000 bytes)
               │   .data             │ Initialized global variables
               │   .bss              │ Zero-initialized globals
               │   .heap             │ Dynamic allocation (grows up)
               │   .stack            │ Stack (grows down from top)
    0x00018000 ├─────────────────────┤
               │ (unmapped gap)      │
    0x20000000 ├─────────────────────┤
               │ EXTMEM              │ 4 MB (0x400000 bytes)
               │   .extdata          │ Overflow initialized data
               │   .extbss           │ Overflow zero-initialized data
    0x20400000 └─────────────────────┘

## Memory Regions

| Region | Base | Size | End (exclusive) | Contents |
|---|---|---|---|---|
| ITCM | `0x00000000` | 8 KB | `0x00002000` | Code, read-only data, init arrays |
| DTCM | `0x00010000` | 32 KB | `0x00018000` | Data, BSS, heap (up), stack (down) |
| EXTMEM | `0x20000000` | 4 MB | `0x20400000` | Overflow data (higher latency) |

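
When debugging a stray access, it can help to map a raw address back to its region. The sketch below encodes the three mapped ranges from the address-space diagram and treats every other address as an unmapped gap. The names `mem_region_t`, `kRegions`, and `region_for_address` are illustrative and are not part of the project.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per mapped region; bases and sizes follow the address-space diagram. */
typedef struct {
    const char *name;
    uint32_t base;
    uint32_t size;
} mem_region_t;

const mem_region_t kRegions[] = {
    {"ITCM",   0x00000000u, 0x00002000u},  /* 8 KB  */
    {"DTCM",   0x00010000u, 0x00008000u},  /* 32 KB */
    {"EXTMEM", 0x20000000u, 0x00400000u},  /* 4 MB  */
};

/* Returns the region containing addr, or NULL if addr lies in an unmapped gap. */
const mem_region_t *region_for_address(uint32_t addr) {
    for (size_t i = 0; i < sizeof(kRegions) / sizeof(kRegions[0]); ++i) {
        /* Subtract the base first so the range check cannot overflow. */
        if (addr >= kRegions[i].base &&
            addr - kRegions[i].base < kRegions[i].size) {
            return &kRegions[i];
        }
    }
    return NULL;
}
```

Each region's end address is exclusive here: `0x00001FFF` is the last ITCM byte, and `0x00002000` already falls in the gap.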
## Linker Script Sections

From `coralnpu_tcm.ld.tpl`:

### ITCM Sections (Code Memory)

| Section | Contents | Alignment |
|---|---|---|
| `.text` | Machine code ( | 4 bytes |
| `.rodata` | Read-only constants (weight arrays, string literals) | 4 bytes |
| `.init_array` | C++ static constructor pointers | 4 bytes |
| | C++ static destructor pointers | 4 bytes |

### DTCM Sections (Data Memory)

| Section | Contents | Alignment |
|---|---|---|
| `.data` | Initialized global variables (input/output arrays with | 4 bytes |
| | Small initialized data (accessed via global pointer) | 4 bytes |
| `.bss` | Zero-initialized globals (cleared by CRT startup) | 4 bytes |
| | Small zero-initialized data | 4 bytes |
| `.heap` | Dynamic allocation area (grows upward from | 8 bytes |
| `.stack` | Stack area (grows downward from | 16 bytes |

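
The alignment column matters whenever addresses are computed by hand. As a minimal sketch, the helpers below round an address up or down to a power-of-two boundary, and a hypothetical bump allocator shows how an upward-growing heap could keep the 8-byte alignment listed for the heap area. None of these functions come from the project; `bump_alloc` and its parameters are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align) {
    return (addr + align - 1u) & ~(align - 1u);
}

/* Round addr down to a multiple of align, e.g. 16 for an initial stack pointer. */
uintptr_t align_down(uintptr_t addr, uintptr_t align) {
    return addr & ~(align - 1u);
}

/*
 * Hypothetical bump allocator: carves `size` bytes out of [*brk, limit),
 * starting at an 8-byte boundary. Returns NULL when the request does not fit.
 */
void *bump_alloc(uintptr_t *brk, uintptr_t limit, size_t size) {
    uintptr_t start = align_up(*brk, 8u);
    if (start > limit || size > limit - start) {
        return NULL;
    }
    *brk = start + size;
    return (void *)start;
}
```

Because the heap grows up and the stack grows down inside the same 32 KB, the `limit` passed to such an allocator would have to stay below the lowest address the stack is expected to reach.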
## Linker Symbols

The linker script exports symbols used by CRT startup and the simulator:

| Symbol | Type | Purpose |
|---|---|---|
| | Address | Start of |
| | Address | End of |
| | Address | Start of |
| | Address | Start of |
| | Address | End of |
| | Address | Bottom of heap (for |
| | Address | Top of stack (initial SP value) |
| | Variable | Return value from |
| | Address | Global pointer for |
| | Variable | Input arrays (accessed by simulator) |
| | Variable | Output arrays (read by simulator) |

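
To illustrate how start/end address symbols are typically consumed, here is a minimal sketch of the step in which CRT startup clears zero-initialized data. The bounds are passed as parameters because the actual symbol names are not reproduced here; on target, the arguments would be the addresses of the corresponding linker symbols. `clear_bss` is a hypothetical name, not the project's startup routine.

```c
#include <stdint.h>

/*
 * Zero every 32-bit word in [start, end).
 * Both bounds are assumed to be 4-byte aligned, matching the section alignment.
 */
void clear_bss(uint32_t *start, uint32_t *end) {
    for (uint32_t *p = start; p < end; ++p) {
        *p = 0u;
    }
}
```

A real startup sequence would also load the stack pointer from the top-of-stack symbol before calling any C code; that part is left out because it requires target-specific assembly.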
## Memory Budget Example: RGB-to-Grayscale

| Item | Region | Size (bytes) | Size (floats) |
|---|---|---|---|
| Input | DTCM | 192 | 48 |
| Output | DTCM | 64 | 16 |
| Weights | ITCM | 12 | 3 |
| Code (main + CRT) | ITCM | ~2,000 | n/a |
| Stack | DTCM | ~1,024 | n/a |
| Total ITCM | | ~2,012 / 8,192 (25%) | |
| Total DTCM | | ~1,280 / 32,768 (4%) | |

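
The figures in this budget can be reproduced with a few lines of arithmetic. The sketch below assumes 32-bit floats and a 1×3×4×4 input with a 1×1×4×4 output, which is one shape consistent with the 48 and 16 floats listed (the source does not state the spatial dimensions). The helper names are illustrative.

```c
#include <stddef.h>

enum {
    ITCM_BYTES = 0x2000,  /* 8 KB  */
    DTCM_BYTES = 0x8000   /* 32 KB */
};

/* Bytes needed for an N x C x H x W tensor of 32-bit floats. */
size_t tensor_bytes(size_t n, size_t c, size_t h, size_t w) {
    return n * c * h * w * sizeof(float);
}

/* Share of a region in use, as an integer percentage rounded to nearest. */
unsigned percent_used(size_t used, size_t capacity) {
    return (unsigned)((used * 100u + capacity / 2u) / capacity);
}
```

For example, `percent_used(tensor_bytes(1, 3, 1, 1) + 2000, ITCM_BYTES)` corresponds to the ITCM total row, and the same helper applied to input, output, and stack gives the DTCM total.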
## Memory Budget Example: 224×224 RGB Image

| Item | Region | Size (bytes) | Fits? |
|---|---|---|---|
| Input (1×3×224×224) | DTCM | 602,112 | No (32 KB limit); must use EXTMEM or tiling |
| Output (1×1×224×224) | DTCM | 200,704 | No; must use EXTMEM or tiling |
| Weights (1×3×1×1) | ITCM | 12 | Yes |

For production models, tiling is required: process the image in spatial tiles (e.g., 16×16) that fit in DTCM, writing results back to EXTMEM between tiles.
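
A tiled loop along these lines might look like the following sketch. It assumes a planar (channel-major) float image, 16×16 tiles, and plain C copies between memories; the buffer names, the function name, and the data layout are assumptions, and a real kernel may use different transfer mechanisms. The two tile buffers take 3 KB and 1 KB, which fits comfortably in the 32 KB DTCM.

```c
#include <stddef.h>

enum { IMG_H = 224, IMG_W = 224, CHANNELS = 3, TILE = 16 };

/* Tile-sized working buffers; on target these would be placed in DTCM. */
static float tile_in[CHANNELS][TILE][TILE];
static float tile_out[TILE][TILE];

/*
 * src: planar C x H x W image (e.g. in EXTMEM).
 * dst: H x W grayscale result (e.g. in EXTMEM).
 * 224 is a multiple of 16, so no partial edge tiles need handling here.
 */
void grayscale_tiled(const float *src, float *dst, const float weights[CHANNELS]) {
    for (size_t ty = 0; ty < IMG_H; ty += TILE) {
        for (size_t tx = 0; tx < IMG_W; tx += TILE) {
            /* 1. Stage one tile of every channel into the local buffer. */
            for (size_t c = 0; c < CHANNELS; ++c)
                for (size_t y = 0; y < TILE; ++y)
                    for (size_t x = 0; x < TILE; ++x)
                        tile_in[c][y][x] =
                            src[(c * IMG_H + ty + y) * IMG_W + tx + x];

            /* 2. Compute the weighted channel sum on local data only. */
            for (size_t y = 0; y < TILE; ++y)
                for (size_t x = 0; x < TILE; ++x) {
                    float acc = 0.0f;
                    for (size_t c = 0; c < CHANNELS; ++c)
                        acc += weights[c] * tile_in[c][y][x];
                    tile_out[y][x] = acc;
                }

            /* 3. Write the finished tile back before moving on. */
            for (size_t y = 0; y < TILE; ++y)
                for (size_t x = 0; x < TILE; ++x)
                    dst[(ty + y) * IMG_W + tx + x] = tile_out[y][x];
        }
    }
}
```

Image sizes that are not a multiple of the tile size would need clamped loop bounds on the last row and column of tiles.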
## I/O Convention

The NPU uses a shared-memory I/O model:

- Simulator writes input data to the `input_0` symbol address in DTCM
- Program executes `main()`, reading from `input_0` and writing to `output_0`
- Simulator reads output data from the `output_0` symbol address
- No syscalls, no file I/O, no interrupts: pure memory-mapped data exchange
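
From the program's side, this convention amounts to plain global arrays and a loop. The sketch below shows an RGB-to-grayscale kernel written that way. The array sizes, the planar channel layout, the weight values, and the `run_kernel` name are assumptions for illustration; only the `input_0` and `output_0` symbol names come from the description above.

```c
#include <stddef.h>

enum { NUM_PIXELS = 16, NUM_CHANNELS = 3 };

/* Globals named after the symbols the simulator looks up. */
float input_0[NUM_CHANNELS * NUM_PIXELS];  /* filled by the simulator before the run */
float output_0[NUM_PIXELS];                /* read by the simulator after the run */

/* Illustrative luma weights; being const, they would land in read-only data. */
static const float weights[NUM_CHANNELS] = {0.299f, 0.587f, 0.114f};

/* Body of the program; on target, main() would call this and return its result. */
int run_kernel(void) {
    for (size_t p = 0; p < NUM_PIXELS; ++p) {
        float acc = 0.0f;
        for (size_t c = 0; c < NUM_CHANNELS; ++c) {
            acc += weights[c] * input_0[c * NUM_PIXELS + p];
        }
        output_0[p] = acc;
    }
    return 0;
}
```

Nothing in the kernel signals completion or requests data: the simulator is responsible for populating `input_0` beforehand and collecting `output_0` once the program returns.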