SKaiNET Compilation Layer
Overview
The SKaiNET compilation layer (the skainet-compile-* modules) takes tensor operations defined in Kotlin and produces StableHLO MLIR, a portable, standardized IR for ML computations. The approach is trace-based compilation, similar in spirit to JAX’s tracing or PyTorch 2.0’s torch.compile.
Tape Recording (Trace-Based Compilation)
When you execute a model function like rgb2GrayScaleMatMul(), the tensor operations don’t compute values immediately. Instead, they record themselves onto a tape — a linear log of operations with their operands and result types.
How It Works
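This Kotlin code:

```kotlin
// `input` is the traced image tensor passed to the model function.
// Neither line computes anything; each call only records an op on the tape.
val weights = constant(floatArrayOf(0.299f, 0.587f, 0.114f), Shape1D(3))
val result = input.convolution(weights, stride = 1, padding = 0)
```

produces this tape:

```
Op #0: ConstantOp(values=[0.299, 0.587, 0.114], shape=[3], dtype=f32)
Op #1: ConvolutionOp(lhs=#input, rhs=#0, stride=[1,1], padding=[[0,0],[0,0]])
```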
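The recording step can be sketched in plain Kotlin. This is a minimal illustration under stated assumptions, not SKaiNET's implementation: `OpRecord`, `TracedTensor`, `Tape`, `parameter`, and `channelConv` are hypothetical names, attributes are kept as strings, and the model input is recorded as a `parameter` op, so op ids are shifted by one relative to the tape above.

```kotlin
// Hypothetical record of one traced operation: op name, operand ids,
// string-valued attributes, and the shape of its result.
data class OpRecord(
    val id: Int,
    val name: String,
    val operands: List<Int>,
    val attrs: Map<String, String>,
    val resultShape: List<Int>
)

// A traced tensor carries no data: only the id of the op that produced it,
// its shape, and the tape it belongs to.
class TracedTensor(val id: Int, val shape: List<Int>, val tape: Tape)

// The tape: an append-only, linear log of operations.
class Tape {
    val ops = mutableListOf<OpRecord>()

    fun record(
        name: String,
        operands: List<TracedTensor>,
        attrs: Map<String, String>,
        shape: List<Int>
    ): TracedTensor {
        val rec = OpRecord(ops.size, name, operands.map { it.id }, attrs, shape)
        ops.add(rec)
        return TracedTensor(rec.id, shape, this)
    }
}

// Model inputs are recorded as parameters so every value has an op id.
fun Tape.parameter(shape: List<Int>): TracedTensor =
    record("parameter", emptyList(), emptyMap(), shape)

fun Tape.constant(values: FloatArray, shape: List<Int>): TracedTensor =
    record("constant", emptyList(), mapOf("values" to values.joinToString(prefix = "[", postfix = "]")), shape)

// Hypothetical 1x1 convolution with one weight per input channel:
// [N, C, H, W] x [C] -> [N, 1, H, W]. Nothing is computed; the op is only recorded.
fun TracedTensor.channelConv(weights: TracedTensor): TracedTensor {
    require(shape.size == 4 && weights.shape == listOf(shape[1])) { "expected NCHW input and one weight per channel" }
    return tape.record(
        "convolution",
        listOf(this, weights),
        mapOf("stride" to "[1,1]", "padding" to "[[0,0],[0,0]]"),
        listOf(shape[0], 1, shape[2], shape[3])
    )
}
```

Each call returns a new `TracedTensor` that refers to a tape entry, so chaining calls builds the log in execution order without ever touching pixel data.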
Why Trace Instead of Eager?
Eager execution (computing each value immediately) leaves little room for optimization across operations, because each operation runs before the ones that follow it are known. Trace-based compilation captures the full computation graph before execution, enabling:
- Constant folding: If two constants are added, compute the result at compile time
- Operation fusion: Combine `conv + bias + relu` into a single fused kernel
- Dead code elimination: Remove operations whose results are never used
- Memory planning: Know all tensor sizes up front, enabling an efficient memory layout
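Constant folding is the easiest of these to show in isolation. The following sketch uses a hypothetical three-node expression IR (`Node`, `Const`, `Input`, `Add`), not SKaiNET's graph classes, and folds only additions of two constants:

```kotlin
// Hypothetical toy IR: constants, named inputs, and additions.
sealed class Node
data class Const(val value: Float) : Node()
data class Input(val name: String) : Node()
data class Add(val lhs: Node, val rhs: Node) : Node()

// Constant folding: simplify children first (bottom-up), then replace an
// Add whose operands are both constants with a single precomputed Const.
fun fold(node: Node): Node = when (node) {
    is Const, is Input -> node
    is Add -> {
        val lhs = fold(node.lhs)
        val rhs = fold(node.rhs)
        if (lhs is Const && rhs is Const) Const(lhs.value + rhs.value) else Add(lhs, rhs)
    }
}
```

Because folding runs bottom-up, `x + (1 + 2)` becomes `x + 3` in one call, while an addition involving an input is left in place.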
The trade-off is that trace-based compilation cannot capture data-dependent control flow (if/else based on tensor values). For ML inference, this is rarely a limitation: the computation graph of a typical model is fixed at model load time.
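Why the limitation exists can be seen with a toy tracer. A traced tensor has no value yet, so a branch on its contents cannot even be evaluated; a branch on an ordinary Kotlin value runs once, at trace time, and only the taken path is recorded. The `Trace`, `model`, and `traceModel` names below are hypothetical:

```kotlin
// Hypothetical tracer: each call appends an op to the trace instead of computing a value,
// and returns an SSA-style name for the (not yet computed) result.
class Trace {
    val ops = mutableListOf<String>()

    private fun emit(op: String): String {
        ops.add(op)
        return "%${ops.size - 1}"
    }

    fun relu(x: String): String = emit("relu($x)")
    fun sigmoid(x: String): String = emit("sigmoid($x)")
}

// `useRelu` is an ordinary Kotlin Boolean: the `if` runs once, at trace time,
// so exactly one of the two branches ends up in the trace.
fun model(t: Trace, x: String, useRelu: Boolean): String =
    if (useRelu) t.relu(x) else t.sigmoid(x)

fun traceModel(useRelu: Boolean): List<String> {
    val t = Trace()
    model(t, "%input", useRelu)
    return t.ops
}
```

The graph traced with `useRelu = true` contains no sigmoid op at all, so changing the flag means tracing again.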
Graph Construction (DAG Analysis)
The skainet-compile-dag module converts the linear tape into a DAG (directed acyclic graph). Each operation becomes a node; data dependencies become edges.
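For the grayscale model, the DAG is trivial: the input tensor and the weight constant (Op #0) both feed a single convolution node (Op #1), whose result is the model output.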
For complex models (ResNets, YOLO heads), the DAG captures skip connections, multi-scale outputs, and shared parameters.
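The tape-to-DAG step can be sketched as reading each op's operand references as incoming edges. The sketch assumes a hypothetical `TapeOp` whose operands are indices of earlier tape entries; because an entry can only refer to results recorded before it, the tape order is already a topological order and the graph cannot contain cycles:

```kotlin
// Hypothetical tape entry: operands are indices of earlier entries on the tape.
data class TapeOp(val name: String, val operands: List<Int>)

// DAG view of a tape: op i is a node, and each operand index j of op i is an edge j -> i.
class Dag(val ops: List<TapeOp>) {
    // consumers[j] lists the ops that use the result of op j.
    val consumers: List<List<Int>>

    init {
        val out = List(ops.size) { mutableListOf<Int>() }
        ops.forEachIndexed { i, op ->
            for (src in op.operands) {
                // A tape only lets an op reference earlier results, which rules out cycles.
                require(src in 0 until i) { "op $i references $src, which is not an earlier op" }
                out[src].add(i)
            }
        }
        consumers = out
    }

    // Nodes with no consumers are the graph's outputs.
    fun outputs(): List<Int> = ops.indices.filter { consumers[it].isEmpty() }
}
```

A skip connection shows up as a node with more than one consumer, for example an input used both by a convolution and by a later `add`.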
Type Mapping: Kotlin to MLIR
The TypeMapper converts SKaiNET’s Kotlin type system to MLIR types:
| Kotlin Type | MLIR Type | Notes |
|---|---|---|
| | | 4D tensor with f32 elements |
| | | Promoted to f32 for Coral NPU (no hardware f16) |
| | | Used for quantized models |
| | | Dimensions in NCHW order |
| | | Element type |
| | | Half-precision |
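MLIR spells a ranked tensor type as its dimensions joined by `x`, followed by the element type, for example `tensor<1x3x224x224xf32>`; a rank-0 tensor is written `tensor<f32>`. The sketch below renders that form from a shape and an element type. The `ElementType` enum, `toMlirTensorType`, and the `promoteF16` flag are hypothetical stand-ins for SKaiNET's own types and for the Coral NPU promotion listed in the table.

```kotlin
// Hypothetical element types with their MLIR spellings.
enum class ElementType(val mlir: String) { F32("f32"), F16("f16"), I8("i8") }

// Render an MLIR ranked tensor type such as tensor<1x3x224x224xf32>.
// Dimensions are emitted in the order given (NCHW for image tensors).
// With promoteF16 set (e.g. for a target without f16 hardware), f16 becomes f32.
fun toMlirTensorType(shape: List<Int>, type: ElementType, promoteF16: Boolean = false): String {
    require(shape.all { it > 0 }) { "static, positive dimensions expected" }
    val elem = if (promoteF16 && type == ElementType.F16) ElementType.F32 else type
    val dims = shape.joinToString("x")
    return if (dims.isEmpty()) "tensor<${elem.mlir}>" else "tensor<${dims}x${elem.mlir}>"
}
```

For example, `toMlirTensorType(listOf(3), ElementType.F16, promoteF16 = true)` yields `tensor<3xf32>`.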
StableHLO Conversion
The StableHloConverter maps each graph operation to one or more StableHLO operations. This is a direct translation — no optimization happens here.
Converter Registry
| Converter | Graph Ops | StableHLO Ops |
|---|---|---|
| | add, subtract, multiply, divide | |
| | matmul, dot, transpose | |
| | relu, silu, softmax | |
| | conv2d, batch_norm, pooling | |
| | constant, parameter | |
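A registry of this kind can be sketched as a map from graph-op name to a function that emits StableHLO text. The sketch is hypothetical (`Converter`, `elementwise`, `registry`, and `convert` are illustrative, not SKaiNET's classes) and covers only the element-wise binary ops, which map one-to-one onto `stablehlo.add`, `stablehlo.subtract`, `stablehlo.multiply`, and `stablehlo.divide` in StableHLO's pretty-printed syntax.

```kotlin
// A converter turns one graph op into StableHLO text lines, given the SSA name
// of its result, the SSA names of its operands, and the MLIR tensor type.
fun interface Converter {
    fun emit(result: String, operands: List<String>, type: String): List<String>
}

// Element-wise binary ops translate one-to-one; no optimization happens here.
fun elementwise(hloOp: String) = Converter { result, operands, type ->
    require(operands.size == 2) { "$hloOp expects two operands" }
    listOf("$result = stablehlo.$hloOp ${operands[0]}, ${operands[1]} : $type")
}

// Hypothetical registry keyed by graph-op name.
val registry: Map<String, Converter> = mapOf(
    "add" to elementwise("add"),
    "subtract" to elementwise("subtract"),
    "multiply" to elementwise("multiply"),
    "divide" to elementwise("divide")
)

fun convert(op: String, result: String, operands: List<String>, type: String): List<String> {
    val converter = registry[op] ?: error("no converter registered for '$op'")
    return converter.emit(result, operands, type)
}
```

For instance, `convert("add", "%2", listOf("%0", "%1"), "tensor<3xf32>")` returns the single line `%2 = stablehlo.add %0, %1 : tensor<3xf32>`. Ops such as softmax would need a converter that emits several StableHLO ops, which is why the interface returns a list.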
Optimization Framework
The StableHloOptimizer applies passes that transform the MLIR text to reduce operation count, memory traffic, and code size. See Optimization Passes for a detailed breakdown of each pass.
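Whatever the individual passes do, an optimizer of this shape typically applies a list of rewrites until the program stops changing. A minimal sketch over a hypothetical op list (not MLIR text, and not the `StableHloOptimizer` API), with dead-code elimination as the only pass; `Op`, `Pass`, `deadCodeElimination`, and `optimize` are illustrative names:

```kotlin
// Hypothetical SSA-style op: `result` is defined by applying `name` to `operands`.
data class Op(val result: String, val name: String, val operands: List<String>)

// A pass rewrites a program into an equivalent one, given which results are outputs.
fun interface Pass {
    fun run(ops: List<Op>, outputs: Set<String>): List<Op>
}

// Dead-code elimination: keep only ops whose result is an output or is used by some op.
val deadCodeElimination = Pass { ops, outputs ->
    val used = ops.flatMap { it.operands }.toSet() + outputs
    ops.filter { it.result in used }
}

// Run every pass in order, and repeat whole rounds until nothing changes (a fixed point).
// Removing one dead op can make its operands dead, which the next round catches.
fun optimize(ops: List<Op>, outputs: Set<String>, passes: List<Pass>): List<Op> {
    fun round(program: List<Op>) = passes.fold(program) { acc, pass -> pass.run(acc, outputs) }
    var current = ops
    var next = round(current)
    while (next != current) {
        current = next
        next = round(current)
    }
    return next
}
```

Iterating to a fixed point lets simple, local passes compose: an unused `multiply` disappears in the first round, and the constant that only fed it disappears in the second.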
Dual Output Paths
SKaiNET supports two compilation targets from the same computation graph:
Path 1: StableHLO MLIR (for NPU via iree-tools)
```shell
./gradlew :skainet-compile:skainet-compile-hlo:generateHlo \
    -Pmodel=rgb2grayscale -Poutput=rgb2grayscale.mlir
```
This produces standard StableHLO that can be consumed by IREE, the Python transpiler, or any other MLIR toolchain that includes the StableHLO dialect.
Path 2: C99 Source (for Arduino/embedded)
The skainet-compile-c module generates C99 code with Arduino library conventions — header files, setup()/loop() entry points, and platform-independent math. This path bypasses MLIR entirely and targets microcontrollers that have C compilers but not MLIR toolchains.
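To make the C path concrete, here is a minimal Kotlin sketch of a generator that emits a C99 function for the grayscale model's per-pixel weighted sum (equivalent to a 1x1 convolution with one weight per channel). The C function name `rgb_to_gray`, its signature, the planar (CHW) input layout, and the `emitGrayscaleC` helper are all hypothetical; a real backend would also emit the header file and the Arduino `setup()`/`loop()` glue.

```kotlin
// Emit a self-contained C99 function computing gray = wr*R + wg*G + wb*B
// for an image stored as 3 planes (CHW layout) of `pixels` floats each.
fun emitGrayscaleC(weights: FloatArray): String {
    require(weights.size == 3) { "expected one weight per RGB channel" }
    // Append the C `f` suffix so the weights stay single-precision literals.
    val (wr, wg, wb) = weights.map { "${it}f" }
    return """
        |#include <stddef.h>
        |
        |void rgb_to_gray(const float *rgb, float *gray, size_t pixels) {
        |    for (size_t i = 0; i < pixels; ++i) {
        |        gray[i] = $wr * rgb[i] + $wg * rgb[pixels + i] + $wb * rgb[2 * pixels + i];
        |    }
        |}
    """.trimMargin()
}
```

Because the weights are baked into the source as literals, the generated code needs nothing beyond a C99 compiler and `<stddef.h>`, which is the point of bypassing MLIR on microcontrollers.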
The KSP Code Generation Layer
SKaiNET uses Kotlin Symbol Processing (KSP) to generate boilerplate code at compile time:
- `@GenerateTensorOp`: generates type-safe tensor operation methods
- `@GenerateNetworkDsl`: generates the `nn { }` DSL builder functions
- `@GenerateGraphDsl`: generates the `dag { }` DSL builder functions
This means adding a new operation to SKaiNET requires defining it once with annotations, and KSP generates the DSL extensions, tape recording hooks, and type inference code automatically.
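Without KSP on the classpath, the idea can still be shown in plain Kotlin: an annotated definition on one side, and on the other the kind of extension function a processor would write out for it. Everything below is hypothetical: the `TensorOpSpec` annotation, the `Recorder` and `Traced` types, and the hand-written `scale` extension stand in for SKaiNET's real annotations and for code KSP would generate.

```kotlin
// Hypothetical annotation that a symbol processor would look for.
@Target(AnnotationTarget.CLASS)
@Retention(AnnotationRetention.SOURCE)
annotation class TensorOpSpec(val dslName: String)

// Hypothetical stand-ins for the tape and for traced values.
class Recorder {
    val log = mutableListOf<String>()
    fun input(): Traced {
        log.add("input")
        return Traced(log.size - 1, this)
    }
}
class Traced(val id: Int, val recorder: Recorder)

// Written once by the developer: the op, its attributes, and the annotation.
@TensorOpSpec(dslName = "scale")
data class ScaleOp(val factor: Float)

// Written by hand here to show what a processor could generate for ScaleOp:
// a DSL extension that records the op on the tape and returns a new traced value.
fun Traced.scale(factor: Float): Traced {
    recorder.log.add("${ScaleOp(factor)} <- #$id")
    return Traced(recorder.log.size - 1, recorder)
}
```

With generation in place, only `ScaleOp` and its annotation would be written by hand; the `scale` extension and its tape hook would be regenerated whenever the op definition changes.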