DSL Networks vs Hand-Coded Runtimes

Two Approaches to Model Definition

SKaiNET Transformers supports two ways to define a model’s forward pass:

Hand-Coded Runtime (Legacy)

A class that extends DecoderRuntime and implements each layer explicitly:

class LlamaRuntime<T>(/* ... */) : DecoderRuntime<T>(ctx, dtype) {
    override fun runLayer(layerIdx: Int, x: Tensor<T, Float>): Tensor<T, Float> {
        val normed = rmsNorm(x, weights.attnNorm[layerIdx])
        val q = matmul(normed, weights.wq[layerIdx])
        val k = matmul(normed, weights.wk[layerIdx])
        // ... 50+ lines of attention + FFN
    }
}

DSL Network Definition (Current)

A pure function that declares the architecture using the network DSL:

fun <T : DType, V> llamaNetwork(metadata: LlamaModelMetadata): Module<T, V> {
    // Hyperparameters (vocabSize, dim, nLayers, ...) are read from metadata.
    return with(metadata) {
        sequential<T, V> {
            embedding(vocabSize, dim, id = "token_embd")
            for (layer in 0 until nLayers) {
                rmsNorm(dim, eps, id = "attn_norm")
                multiHeadAttention(dim, nHeads, nKVHeads, causal = true) {
                    rope(headDim, seqLen)
                    kvCache(seqLen, nKVHeads, headDim)
                }
                residual()
                rmsNorm(dim, eps, id = "ffn_norm")
                swiGluFFN(dim, ffnDim)
                residual()
            }
            rmsNorm(dim, eps, id = "output_norm")
        }
    }
}

Why the DSL Is Preferred

Compute Graph Optimization

DSL networks can be traced into a ComputeGraph (DAG) and optimized:

  • TransposeEliminationPass — folds weight transposes into matmul, eliminating O(n^2) copies

  • LLMFusionPass — fuses RMSNorm (7 ops → 1), SwiGLU FFN (5 ops → 1), QKV projections (3 → 1)

  • DeadCodeEliminationPass — removes unused intermediate tensors

Hand-coded runtimes cannot benefit from these passes: their operations execute eagerly and imperatively, so there is no declarative graph for a pass to analyze and rewrite.
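The difference can be made concrete with a toy graph rewrite. The sketch below is not SKaiNET's actual ComputeGraph API; it is a minimal pure-Kotlin illustration of the idea behind TransposeEliminationPass: once matmul(x, transpose(w)) exists as data, a pass can fold the transpose into the matmul so no copy is ever materialized.

```kotlin
// Toy node types standing in for a real compute graph (illustrative only).
sealed class Node
data class Input(val name: String) : Node()
data class Transpose(val src: Node) : Node()
data class MatMul(val lhs: Node, val rhs: Node, val transposeRhs: Boolean = false) : Node()

// Rewrite matmul(x, transpose(w)) into a transposed-matmul node,
// eliminating the materialized O(n^2) copy of the weight matrix.
fun eliminateTransposes(node: Node): Node = when (node) {
    is Input -> node
    is Transpose -> Transpose(eliminateTransposes(node.src))
    is MatMul -> {
        val rhs = eliminateTransposes(node.rhs)
        if (rhs is Transpose)
            MatMul(eliminateTransposes(node.lhs), rhs.src, transposeRhs = true)
        else
            MatMul(eliminateTransposes(node.lhs), rhs, node.transposeRhs)
    }
}

fun main() {
    val graph = MatMul(Input("x"), Transpose(Input("wq")))
    println(eliminateTransposes(graph))
    // MatMul(lhs=Input(name=x), rhs=Input(name=wq), transposeRhs=true)
}
```

An imperative runtime performs the transpose the moment the line runs; a pass never gets the chance to see, let alone remove, it.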

Weight Loading is Automatic

DSL modules expose named parameters (e.g., "blk.0/attn/q_proj"). WeightMapper matches these to GGUF tensor names via WeightNameResolver, so no manual weight-loading code is needed.
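The name-mapping idea can be sketched as a small set of pattern rules. The rules and GGUF names below are illustrative assumptions, not WeightNameResolver's actual mapping table (only the DSL path "blk.0/attn/q_proj" comes from the source).

```kotlin
// Hypothetical sketch: rewrite a DSL parameter path into a GGUF tensor name.
fun resolveGgufName(dslPath: String): String {
    // Pull the layer index out of paths like "blk.<N>/...".
    val layer = Regex("""blk\.(\d+)/""").find(dslPath)?.groupValues?.get(1)
    return when {
        dslPath == "token_embd" -> "token_embd.weight"
        dslPath.endsWith("attn/q_proj") -> "blk.$layer.attn_q.weight"
        dslPath.endsWith("attn_norm") -> "blk.$layer.attn_norm.weight"
        else -> error("no mapping rule for $dslPath")
    }
}

fun main() {
    println(resolveGgufName("blk.0/attn/q_proj")) // blk.0.attn_q.weight
}
```

Because every DSL block names its parameters when it is declared, this mapping is derived from the architecture definition itself rather than maintained by hand in each runtime.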

Multiple Execution Modes

The same DSL definition supports three execution modes:

  • DIRECT — execute the module tree directly (debugging, correctness testing)

  • HYBRID — compile compute-heavy subgraphs, run attention imperatively (best balance)

  • OPTIMIZED — full DAG compilation and execution (maximum performance)

Adding New Architectures is Simpler

A new architecture is a single function, not a 500-line class. If the architecture uses standard building blocks (MHA, RMSNorm, FFN), the DSL already has them.
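As a concrete illustration of that claim, a hypothetical decoder built entirely from existing DSL blocks fits in one function. The architecture and its hyperparameters here are invented for illustration; only the DSL combinators (embedding, rmsNorm, multiHeadAttention, swiGluFFN, residual, sequential) come from the definitions shown earlier.

```kotlin
// Hypothetical new architecture: a minimal pre-norm decoder from stock blocks.
fun <T : DType, V> miniDecoderNetwork(nLayers: Int, dim: Int, nHeads: Int): Module<T, V> =
    sequential<T, V> {
        embedding(32000, dim, id = "token_embd")
        repeat(nLayers) {
            rmsNorm(dim, 1e-5f, id = "attn_norm")
            multiHeadAttention(dim, nHeads, nHeads, causal = true)
            residual()
            rmsNorm(dim, 1e-5f, id = "ffn_norm")
            swiGluFFN(dim, 4 * dim)
            residual()
        }
        rmsNorm(dim, 1e-5f, id = "output_norm")
    }
```

Weight loading, graph optimization, and all three execution modes come for free, since they operate on the module tree rather than on architecture-specific code.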

When Hand-Coded Runtimes Are Needed

Some architectures have components the DSL cannot express:

  • Qwen3.5 DeltaNet — hybrid DeltaNet (linear attention + SSM) layers with causal 1D convolution

  • Gemma3n — variable FFN dimensions per layer (MatFormer), per-layer embeddings

  • Voxtral — ODE flow matching for audio codec

These use DecoderRuntime directly. The goal is to extend the DSL to support these patterns over time.

Current Status

Model            DSL                         Status
LLaMA/Mistral    llamaNetwork()              LlamaRuntime deprecated
Qwen2/3          qwenNetwork()               Delegates to llamaNetwork()
Apertus          apertusNetwork()            ApertusRuntime deprecated
BERT             bertNetwork()               BertRuntime deprecated
Voxtral          voxtralBackboneNetwork()    Partial DSL
Gemma3n          (none)                      Hand-coded only
Qwen3.5          (none)                      Hand-coded (DeltaNet)