Add a New Model Architecture
Option A: DSL Network Definition (Recommended)
If the architecture is a standard transformer variant, define it using the network DSL.
1. Create the Network Definition
Create a new file in llm-inference/<family>/src/commonMain/kotlin/:
```kotlin
public inline fun <reified T : DType, V> myModelNetwork(
    metadata: LlamaModelMetadata
): Module<T, V> {
    return sequential<T, V> {
        val dslImpl = this as NeuralNetworkDslImpl<T, V>
        dslImpl.embedding(metadata.vocabSize, metadata.embeddingLength, id = "token_embd")
        val nnCtx = DefaultNeuralNetworkExecutionContext()
        for (layer in 0 until metadata.blockCount) {
            val stage = StageImpl<T, V>(nnCtx, "blk.$layer", T::class)
            // Define your layer architecture here. dim, eps, nHeads, nKVHeads,
            // ffnDim, headDim, and seqLen are all read from metadata.
            stage.rmsNorm(dim, eps, id = "attn_norm")
            stage.multiHeadAttention(dim, nHeads, nKVHeads, causal = true, id = "attn") {
                rope(headDim, seqLen)
                kvCache(seqLen, nKVHeads, headDim)
            }
            stage.residual()
            stage.rmsNorm(dim, eps, id = "ffn_norm")
            stage.swiGluFFN(dim, ffnDim, id = "ffn")
            stage.residual()
            dslImpl.modules += HybridTransformerBlock(stage.modules.toList(), name = "blk.$layer")
        }
        dslImpl.rmsNorm(dim, eps, id = "output_norm")
        dslImpl.modules += VoidDenseModule<T, V>("output", vocabSize, dim)
    }
}
```
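The `rmsNorm` stages above apply root-mean-square normalization before the attention and FFN sub-blocks (pre-norm). As a minimal, self-contained sketch of the math only (independent of the DSL's actual `rmsNorm` implementation):

```kotlin
import kotlin.math.sqrt

// RMSNorm: y_i = x_i * w_i / sqrt(mean(x^2) + eps).
// This is the normalization the attn_norm/ffn_norm stages perform per hidden vector.
fun rmsNorm(x: FloatArray, weight: FloatArray, eps: Float = 1e-5f): FloatArray {
    val meanSquare = x.fold(0f) { acc, v -> acc + v * v } / x.size
    val scale = 1f / sqrt(meanSquare + eps)
    return FloatArray(x.size) { i -> x[i] * scale * weight[i] }
}
```

With a unit weight vector the output always has an RMS of 1, which keeps activations in a stable range across the residual stream.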
Option B: Hand-Coded Runtime
For architectures with non-standard components (e.g., DeltaNet, sliding window), extend DecoderRuntime:
```kotlin
class MyModelRuntime<T : DType>(
    // ...
) : DecoderRuntime<T>(ctx, dtype) {
    override fun embedToken(tokenId: Int): Tensor<T, Float> { ... }
    override fun runLayer(layerIdx: Int, x: Tensor<T, Float>): Tensor<T, Float> { ... }
    override fun outputNorm(x: Tensor<T, Float>): Tensor<T, Float> { ... }
    override fun outputProject(x: Tensor<T, Float>): Tensor<T, Float> { ... }
    override fun resetState() { ... }
}
```
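The overrides are invoked in a fixed order for each decode step: embed the token, run every layer, normalize, then project to logits. A self-contained toy stand-in (plain `FloatArray` instead of `Tensor<T, Float>`, and `ToyDecoderRuntime`/`decodeStep` are illustrative names, not the real `DecoderRuntime` API) makes that contract concrete:

```kotlin
// Simplified stand-in for DecoderRuntime, showing the call order a
// hand-coded runtime must support for one autoregressive decode step.
abstract class ToyDecoderRuntime(private val layers: Int) {
    abstract fun embedToken(tokenId: Int): FloatArray
    abstract fun runLayer(layerIdx: Int, x: FloatArray): FloatArray
    abstract fun outputNorm(x: FloatArray): FloatArray
    abstract fun outputProject(x: FloatArray): FloatArray
    abstract fun resetState()

    // One decode step: embed -> every layer in order -> norm -> logits.
    fun decodeStep(tokenId: Int): FloatArray {
        var x = embedToken(tokenId)
        for (layer in 0 until layers) x = runLayer(layer, x)
        return outputProject(outputNorm(x))
    }
}

// Trivial runtime where each override is a placeholder transform.
class ToyRuntime : ToyDecoderRuntime(layers = 2) {
    override fun embedToken(tokenId: Int) = FloatArray(4) { tokenId.toFloat() }
    override fun runLayer(layerIdx: Int, x: FloatArray) =
        FloatArray(x.size) { i -> x[i] + 1f }  // stand-in for attention + FFN
    override fun outputNorm(x: FloatArray) = x // identity placeholder
    override fun outputProject(x: FloatArray) =
        floatArrayOf(x.sum())                  // collapse to one "logit"
    override fun resetState() { /* clear KV caches / recurrent state here */ }
}
```

`resetState()` is where per-sequence state (KV caches, recurrent state for architectures like DeltaNet) must be cleared between generations.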
> **Note:** DSL definitions are preferred because they enable compute graph optimization. Hand-coded runtimes should only be used for architectures the DSL cannot express.