Kotlin getting started

SKaiNET is, at heart, a set of DSLs for tensors, networks, and models. This tutorial takes the shortest path through them: build a small classifier in code and run one forward pass. Every snippet below is compiled and executed in CI; the sources live in the skainet-docs-samples module (Quickstart.kt).

Define a model

A network is a Kotlin block. The sequential { } DSL stacks layers in order; input(n) declares the feature count, dense(n) adds a fully-connected layer, and each layer’s activation is just a lambda over a Tensor:

    fun buildModel(ctx: DirectCpuExecutionContext) =
        sequential<FP32, Float>(ctx) {
            input(784)                                  // 28x28 flattened
            dense(128) { activation = { it.relu() } }   // hidden layer
            dense(10) { activation = { it.softmax(1) } } // class scores
        }

This is a 784 → 128 (ReLU) → 10 (Softmax) MNIST-shaped classifier. There are no builder objects and no config files; the architecture reads top to bottom as code.
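
To make the layer sizes concrete, here is a minimal plain-Kotlin sketch of what a dense layer, ReLU, and softmax compute for a single sample, plus a parameter count for the network above. It is illustrative only: the function names are hypothetical rather than SKaiNET APIs, and it assumes each dense layer carries a bias vector, which the DSL snippet does not state.

```kotlin
import kotlin.math.exp

// Plain-Kotlin illustration; none of these functions are SKaiNET APIs.

/** Trainable values in a dense layer, assuming one weight per input/output pair plus one bias per output. */
fun denseParamCount(inputs: Int, outputs: Int): Int = inputs * outputs + outputs

/** Computes y = W x + b for one sample; weights is laid out as [outputs][inputs]. */
fun dense(x: FloatArray, weights: Array<FloatArray>, bias: FloatArray): FloatArray =
    FloatArray(bias.size) { o ->
        var sum = bias[o]
        for (i in x.indices) sum += weights[o][i] * x[i]
        sum
    }

/** Replaces every negative entry with zero. */
fun relu(x: FloatArray): FloatArray = FloatArray(x.size) { maxOf(0f, x[it]) }

/** Softmax with the usual max-subtraction trick to avoid overflow in exp. */
fun softmax(x: FloatArray): FloatArray {
    val max = x.maxOrNull() ?: 0f
    val exps = FloatArray(x.size) { exp(x[it] - max) }
    val total = exps.sum()
    return FloatArray(x.size) { exps[it] / total }
}
```

Under the bias assumption, denseParamCount(784, 128) + denseParamCount(128, 10) comes to 100,480 + 1,290 = 101,770 trainable values.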

Run a forward pass

Create a CPU execution context, build the model against it, wrap your input as a Tensor, and call forward:

    fun classify(pixels: FloatArray): Tensor<FP32, Float> {
        val ctx = DirectCpuExecutionContext.create()
        val model = buildModel(ctx)

        // Shape is [batch, features]; one sample here.
        val input = ctx.fromFloatArray<FP32, Float>(Shape(1, 784), FP32::class, pixels)

        return model.forward(input, ctx)               // [1, 10] class scores
    }

The result is a [1, 10] tensor of class scores, one per class. For an untrained model the scores are meaningless; the Train it section below shows how to make the model learn.
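
Turning scores into a label is an argmax over each row. The sketch below assumes the scores have already been copied into a row-major FloatArray; how to read values out of a SKaiNET Tensor is not covered here, and predictedClasses is a hypothetical helper, not part of the library.

```kotlin
/**
 * Hypothetical helper (not a SKaiNET API): returns the index of the highest
 * score in each row of a row-major [batch, classes] array.
 */
fun predictedClasses(scores: FloatArray, classes: Int): IntArray {
    require(classes > 0 && scores.size % classes == 0) {
        "scores.size must be a multiple of classes"
    }
    return IntArray(scores.size / classes) { row ->
        val start = row * classes
        var best = 0
        // Linear scan; ties keep the lowest class index.
        for (c in 1 until classes) {
            if (scores[start + c] > scores[start + best]) best = c
        }
        best
    }
}
```

For the [1, 10] output above, predictedClasses(scores, 10)[0] would be the predicted class index.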

Train it

Training follows the same spirit: a training { } block wires together a model, a loss, and an optimizer, and step runs one forward/backward/update cycle. This example learns to separate two clusters and then measures accuracy. The excerpt below covers model setup and the training loop; see Metrics and performance testing for the full walkthrough, and the runnable TrainingDemo.kt in skainet-docs-samples.

    // A graph (autograd) context for training; a plain CPU context for inference.
    val baseCtx = DirectCpuExecutionContext()
    val trainCtx = DefaultGraphExecutionContext(
        baseOps = baseCtx.ops,
        phase = Phase.TRAIN,
        createTapeFactory = { _ -> DefaultGradientTape() },
    )

    // Seeded RNG so the weight initialisation is reproducible.
    val rng = Random(42)
    val model = sequential<FP32, Float>(trainCtx) {
        input(2)
        dense(8) { weights { randn(std = 0.5f, random = rng) } }
        activation { it.tanh() }
        dense(1) { weights { randn(std = 0.5f, random = rng) } }
        activation { it.tanh() }
    }

    // n samples, flattened row-major: featuresFlat holds n * 2 values, labelsFlat holds n.
    val x = baseCtx.fromFloatArray<FP32, Float>(Shape(n, 2), FP32::class, featuresFlat)
    val y = baseCtx.fromFloatArray<FP32, Float>(Shape(n, 1), FP32::class, labelsFlat)

    val runner = training<FP32, Float> {
        model { model }
        loss { MSELoss() }
        optimizer {
            // Register every trainable parameter of the model with the optimizer.
            sgd(lr = 0.1).apply {
                model.trainableParameters().forEach { addParameter(it) }
            }
        }
    }

    // One full-batch step per epoch; keep the first and last loss for comparison.
    var firstLoss = 0f
    var lastLoss = 0f
    repeat(150) { epoch ->
        val loss = runner.step(trainCtx, x, y).data.get()
        if (epoch == 0) firstLoss = loss
        lastLoss = loss
    }
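
The training snippet uses n, featuresFlat, and labelsFlat without defining them, and the accuracy check is not shown. As a rough sketch of what those pieces could look like, the plain-Kotlin code below generates two clusters and scores predictions by sign. Everything in it is an assumption for illustration: ClusterData, makeClusters, and accuracy are hypothetical names, the -1/+1 label encoding is chosen only because it matches tanh's output range, and it uses kotlin.random.Random; TrainingDemo.kt may do all of this differently.

```kotlin
import kotlin.random.Random

// Hypothetical holder; the field names mirror the variables in the training snippet.
class ClusterData(val n: Int, val featuresFlat: FloatArray, val labelsFlat: FloatArray)

/**
 * Builds n points in two square clusters centred at (-1, -1) and (+1, +1).
 * Labels are -1 and +1 (an assumption; the demo's real encoding may differ).
 */
fun makeClusters(n: Int, rng: Random, spread: Float = 0.5f): ClusterData {
    val features = FloatArray(n * 2)
    val labels = FloatArray(n)
    for (i in 0 until n) {
        val label = if (i % 2 == 0) -1f else 1f
        // Uniform noise in [-spread, spread) around the cluster centre.
        features[2 * i] = label + (rng.nextFloat() * 2f - 1f) * spread
        features[2 * i + 1] = label + (rng.nextFloat() * 2f - 1f) * spread
        labels[i] = label
    }
    return ClusterData(n, features, labels)
}

/** Fraction of predictions whose sign matches the -1/+1 label. */
fun accuracy(predictions: FloatArray, labels: FloatArray): Float {
    require(predictions.size == labels.size && labels.isNotEmpty()) {
        "predictions and labels must be non-empty and the same length"
    }
    var correct = 0
    for (i in labels.indices) {
        if ((predictions[i] >= 0f) == (labels[i] >= 0f)) correct++
    }
    return correct.toFloat() / labels.size
}
```

With data built this way, the fields of the returned ClusterData would stand in for n, featuresFlat, and labelsFlat, and accuracy would be applied to the model's outputs once they are copied into a FloatArray.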

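For intuition about what a single forward/backward/update step involves, the sketch below spells out the three phases by hand for one linear unit trained with mean squared error and plain SGD. It is written in plain Kotlin with hypothetical names (LinearUnit, sgdStep) and says nothing about how SKaiNET's runner or gradient tape are implemented.

```kotlin
// Plain-Kotlin illustration of one training step for y = w·x + b; not SKaiNET code.
class LinearUnit(val w: FloatArray, var b: Float)

/** Runs forward, backward, and update once; returns the loss measured before the update. */
fun sgdStep(model: LinearUnit, xs: Array<FloatArray>, ys: FloatArray, lr: Float): Float {
    val n = xs.size
    val gradW = FloatArray(model.w.size)
    var gradB = 0f
    var loss = 0f
    for (i in 0 until n) {
        // Forward: prediction and squared error for sample i.
        var pred = model.b
        for (j in model.w.indices) pred += model.w[j] * xs[i][j]
        val err = pred - ys[i]
        loss += err * err / n
        // Backward: d(loss)/d(pred) = 2 * err / n, chained into each parameter.
        val dPred = 2f * err / n
        for (j in model.w.indices) gradW[j] += dPred * xs[i][j]
        gradB += dPred
    }
    // Update: move each parameter against its gradient, scaled by the learning rate.
    for (j in model.w.indices) model.w[j] -= lr * gradW[j]
    model.b -= lr * gradB
    return loss
}
```

Calling sgdStep repeatedly on the same data plays the role of the repeat(150) loop above: with a small enough learning rate, the returned loss should shrink from one call to the next.
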
Next steps