Metrics and performance testing

There are two distinct "how good is it?" questions in SKaiNET:

  • Model quality — does the network make correct predictions? Measured with metrics (accuracy, error) computed from a forward pass.

  • Engine performance — how fast does the math run? Measured with the benchmark suite.

This page covers both and points to the deeper references for each.

Measuring model quality

A metric is computed from a forward pass over held-out data. The training example classifies two clusters and then measures classification accuracy — the fraction of samples whose predicted label matches the truth:

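        // Metric: classification accuracy on a fresh inference context.
        val evalCtx = DirectCpuExecutionContext()
        val preds = model.forward(x, evalCtx)
        var correct = 0
        for (i in 0 until n) {
            // Raw model output for sample i (column 0 of the prediction tensor).
            val score = preds.data.get(i, 0)
            // Threshold the score at zero to obtain a +1 / -1 class label.
            val predicted = if (score >= 0f) 1f else -1f
            if (predicted == labelsFlat[i]) correct++
        }
        // Fraction of the n samples that were classified correctly.
        val accuracy = correct.toFloat() / n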

The same pattern applies to any metric: run model.forward(x, ctx) on an evaluation context, then reduce the predictions against the targets. The snippet is compiled and run in CI from skainet-docs-samples (TrainingDemo.kt); the full training loop that produces model is on the Kotlin getting started page.
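As an illustration of the "reduce predictions against targets" step, here is a minimal sketch in plain Kotlin. It assumes the scores have already been copied out of the prediction tensor into a FloatArray. The helper names signAccuracy and meanSquaredError are hypothetical and are not part of SKaiNET.

```kotlin
// Hypothetical helpers (not SKaiNET API): both take plain FloatArrays that the
// caller has already extracted from the model's predictions.

/** Fraction of samples whose zero-thresholded score matches a +1 / -1 label. */
fun signAccuracy(scores: FloatArray, labels: FloatArray): Float {
    require(scores.isNotEmpty() && scores.size == labels.size) {
        "scores and labels must be non-empty and of equal length"
    }
    var correct = 0
    for (i in scores.indices) {
        val predicted = if (scores[i] >= 0f) 1f else -1f
        if (predicted == labels[i]) correct++
    }
    return correct.toFloat() / scores.size
}

/** Mean squared error between predictions and targets. */
fun meanSquaredError(preds: FloatArray, targets: FloatArray): Float {
    require(preds.isNotEmpty() && preds.size == targets.size) {
        "preds and targets must be non-empty and of equal length"
    }
    var sum = 0.0 // accumulate in Double to limit rounding error
    for (i in preds.indices) {
        val diff = (preds[i] - targets[i]).toDouble()
        sum += diff * diff
    }
    return (sum / preds.size).toFloat()
}

fun main() {
    val scores = floatArrayOf(0.8f, -0.3f, 0.1f, -1.2f)
    val labels = floatArrayOf(1f, -1f, -1f, -1f)
    println(signAccuracy(scores, labels)) // prints 0.75
    println(meanSquaredError(scores, labels))
}
```

Keeping the reduction separate from the forward pass means the same evaluation code can be reused for several metrics over one set of predictions.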

Always evaluate on a fresh inference context (DirectCpuExecutionContext()), separate from the autograd/training context, so metric computation does not record gradients.

Benchmarking engine performance

Engine performance is a separate concern with its own reproducible harness. Rather than ad-hoc timing in user code, SKaiNET ships an official benchmark suite:

Performance-testing practices
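To show why a dedicated harness is preferable to a single timed call, the sketch below warms up the code path and then reports the median of repeated measurements. It is not the SKaiNET benchmark suite: medianNanos is a hypothetical helper, and the use of System.nanoTime() assumes a JVM target.

```kotlin
// Hypothetical helper (not SKaiNET API): runs `block` a few times to warm up,
// then times `runs` further executions and returns the median in nanoseconds.
fun medianNanos(warmup: Int, runs: Int, block: () -> Unit): Long {
    require(warmup >= 0 && runs > 0) { "warmup must be >= 0 and runs must be > 0" }
    repeat(warmup) { block() } // give the JIT a chance to compile hot paths
    val samples = LongArray(runs)
    for (i in 0 until runs) {
        val start = System.nanoTime()
        block()
        samples[i] = System.nanoTime() - start
    }
    samples.sort()
    return samples[runs / 2] // upper median when `runs` is even
}

fun main() {
    val a = FloatArray(1 shl 16) { it.toFloat() }
    var sink = 0f // consume the result so the loop is not optimized away
    val ns = medianNanos(warmup = 20, runs = 50) {
        var acc = 0f
        for (v in a) acc += v * v
        sink += acc
    }
    println("median: $ns ns (sink=$sink)")
}
```

A median is less sensitive than a mean to occasional pauses such as garbage collection. Even so, a sketch like this does not control for machine state or configuration, which is the gap a reproducible benchmark suite is meant to close.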