Metrics and performance testing
There are two distinct "how good is it?" questions in SKaiNET:
- Model quality: does the network make correct predictions? Measured with metrics (accuracy, error) computed from a forward pass.
- Engine performance: how fast does the math run? Measured with the benchmark suite.
This page covers both and points to the deeper references for each.
Measuring model quality
A metric is computed from a forward pass over held-out data. The training example classifies two clusters and then measures classification accuracy — the fraction of samples whose predicted label matches the truth:
// Metric: classification accuracy on a fresh inference context.
val evalCtx = DirectCpuExecutionContext()
val preds = model.forward(x, evalCtx)
var correct = 0
for (i in 0 until n) {
    val score = preds.data.get(i, 0)
    val predicted = if (score >= 0f) 1f else -1f
    if (predicted == labelsFlat[i]) correct++
}
val accuracy = correct.toFloat() / n
The same shape applies to any metric: run model.forward(x, ctx) on an evaluation
context, then reduce predictions against targets. The snippet is compiled and run in
CI from skainet-docs-samples (TrainingDemo.kt); the full training loop that
produces model is in Kotlin getting started.
Always evaluate on a fresh inference context.
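The "reduce predictions against targets" step can be sketched without any SKaiNET types. The example below is a minimal illustration, not library code: it assumes the forward pass has already been copied into a plain FloatArray, and the helper names signAccuracy and meanSquaredError are hypothetical. It shows the accuracy reduction from the snippet above next to an error metric that uses the same loop shape.

```kotlin
// Hypothetical helpers: plain FloatArrays stand in for model outputs and targets.
// Nothing here is part of the SKaiNET API.

/** Fraction of samples whose score, thresholded at zero, matches a label in {-1, +1}. */
fun signAccuracy(scores: FloatArray, labels: FloatArray): Float {
    require(scores.isNotEmpty() && scores.size == labels.size) {
        "scores and labels must be non-empty and equal in length"
    }
    var correct = 0
    for (i in scores.indices) {
        val predicted = if (scores[i] >= 0f) 1f else -1f
        if (predicted == labels[i]) correct++
    }
    return correct.toFloat() / scores.size
}

/** Mean squared error between predictions and targets. */
fun meanSquaredError(preds: FloatArray, targets: FloatArray): Float {
    require(preds.isNotEmpty() && preds.size == targets.size) {
        "preds and targets must be non-empty and equal in length"
    }
    var sum = 0.0 // accumulate in Double to limit rounding error
    for (i in preds.indices) {
        val diff = (preds[i] - targets[i]).toDouble()
        sum += diff * diff
    }
    return (sum / preds.size).toFloat()
}

fun main() {
    // Made-up scores and labels, for illustration only.
    val scores = floatArrayOf(0.8f, -0.3f, 0.1f, -1.2f)
    val labels = floatArrayOf(1f, -1f, -1f, -1f)
    println("accuracy = ${signAccuracy(scores, labels)}")
    println("mse = ${meanSquaredError(scores, labels)}")
}
```

Keeping each metric as a pure function of predictions and targets means the same reduction can be reused for any model or evaluation set once the forward pass has produced its outputs.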
Benchmarking engine performance
Engine performance is a separate concern with its own reproducible harness. Rather than ad-hoc timing in user code, SKaiNET ships an official benchmark suite:
- Engine benchmark program: what the suite measures, headline vs. secondary metrics, lanes, the result-record schema, and how to reproduce a public run locally.
- Reading the matmul benchmark: interpreting the numbers for the kernel that dominates inference cost.
- Register a self-hosted bench runner: running the suite on your own hardware.
Performance-testing practices
- Pin the methodology: use fixed warmup and iteration counts on a stable machine; see the Methodology pinning section of the benchmark guide.
- Compare against a baseline run rather than relying on absolute numbers, which vary with hardware.
- For backend-level context on where the time goes, see JVM CPU performance and How SIMD kernels are built.
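To make the first two practices concrete, here is a small sketch of what "pinned" counts and a baseline comparison mean in code. It is not the SKaiNET benchmark suite and does not replace it. The function names (measureNanos, median, relativeChange), the counts, and the toy workload are all assumptions chosen for illustration, and it assumes a JVM target because it relies on System.nanoTime.

```kotlin
// Illustrative only: not the SKaiNET benchmark harness. All names are hypothetical.

/** Runs [block] with fixed warmup and measured iteration counts; returns per-iteration nanoseconds. */
fun measureNanos(warmup: Int, iterations: Int, block: () -> Unit): LongArray {
    require(warmup >= 0 && iterations > 0) { "warmup must be >= 0 and iterations > 0" }
    repeat(warmup) { block() } // warmup timings are discarded
    val samples = LongArray(iterations)
    for (i in 0 until iterations) {
        val start = System.nanoTime()
        block()
        samples[i] = System.nanoTime() - start
    }
    return samples
}

/** Median of the samples; less sensitive to outliers than the mean. */
fun median(samples: LongArray): Double {
    require(samples.isNotEmpty()) { "samples must not be empty" }
    val sorted = samples.sortedArray()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid].toDouble()
    else (sorted[mid - 1] + sorted[mid]) / 2.0
}

/** Relative change of a candidate against a baseline; positive means the candidate is slower. */
fun relativeChange(baselineNanos: Double, candidateNanos: Double): Double {
    require(baselineNanos > 0.0) { "baseline must be positive" }
    return (candidateNanos - baselineNanos) / baselineNanos
}

fun main() {
    val data = FloatArray(1 shl 16) { it.toFloat() }
    var sink = 0f // consume results so the JIT cannot drop the work

    // Same pinned counts for both runs, on the same machine.
    val baseline = median(measureNanos(warmup = 20, iterations = 100) { sink += data.sum() })
    val candidate = median(measureNanos(warmup = 20, iterations = 100) {
        var s = 0f
        for (v in data) s += v
        sink += s
    })

    println("relative change vs baseline: ${relativeChange(baseline, candidate)} (sink=$sink)")
}
```

Reporting the ratio to a baseline measured with identical counts on the same machine, rather than raw nanoseconds, is what keeps the comparison meaningful across hardware.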
Related
- Kotlin getting started: defines and trains the model measured here.
- Kernel × platform support: which kernels back each op per platform.