Official Engine Benchmarks
Audience: SKaiNET maintainers. This page documents the project’s own benchmark publication program: how we produce the numbers that appear on OpenBenchmarking. Library users consuming SKaiNET as a dependency do not need to read this; their performance story lives under Explanation → Performance.
The SKaiNET Compute Engine Suite publishes throughput and latency microbenchmarks for the engine’s CPU kernel paths under Phoronix Test Suite + OpenBenchmarking conventions. The suite is intentionally small and stable so it can be re-run on every release and stay comparable across versions.
This page covers the engine-level program only. The runtime-level LLM benchmark program (SKaiNET-transformers) is a separate suite shipped from that repository.
What the suite measures
The engine suite ships eight scenarios, all driven by the same
publication harness (skainet-backends/benchmarks/jvm-cpu-publish) and
mirroring the existing JVM CPU JMH benchmarks under
skainet-backends/benchmarks/jvm-cpu-jmh/ plus the upstream Bf16 / Q8_0
microbench tests under skainet-backends/skainet-backend-native-cpu:
| Scenario | What it exercises | Unit | Direction |
|---|---|---|---|
| | | GFLOPS | higher is better |
| | | GOP/s | higher is better |
| | Direct | GFLOPS | higher is better |
| | | GFLOPS | higher is better |
| | | GOP/s | higher is better |
| | | M elements/s | higher is better |
| | | M elements/s | higher is better |
| | | M elements/s | higher is better |
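As a rough illustration of the units in the table, the sketch below converts a timed run into GFLOPS and M elements/s. It assumes the conventional count of 2·M·N·K floating-point operations for a matrix multiply; the formulas the publication harness actually uses are not stated on this page, and both function names are hypothetical.

```python
def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """GFLOPS for one (m x k) @ (k x n) multiply.

    Assumption: 2*m*n*k operations (one multiply and one add per
    inner-product term). The harness may count differently.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (2 * m * n * k) / seconds / 1e9


def million_elements_per_second(element_count: int, seconds: float) -> float:
    """Throughput in M elements/s for an element-wise style workload."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return element_count / seconds / 1e6
```

Both units are rates, so a larger value always means a faster kernel, which matches the "higher is better" direction of every scenario.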
Each scenario is wired up as a Phoronix test profile under
benchmarks/openbenchmarking/profiles/ and bundled into a single suite
benchmarks/openbenchmarking/suites/skainet-engine-suite/.
Headline vs secondary metrics
- Headline: throughput on the steady-state full lane (i.e. 3 warmups and 5 measured runs at the manifest’s full shapes). Suitable for cross-release comparisons and OpenBenchmarking publication.
- Secondary: smoke-mode values from CI on ubuntu-latest. These exist to catch obvious regressions in the harness and JSON schema. They are not publishable; virtualized cloud runners are too noisy and the shapes are deliberately small.
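The warmup-then-measure pattern behind the headline numbers can be sketched as follows. This is a language-neutral illustration in Python, not the JVM harness itself; the `measure` function and its parameter names are hypothetical, and only the 3 warmup / 5 measured split comes from the text above.

```python
import time
from typing import Callable, List


def measure(workload: Callable[[], None],
            warmups: int = 3,
            measured: int = 5) -> List[float]:
    """Run `workload` untimed `warmups` times, then time `measured` runs.

    Returns the wall-clock duration of each measured run in seconds.
    """
    for _ in range(warmups):
        workload()  # results discarded: lets caches and JIT-like effects settle

    samples = []
    for _ in range(measured):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples
```

Keeping every raw sample, rather than only the mean, is what makes a later stability check possible.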
A run is automatically flagged with "unstable": true in its
BenchmarkRecord when the coefficient of variation exceeds 3%. Unstable
records should be excluded from public leaderboards.
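A minimal sketch of the stability check, assuming the coefficient of variation is the sample standard deviation divided by the mean of the measured runs. Whether the harness uses the sample or the population standard deviation is not stated here, and the function names are hypothetical; only the 3% threshold comes from the text.

```python
import statistics
from typing import Sequence

CV_THRESHOLD = 0.03  # 3%, as stated above


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (assumed definition)."""
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("mean of samples is zero; CV is undefined")
    # statistics.stdev needs at least two samples and raises otherwise.
    return statistics.stdev(samples) / mean


def is_unstable(samples: Sequence[float],
                threshold: float = CV_THRESHOLD) -> bool:
    """True when the run-to-run spread exceeds the threshold."""
    return coefficient_of_variation(samples) > threshold
```

For example, five samples of 100, 101, 99, 100, and 100 have a CV of about 0.7% and pass, while 100, 120, 80, 100, and 100 have a CV of about 14% and would be flagged.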
Lanes
| Lane | Trigger | Notes |
|---|---|---|
| Smoke (ubuntu-latest) | pull_request, push to main, workflow_dispatch | |
| Full (self-hosted) | release, workflow_dispatch | |
The full lane currently runs on a Linux x86 host with an AVX2-capable CPU. macOS Arm64 and Linux Arm64 lanes are tracked as follow-ups in the engine benchmark PRD.
Reproducing a public run locally
Prerequisites: JDK 21 or newer, Phoronix Test Suite (optional, only required to validate the local PTS profiles).
# 1. Build the publication harness.
./gradlew :skainet-backends:benchmarks:jvm-cpu-publish:shadowJar
# 2. Smoke run (≈30 s; same shape as the CI smoke job).
./scripts/run_engine_smoke.sh
# 3. Full run (≈ minutes; same shape as the self-hosted lane).
./scripts/run_engine_benchmarks.sh
# 4. Inspect the JSON record for a single scenario.
ls out/engine
cat out/engine/<TIMESTAMP>/engine-fp32-gemm-panama.json
To install Phoronix Test Suite on Ubuntu 24.04+ (not in the default repos):
./scripts/install_pts.sh
./scripts/validate_pts_profiles.sh
To register this machine as the official self-hosted runner:
GH_RUNNER_TOKEN=<token from repo Settings -> Actions -> Runners> \
REPO=ainet-sk/SKaiNET \
./scripts/register_bench_runner.sh
Result record schema
Every scenario emits a BenchmarkRecord JSON (schema version 1.0.0)
with top-level runtime, system, config, metrics, and samples
fields. The full schema is defined under
skainet-backends/benchmarks/jvm-cpu-publish/src/main/kotlin/sk/ainet/bench/publish/schema/.
Records are deliberately self-describing: they carry the SKaiNET commit, JVM args, kernel-provider list, CPU model, JDK version, and every raw sample so a third party can spot-check a published result without re-running the suite.
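A third party could start a spot-check by confirming that a record carries the five top-level fields named above. The sketch below checks only those names; the layout inside each field is defined by the schema sources and is not assumed here. The helper names are hypothetical.

```python
import json
from typing import List

# Top-level fields listed in the schema description above.
REQUIRED_TOP_LEVEL = ("runtime", "system", "config", "metrics", "samples")


def load_record(path: str) -> dict:
    """Read one BenchmarkRecord JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_top_level_fields(record: dict) -> List[str]:
    """Return the required top-level field names absent from `record`."""
    return [name for name in REQUIRED_TOP_LEVEL if name not in record]
```

An empty list from `missing_top_level_fields(load_record(path))` means the record has the expected outer shape; it says nothing about whether the values inside are valid.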
Methodology pinning
All shapes, warmup/measured counts, JVM flags, and the schema version
are pinned in benchmarks/manifests/engine-release.yml. Bumping any
value in that manifest is a methodology change: bump the
manifest_version and call it out in the release notes so historical
comparisons don’t silently break.
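The bump rule could be enforced with a small check that compares two parsed manifests. This is a sketch under assumptions: the manifests are taken as already-parsed dictionaries, and every key other than `manifest_version` (for example `warmups` below) is hypothetical, since the manifest's real layout is not shown on this page.

```python
from typing import Any, Dict, List


def changed_keys(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Top-level keys, other than manifest_version, whose values differ."""
    keys = set(old) | set(new)
    return sorted(
        key for key in keys
        if key != "manifest_version" and old.get(key) != new.get(key)
    )


def needs_version_bump(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    """True when pinned values changed but manifest_version did not."""
    version_unchanged = old.get("manifest_version") == new.get("manifest_version")
    return bool(changed_keys(old, new)) and version_unchanged
```

For instance, changing a hypothetical `warmups` value from 3 to 4 while leaving `manifest_version` untouched makes `needs_version_bump` return `True`, which a CI step could turn into a failed check.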