JVM CPU Backend Performance Benchmarks (JMH)
This page explains how to run the JMH benchmarks for the JVM CPU backend and how to capture evidence for performance targets.
What’s included
-
Elementwise: FP32
addon 1,000,000 elements -
Reductions: FP32
sumandmeanon 1,000,000 elements -
Matmul: FP32 square
matmulwith sizes 256, 512, and 1024
Benchmarks are implemented in module:
-
:skainet-backends:benchmarks:jvm-cpu-jmh
Source files:
-
src/jmh/kotlin/sk/ainet/bench/ElementwiseAdd1MBench.kt -
src/jmh/kotlin/sk/ainet/bench/Reductions1MBench.kt -
src/jmh/kotlin/sk/ainet/bench/MatmulBench.kt
Prerequisites
-
JDK 21+ (CI builds on JDK 25)
-
Gradle passes the required JVM flags automatically:
-
--enable-preview -
--add-modules jdk.incubator.vector
-
For JDK 25-specific performance advantages, see Java 25 CPU Backend notes.
Feature flags
You can toggle acceleration paths at runtime using system properties or environment variables:
-
Vector acceleration:
-
-Dskainet.cpu.vector.enabled=true|false -
or
SKAINET_CPU_VECTOR_ENABLED=true|false
-
-
BLAS via Panama (matmul heuristic for larger sizes):
-
-Dskainet.cpu.blas.enabled=true|false -
or
SKAINET_CPU_BLAS_ENABLED=true|false
-
Each benchmark also exposes @Param to toggle these flags without modifying Gradle args.
How to run all benchmarks
From repository root:
./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh
This will build and execute all JMH benchmarks with the default parameters defined in sources.
Run specific benchmarks
-
Elementwise add (both vector on/off):
./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh \ -Pjmh.include=ElementwiseAdd1MBench
-
Reductions (vector on/off):
./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh \ -Pjmh.include=Reductions1MBench
-
Matmul, all sizes, with vector on and BLAS on:
./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh \ -Pjmh.include=MatmulBench \ -Pjmh.param.vectorEnabled=true \ -Pjmh.param.blasEnabled=true
-
Matmul at 512 only, comparing BLAS on/off with vector on:
./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh \ -Pjmh.include=MatmulBench \ -Pjmh.param.size=512 \ -Pjmh.param.vectorEnabled=true \ -Pjmh.param.blasEnabled=true,false
Notes:
-
You can also pass system properties via
-Dif preferred (e.g.,-Dskainet.cpu.vector.enabled=false). -
JMH JSON/text results can be configured via standard JMH plugin options if you need files for CI artifacts.
Recording environment details
Include at minimum:
-
CPU model, cores/threads, base/boost clock
-
RAM size and speed
-
OS version
-
JDK version and vendor
-
Gradle version
-
JVM flags in use (
--enable-preview --add-modules jdk.incubator.vector) -
SKaiNET flags used (vector, BLAS)
Performance targets (to be validated on your hardware)
-
≥ 4× speedup on FP32
matmul512Ă—512 vs baseline scalar -
≥ 3× speedup on FP32
addwith 1M elements vs baseline scalar
Use the above commands to produce “vector=false/blas=false” baselines vs “vector=true[/blas=true]” accelerated runs. Capture best-of or median-of JMH results as evidence and include raw tables in this document when available.