JVM CPU Backend Performance Benchmarks (JMH)

This page explains how to run the JMH benchmarks for the JVM CPU backend and how to capture evidence for performance targets.

What’s included
  • Elementwise: FP32 add on 1,000,000 elements

  • Reductions: FP32 sum and mean on 1,000,000 elements

  • Matmul: FP32 square matmul with sizes 256, 512, and 1024

Benchmarks are implemented in module:

  • :skainet-backends:benchmarks:jvm-cpu-jmh

Source files:

  • src/jmh/kotlin/sk/ainet/bench/ElementwiseAdd1MBench.kt

  • src/jmh/kotlin/sk/ainet/bench/Reductions1MBench.kt

  • src/jmh/kotlin/sk/ainet/bench/MatmulBench.kt

Prerequisites
  • JDK 21+ (CI builds on JDK 25)

  • Gradle passes the required JVM flags automatically:

    • --enable-preview

    • --add-modules jdk.incubator.vector

For JDK 25-specific performance advantages, see Java 25 CPU Backend notes.

Feature flags

You can toggle acceleration paths at runtime using system properties or environment variables:

  • Vector acceleration:

    • -Dskainet.cpu.vector.enabled=true|false

    • or SKAINET_CPU_VECTOR_ENABLED=true|false

  • BLAS via Panama (matmul heuristic for larger sizes):

    • -Dskainet.cpu.blas.enabled=true|false

    • or SKAINET_CPU_BLAS_ENABLED=true|false

Each benchmark also exposes @Param to toggle these flags without modifying Gradle args.

How to run all benchmarks

From repository root:

./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh

This will build and execute all JMH benchmarks with the default parameters defined in sources.

Run specific benchmarks
  • Elementwise add (both vector on/off):

./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh \
  -Pjmh.include=ElementwiseAdd1MBench
  • Reductions (vector on/off):

./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh \
  -Pjmh.include=Reductions1MBench
  • Matmul, all sizes, with vector on and BLAS on:

./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh \
  -Pjmh.include=MatmulBench \
  -Pjmh.param.vectorEnabled=true \
  -Pjmh.param.blasEnabled=true
  • Matmul at 512 only, comparing BLAS on/off with vector on:

./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh \
  -Pjmh.include=MatmulBench \
  -Pjmh.param.size=512 \
  -Pjmh.param.vectorEnabled=true \
  -Pjmh.param.blasEnabled=true,false

Notes:

  • You can also pass system properties via -D if preferred (e.g., -Dskainet.cpu.vector.enabled=false).

  • JMH JSON/text results can be configured via standard JMH plugin options if you need files for CI artifacts.

Recording environment details

Include at minimum:

  • CPU model, cores/threads, base/boost clock

  • RAM size and speed

  • OS version

  • JDK version and vendor

  • Gradle version

  • JVM flags in use (--enable-preview --add-modules jdk.incubator.vector)

  • SKaiNET flags used (vector, BLAS)

Performance targets (to be validated on your hardware)
  • ≥ 4Ă— speedup on FP32 matmul 512Ă—512 vs baseline scalar

  • ≥ 3Ă— speedup on FP32 add with 1M elements vs baseline scalar

Use the above commands to produce “vector=false/blas=false” baselines vs “vector=true[/blas=true]” accelerated runs. Capture best-of or median-of JMH results as evidence and include raw tables in this document when available.