JvmTurboQuantKernels
JVM SIMD-optimized kernels for TurboQuant operations.
Uses the Java Vector API (jdk.incubator.vector) to accelerate the TurboQuant encode and decode paths with CPU SIMD instructions. Elements left over after the last full vector (the loop tail) are handled by scalar code.
These kernels optimize the hot paths:
Per-group abs-max computation (for scale calculation)
Vectorized quantization (float → code)
Vectorized dequantization (code → float)
Walsh-Hadamard transform butterfly stages
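For orientation, the four hot paths listed above can be sketched in plain scalar Java, which is roughly what a tail fallback would compute one element at a time. This is an illustrative sketch only, not the kernels' actual code: the class name TurboQuantScalarSketch, the method names, the signed code range [-7, 7], the abs-max/QMAX scale rule, and the unnormalized transform are all assumptions made for the example.

```java
/** Scalar reference sketch. All names and the code range are hypothetical. */
final class TurboQuantScalarSketch {
    // Assumption: symmetric signed codes in [-QMAX, QMAX] (4-bit style).
    static final int QMAX = 7;

    /** Largest absolute value in x[off, off + len). */
    static float absMax(float[] x, int off, int len) {
        float m = 0f;
        for (int i = off; i < off + len; i++) {
            m = Math.max(m, Math.abs(x[i]));
        }
        return m;
    }

    /** Quantizes one group (float to code) and returns the scale it used. */
    static float quantizeGroup(float[] x, int off, int len, byte[] codes) {
        float amax = absMax(x, off, len);
        // An all-zero group gets scale 1 to avoid dividing by zero.
        float scale = amax == 0f ? 1f : amax / QMAX;
        for (int i = 0; i < len; i++) {
            int q = Math.round(x[off + i] / scale);
            codes[off + i] = (byte) Math.max(-QMAX, Math.min(QMAX, q));
        }
        return scale;
    }

    /** Dequantizes one group (code to float) using the stored scale. */
    static void dequantizeGroup(byte[] codes, int off, int len,
                                float scale, float[] out) {
        for (int i = 0; i < len; i++) {
            out[off + i] = codes[off + i] * scale;
        }
    }

    /** In-place, unnormalized Walsh-Hadamard transform via butterfly stages. */
    static void fwht(float[] x) {
        int n = x.length;
        if (n == 0 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Stage h pairs element j with element j + h inside blocks of 2h.
        for (int h = 1; h < n; h <<= 1) {
            for (int i = 0; i < n; i += h << 1) {
                for (int j = i; j < i + h; j++) {
                    float a = x[j];
                    float b = x[j + h];
                    x[j] = a + b;
                    x[j + h] = a - b;
                }
            }
        }
    }
}
```

Each inner loop here is element-wise or a simple reduction, which is the shape that maps naturally onto SIMD lanes. A vectorized version would process one full vector of elements per iteration and leave only the remainder to loops like these.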
Usage: The CPU backend calls these kernels when it detects TurboQuant-encoded K/V data in the attention path.