turboQuant

fun turboQuant(preset: String, numLayers: Int, numHeads: Int, headDim: Int, maxSeqLen: Int): KvCacheStore(source)

Create a TurboQuant-compressed KV cache from a named preset.

Available presets: "safe-lowbit", "balanced", "experimental-max".

Example:

val cache = KvCacheStore.turboQuant("balanced", numLayers=32, numHeads=32, headDim=128, maxSeqLen=4096)

Parameters

preset
numLayers

Number of transformer layers

numHeads

Number of KV heads per layer

headDim

Dimension per head

maxSeqLen

Maximum sequence length


fun turboQuant(numLayers: Int, numHeads: Int, headDim: Int, maxSeqLen: Int, keyBits: Int = 4, valueBits: Int = 4, useQjl: Boolean = false): KvCacheStore(source)

Create a TurboQuant-compressed KV cache with custom bit budgets.

Example:

// 8-bit keys, 4-bit values (safe-lowbit style)
val cache = KvCacheStore.turboQuant(
numLayers=32, numHeads=32, headDim=128, maxSeqLen=4096,
keyBits=8, valueBits=4
)