turboQuant
fun turboQuant(preset: String, numLayers: Int, numHeads: Int, headDim: Int, maxSeqLen: Int): KvCacheStore(source)
Create a TurboQuant-compressed KV cache from a named preset.
Available presets: "safe-lowbit", "balanced", "experimental-max".
Example:
val cache = KvCacheStore.turboQuant("balanced", numLayers=32, numHeads=32, headDim=128, maxSeqLen=4096)Content copied to clipboard
Parameters
preset
Preset name (see TurboQuantPresets.availablePresets)
numLayers
Number of transformer layers
numHeads
Number of KV heads per layer
headDim
Dimension per head
maxSeqLen
Maximum sequence length
fun turboQuant(numLayers: Int, numHeads: Int, headDim: Int, maxSeqLen: Int, keyBits: Int = 4, valueBits: Int = 4, useQjl: Boolean = false): KvCacheStore(source)
Create a TurboQuant-compressed KV cache with custom bit budgets.
Example:
// 8-bit keys, 4-bit values (safe-lowbit style)
val cache = KvCacheStore.turboQuant(
numLayers=32, numHeads=32, headDim=128, maxSeqLen=4096,
keyBits=8, valueBits=4
)Content copied to clipboard