TurboQuantKvCacheStore
class TurboQuantKvCacheStore(config: KvCacheConfig, keyConfig: TurboQuantConfig, valueConfig: TurboQuantConfig) : KvCacheStore
KV cache store with TurboQuant compression.
Compresses K/V projections on write using TurboQuant and decompresses on read. Supports asymmetric K/V policies (different bit budgets and variants for keys vs values).
Each token's K/V projection for each head is stored as its own TurboQuantBlock. This granularity would allow different layers or heads to use different configurations, although this implementation applies the same key configuration and the same value configuration to every layer and head.
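The storage layout described above can be illustrated with a minimal sketch. It is not the library's implementation: QuantBlock, quantize, dequantize and SketchKvStore are hypothetical names, a single layer is modelled, and plain symmetric uniform quantization stands in for TurboQuant, whose actual algorithm is not shown on this page. The sketch only demonstrates one block per token per head, compression on write, decompression on read, and independent bit budgets for keys and values.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Hypothetical stand-in for a compressed block; NOT the real TurboQuantBlock.
class QuantBlock(val codes: IntArray, val scale: Float, val bits: Int)

// Symmetric uniform quantization, used only to illustrate the store layout.
fun quantize(x: FloatArray, bits: Int): QuantBlock {
    require(bits in 2..16) { "bits must be in 2..16" }
    val levels = (1 shl (bits - 1)) - 1            // 7 for 4 bits, 127 for 8 bits
    val maxAbs = x.maxOfOrNull { abs(it) } ?: 0f
    val scale = if (maxAbs == 0f) 1f else maxAbs / levels
    val codes = IntArray(x.size) { (x[it] / scale).roundToInt().coerceIn(-levels, levels) }
    return QuantBlock(codes, scale, bits)
}

fun dequantize(b: QuantBlock): FloatArray = FloatArray(b.codes.size) { b.codes[it] * b.scale }

// Single-layer sketch: one block per (token, head), separate bit budgets for K and V.
class SketchKvStore(val numHeads: Int, val headDim: Int, val keyBits: Int, val valueBits: Int) {
    private val keys = mutableListOf<List<QuantBlock>>()
    private val values = mutableListOf<List<QuantBlock>>()

    val currentSeqLen: Int get() = keys.size

    // Compress on write: split the projection by head and quantize each slice.
    fun appendToken(k: FloatArray, v: FloatArray) {
        require(k.size == numHeads * headDim && v.size == numHeads * headDim)
        keys += List(numHeads) { h -> quantize(k.copyOfRange(h * headDim, (h + 1) * headDim), keyBits) }
        values += List(numHeads) { h -> quantize(v.copyOfRange(h * headDim, (h + 1) * headDim), valueBits) }
    }

    // Decompress on read; endPos is treated as exclusive (an assumption of this sketch).
    fun readKeys(startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray = read(keys, startPos, endPos)
    fun readValues(startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray = read(values, startPos, endPos)

    private fun read(blocks: List<List<QuantBlock>>, startPos: Int, endPos: Int): FloatArray {
        require(startPos in 0..endPos && endPos <= blocks.size)
        val out = FloatArray((endPos - startPos) * numHeads * headDim)
        var offset = 0
        for (pos in startPos until endPos) {
            for (block in blocks[pos]) {
                val d = dequantize(block)
                d.copyInto(out, offset)
                offset += d.size
            }
        }
        return out
    }
}
```

In this sketch a store created as SketchKvStore(numHeads = 2, headDim = 4, keyBits = 8, valueBits = 4) keeps keys at higher precision than values, which is one way an asymmetric policy might be chosen.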
Properties
Functions
Append a single token's K/V projections for one layer.
Memory report for the entire cache.
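As a rough illustration of what such a report might summarize, the following sketch estimates the compressed cache size from the bit budgets. The function estimateKvBytes, its parameters, and the per-block overhead of 4 bytes are all assumptions made for this example; the actual report type and the real per-block metadata of TurboQuantBlock are not documented on this page.

```kotlin
// Hypothetical helper, not part of the documented API.
// Estimates bytes used when every (layer, head, token) holds one key block and one value block.
fun estimateKvBytes(
    layers: Int,
    heads: Int,
    headDim: Int,
    seqLen: Int,
    keyBits: Int,
    valueBits: Int,
    overheadBytesPerBlock: Int = 4   // assumed metadata size per block (for example one float scale)
): Long {
    val blocksPerSide = layers.toLong() * heads * seqLen
    // Packed payload rounded up to whole bytes, plus the assumed per-block overhead.
    fun sideBytes(bits: Int): Long =
        blocksPerSide * ((headDim.toLong() * bits + 7) / 8 + overheadBytesPerBlock)
    return sideBytes(keyBits) + sideBytes(valueBits)
}

// Uncompressed float32 baseline for the same shape (keys plus values).
fun float32KvBytes(layers: Int, heads: Int, headDim: Int, seqLen: Int): Long =
    2L * layers * heads * seqLen * headDim * 4
```

Under these assumptions, dividing float32KvBytes by estimateKvBytes for the same shape gives an approximate compression ratio for a chosen pair of key and value bit budgets.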
open override fun readKeyStorage(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): TensorStorage
Read raw (possibly compressed) key storage for a layer as TensorStorage.
open override fun readValues(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray
Read cached values for a layer, dequantized to float.
open override fun readValueStorage(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): TensorStorage
Read raw (possibly compressed) value storage for a layer as TensorStorage.