TurboQuantKvCacheStore

class TurboQuantKvCacheStore(config: KvCacheConfig, keyConfig: TurboQuantConfig, valueConfig: TurboQuantConfig) : KvCacheStore

KV cache store with TurboQuant compression.

Compresses K/V projections on write using TurboQuant and decompresses on read. Supports asymmetric K/V policies (different bit budgets and variants for keys vs values).

Each token's K/V projection per head is stored as a TurboQuantBlock. This per-token, per-head granularity would allow different layers or heads to use different configurations, although this implementation applies keyConfig and valueConfig uniformly across all layers and heads.
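The compress-on-write, decompress-on-read round trip with separate bit budgets for keys and values can be illustrated with a toy quantizer. This is only a sketch: it uses plain uniform min/max scalar quantization as a stand-in, not the TurboQuant algorithm, and ToyBlock, toyQuantize, toyDequantize and maxAbsError are hypothetical names that are not part of the library.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Hypothetical stand-in for one compressed per-head block (NOT the library's TurboQuantBlock).
class ToyBlock(val min: Float, val scale: Float, val codes: IntArray)

// Compress one head's vector to `bits` bits per element using uniform min/max quantization.
// Assumption: this replaces TurboQuant purely for illustration.
fun toyQuantize(x: FloatArray, bits: Int): ToyBlock {
    require(x.isNotEmpty()) { "vector must not be empty" }
    require(bits in 1..16) { "bits must be in 1..16" }
    val levels = (1 shl bits) - 1
    var lo = x[0]
    var hi = x[0]
    for (v in x) {
        if (v < lo) lo = v
        if (v > hi) hi = v
    }
    // A constant vector has zero range; any non-zero scale works because all codes are 0.
    val scale = if (hi > lo) (hi - lo) / levels else 1f
    val codes = IntArray(x.size) { i ->
        ((x[i] - lo) / scale).roundToInt().coerceIn(0, levels)
    }
    return ToyBlock(lo, scale, codes)
}

// Decompress a block back to floats.
fun toyDequantize(block: ToyBlock): FloatArray =
    FloatArray(block.codes.size) { i -> block.min + block.codes[i] * block.scale }

// Largest element-wise reconstruction error between two equally sized vectors.
fun maxAbsError(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "size mismatch" }
    var worst = 0f
    for (i in a.indices) worst = maxOf(worst, abs(a[i] - b[i]))
    return worst
}
```

Quantizing the same vector with, say, 8 bits for a key and 2 bits for a value and comparing maxAbsError shows the trade-off an asymmetric policy makes: the lower bit budget uses less storage and reconstructs less precisely.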

Constructors

constructor(config: KvCacheConfig, keyConfig: TurboQuantConfig, valueConfig: TurboQuantConfig)

Properties

open override val currentSeqLen: Int

Current number of tokens stored in the cache.

open override val headDim: Int

Dimension per head.

open override val keyEncoding: TensorEncoding

Encoding used for key storage.

open override val maxSeqLen: Int

Maximum sequence length this cache can hold.

open override val numHeads: Int

Number of KV heads per layer.

open override val numLayers: Int

Number of transformer layers in this cache.

open override val placement: Placement

Placement intent for the cache buffers.

open override val valueEncoding: TensorEncoding

Encoding used for value storage.

Functions

open override fun appendToken(layer: Int, key: FloatArray, value: FloatArray)

Append a single token's K/V projections for one layer.

open override fun clear()

Reset the cache, clearing all stored tokens.

open override fun evict(fromPos: Int)

Evict all cached tokens from position fromPos onward.
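The position bookkeeping behind appendToken, evict, clear and the range parameters of the read functions can be sketched with a small in-memory stand-in. Everything here is an assumption made for illustration: ToyKvCache is a hypothetical class, it handles a single layer and keys only, stores raw floats without compression, treats the read range as half-open [startPos, endPos), and returns data in token-major order. The page does not confirm any of these details for TurboQuantKvCacheStore.

```kotlin
// Hypothetical stand-in that mirrors the documented member names (not the library class).
class ToyKvCache(val maxSeqLen: Int, val width: Int) {
    private val keys = ArrayList<FloatArray>()

    // Number of tokens currently stored.
    val currentSeqLen: Int
        get() = keys.size

    // Append one token's key vector (width floats) at position currentSeqLen.
    fun appendToken(key: FloatArray) {
        require(key.size == width) { "expected $width floats, got ${key.size}" }
        check(keys.size < maxSeqLen) { "cache is full" }
        keys.add(key.copyOf())
    }

    // Assumed semantics: drop positions [fromPos, currentSeqLen), keep [0, fromPos).
    fun evict(fromPos: Int) {
        require(fromPos in 0..keys.size) { "fromPos out of range" }
        while (keys.size > fromPos) keys.removeAt(keys.size - 1)
    }

    // Drop every stored token.
    fun clear() {
        keys.clear()
    }

    // Assumed half-open range [startPos, endPos), flattened token by token.
    fun readKeys(startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray {
        require(startPos in 0..endPos && endPos <= keys.size) { "invalid range" }
        val out = FloatArray((endPos - startPos) * width)
        for (pos in startPos until endPos) {
            keys[pos].copyInto(out, destinationOffset = (pos - startPos) * width)
        }
        return out
    }
}
```

Under these assumptions, appending three tokens and then calling evict(fromPos = 1) leaves currentSeqLen at 1, which is the kind of truncation used when a generation is rolled back to an earlier position.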

open override fun memoryReport(): KvCacheMemoryReport

Memory report for the entire cache.
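As a rough guide to the savings such a report might show, the payload size of a cache can be estimated from its dimensions and the bits stored per element. The sketch below is a back-of-envelope calculation only: payloadBytes is a hypothetical helper, the bit widths in the usage note are illustrative, per-block metadata is ignored, and the fields of KvCacheMemoryReport are not described on this page.

```kotlin
// Estimated payload bytes for ONE of keys or values across the whole cache.
// Assumption: every element costs exactly bitsPerElement bits and metadata is ignored.
fun payloadBytes(
    numLayers: Int,
    numHeads: Int,
    seqLen: Int,
    headDim: Int,
    bitsPerElement: Int
): Long {
    require(numLayers >= 0 && numHeads >= 0 && seqLen >= 0 && headDim >= 0) {
        "dimensions must be non-negative"
    }
    require(bitsPerElement > 0) { "bitsPerElement must be positive" }
    // Use Long arithmetic so large caches do not overflow Int.
    val elements = numLayers.toLong() * numHeads * seqLen * headDim
    // Round up to whole bytes.
    return (elements * bitsPerElement + 7) / 8
}
```

For a hypothetical cache with 32 layers, 8 KV heads, 4096 tokens and headDim 128, payloadBytes(32, 8, 4096, 128, 32) gives 536870912 bytes (512 MiB) for float32 keys, while the same keys at 4 bits per element come to 67108864 bytes (64 MiB).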

open override fun readKeys(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray

Read cached keys for a layer, dequantized to float.

open override fun readKeyStorage(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): TensorStorage

Read raw (possibly compressed) key storage for a layer as TensorStorage.

open override fun readValues(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray

Read cached values for a layer, dequantized to float.

open override fun readValueStorage(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): TensorStorage

Read raw (possibly compressed) value storage for a layer as TensorStorage.