TurboQuantKvCacheStore

KV cache store with TurboQuant compression.

Compresses K/V projections on write using TurboQuant and decompresses on read. Supports asymmetric K/V policies (different bit budgets and variants for keys vs values).

Each token's K/V projection is stored per head as a TurboQuantBlock. This granularity would allow different layers or heads to use different configurations, although this implementation applies one key configuration and one value configuration uniformly across all layers and heads.
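To make the compress-on-write, decompress-on-read idea and the asymmetric K/V bit budgets concrete, here is a minimal sketch. It uses plain symmetric scalar quantization as a stand-in and is not the TurboQuant algorithm. ToyBlock, toyQuantize, toyDequantize and maxError are hypothetical names that are not part of this library, and the 8-bit and 4-bit budgets are arbitrary illustration values.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Hypothetical stand-in for one compressed block: one head's vector for one token.
// Plain symmetric scalar quantization, NOT the TurboQuant algorithm.
class ToyBlock(val codes: IntArray, val scale: Float)

// "Write" path: map each float to a signed integer code of the given bit width.
fun toyQuantize(x: FloatArray, bits: Int): ToyBlock {
    require(bits in 2..16) { "bits must be in 2..16" }
    val maxCode = (1 shl (bits - 1)) - 1              // e.g. 7 for 4 bits
    val maxAbs = x.maxOfOrNull { abs(it) } ?: 0f
    val scale = if (maxAbs == 0f) 1f else maxAbs / maxCode
    val codes = IntArray(x.size) { i ->
        (x[i] / scale).roundToInt().coerceIn(-maxCode, maxCode)
    }
    return ToyBlock(codes, scale)
}

// "Read" path: reconstruct approximate floats from the codes.
fun toyDequantize(b: ToyBlock): FloatArray =
    FloatArray(b.codes.size) { i -> b.codes[i] * b.scale }

// Largest absolute element-wise difference between two equally sized arrays.
fun maxError(a: FloatArray, b: FloatArray): Float =
    a.indices.maxOf { abs(a[it] - b[it]) }

fun main() {
    val vec = floatArrayOf(0.9f, -0.35f, 0.1f, -1.0f)
    // Asymmetric policy: a larger bit budget for keys than for values (illustrative only).
    val asKey = toyDequantize(toyQuantize(vec, bits = 8))
    val asValue = toyDequantize(toyQuantize(vec, bits = 4))
    println("8-bit round-trip error: ${maxError(vec, asKey)}")
    println("4-bit round-trip error: ${maxError(vec, asValue)}")
}
```

The smaller bit budget gives a larger round-trip error in this toy setup, which is the trade-off an asymmetric policy lets you tune separately for keys and values.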

Constructors

constructor(config: KvCacheConfig, keyConfig: TurboQuantConfig, valueConfig: TurboQuantConfig)

Properties

open override val currentSeqLen: Int

Current number of tokens stored in the cache.

open override val headDim: Int

Dimension per head.

open override val keyEncoding: TensorEncoding

Encoding used for key storage.

open override val maxSeqLen: Int

Maximum sequence length this cache can hold.

open override val numHeads: Int

Number of KV heads per layer.

open override val numLayers: Int

Number of transformer layers in this cache.

open override val placement: Placement

Placement intent for the cache buffers.

open override val valueEncoding: TensorEncoding

Encoding used for value storage.

Functions

open override fun appendToken(layer: Int, key: FloatArray, value: FloatArray)

Append a single token's K/V projections for one layer.
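Since each head's projection is stored as its own block, the flat key and value arrays passed to appendToken presumably have to be split per head before compression. The sketch below shows one way to do that split. It assumes a head-major layout of numHeads * headDim floats, which this page does not confirm, and splitHeads is a hypothetical helper that is not part of this library.

```kotlin
// Assumed layout (not confirmed by this page):
// [head0 dim0..dimN-1, head1 dim0..dimN-1, ...]
// splitHeads is a hypothetical helper for illustration only.
fun splitHeads(flat: FloatArray, numHeads: Int, headDim: Int): List<FloatArray> {
    require(flat.size == numHeads * headDim) {
        "expected ${numHeads * headDim} floats, got ${flat.size}"
    }
    // One copy per head, each of length headDim.
    return List(numHeads) { h -> flat.copyOfRange(h * headDim, (h + 1) * headDim) }
}

fun main() {
    val key = floatArrayOf(1f, 2f, 3f, 4f, 5f, 6f)   // 2 heads, 3 dims each
    val perHead = splitHeads(key, numHeads = 2, headDim = 3)
    perHead.forEachIndexed { h, slice -> println("head $h: ${slice.toList()}") }
}
```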

open override fun clear()

Reset the cache, clearing all stored tokens.

open override fun evict(fromPos: Int)

Evict all cached tokens from position fromPos onward.
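The truncation semantics of evict can be illustrated with a toy single-layer cache. This is a sketch under the assumption that fromPos is inclusive, so the sequence length equals fromPos afterwards. ToySeqCache is a hypothetical class and does not reflect this store's internals.

```kotlin
// Hypothetical toy cache holding one vector per token position.
class ToySeqCache(val maxSeqLen: Int) {
    private val tokens = ArrayList<FloatArray>()

    val currentSeqLen: Int get() = tokens.size

    fun append(v: FloatArray) {
        check(tokens.size < maxSeqLen) { "cache is full" }
        tokens.add(v.copyOf())
    }

    // Drop every position >= fromPos (assumed inclusive); earlier positions are kept.
    fun evict(fromPos: Int) {
        require(fromPos in 0..tokens.size) { "fromPos out of range: $fromPos" }
        tokens.subList(fromPos, tokens.size).clear()
    }

    // Clearing is the same as evicting from position 0.
    fun clear() = evict(0)
}

fun main() {
    val cache = ToySeqCache(maxSeqLen = 8)
    repeat(5) { cache.append(floatArrayOf(it.toFloat())) }
    cache.evict(fromPos = 3)
    println(cache.currentSeqLen)   // positions 0, 1, 2 remain
    cache.clear()
    println(cache.currentSeqLen)
}
```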

open override fun memoryReport(): KvCacheMemoryReport

Memory report for the entire cache.
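For a rough sense of the numbers such a report might contain, the payload size of a KV cache can be estimated from its shape and the bits spent per element. The sketch below is back-of-the-envelope arithmetic only. It ignores any per-block metadata such as scales, does not use the fields of KvCacheMemoryReport (which are not listed on this page), and kvPayloadBytes is a hypothetical helper with arbitrary example sizes.

```kotlin
// Hypothetical estimate of the K+V payload in bytes; per-block metadata is ignored.
fun kvPayloadBytes(
    numLayers: Int, numHeads: Int, headDim: Int, seqLen: Int,
    keyBits: Int, valueBits: Int
): Long {
    // Number of scalar elements stored for keys (the same count applies to values).
    val elementsPerSide = numLayers.toLong() * numHeads * headDim * seqLen
    return elementsPerSide * (keyBits + valueBits) / 8
}

fun main() {
    // Arbitrary example shape: 32 layers, 8 KV heads, headDim 128, 4096 tokens.
    val fp32 = kvPayloadBytes(32, 8, 128, 4096, keyBits = 32, valueBits = 32)
    val quant = kvPayloadBytes(32, 8, 128, 4096, keyBits = 4, valueBits = 4)
    println("float32 payload: $fp32 bytes")
    println("4-bit payload:   $quant bytes")
    println("ratio: ${fp32.toDouble() / quant}")
}
```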

open override fun readKeys(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray

Read cached keys for a layer, dequantized to float.
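The default arguments (startPos = 0, endPos = currentSeqLen) suggest a half-open range [startPos, endPos). Under that assumption, and further assuming the returned array is laid out token by token with numHeads * headDim floats per token, the result can be sized and indexed as sketched below. Neither assumption is confirmed by this page, and expectedLength and tokenSlice are hypothetical helpers.

```kotlin
// Assumption: endPos is exclusive, so the range covers endPos - startPos tokens.
fun expectedLength(startPos: Int, endPos: Int, numHeads: Int, headDim: Int): Int {
    require(startPos in 0..endPos) { "invalid range [$startPos, $endPos)" }
    return (endPos - startPos) * numHeads * headDim
}

// Assumption: token-major layout; tokenIndex is relative to startPos.
fun tokenSlice(flat: FloatArray, tokenIndex: Int, numHeads: Int, headDim: Int): FloatArray {
    val stride = numHeads * headDim
    return flat.copyOfRange(tokenIndex * stride, (tokenIndex + 1) * stride)
}

fun main() {
    val numHeads = 2
    val headDim = 2
    // Stand-in for a dequantized read of 3 tokens.
    val flat = FloatArray(expectedLength(0, 3, numHeads, headDim)) { it.toFloat() }
    println(flat.size)
    println(tokenSlice(flat, tokenIndex = 1, numHeads, headDim).toList())
}
```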

open override fun readKeyStorage(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): TensorStorage

Read raw (possibly compressed) key storage for a layer as TensorStorage.

open override fun readValues(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray

Read cached values for a layer, dequantized to float.

open override fun readValueStorage(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): TensorStorage

Read raw (possibly compressed) value storage for a layer as TensorStorage.