DefaultKvCacheStore

Default KV cache implementation using dense FP32 storage.

This is the reference/baseline implementation that stores K/V as uncompressed float arrays. Quantized implementations (Q8_0, TurboQuant) will override appendToken and readKeys/readValues with encode-on-write / decode-on-read paths.
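Q8_0 is only named here. The sketch below shows what an encode-on-write / decode-on-read pair could look like, assuming GGML-style Q8_0: blocks of 32 values, one scale d = max|x| / 127 per block, and int8 codes q = round(x / d). Unlike GGML, which stores d in half precision, this sketch keeps the scale as a Float. `Q8Blocks`, `encodeQ8`, `decodeQ8` and `Q8_BLOCK` are hypothetical names and are not part of this library's API.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Hypothetical block size, following the GGML Q8_0 convention (assumption).
val Q8_BLOCK = 32

// Hypothetical container: one scale per block plus one int8 code per value.
class Q8Blocks(val scales: FloatArray, val codes: ByteArray)

// Encode-on-write: choose the per-block scale from the largest magnitude, then round.
fun encodeQ8(x: FloatArray): Q8Blocks {
    require(x.size % Q8_BLOCK == 0) { "size must be a multiple of $Q8_BLOCK" }
    val nBlocks = x.size / Q8_BLOCK
    val scales = FloatArray(nBlocks)
    val codes = ByteArray(x.size)
    for (b in 0 until nBlocks) {
        val base = b * Q8_BLOCK
        var amax = 0f
        for (i in 0 until Q8_BLOCK) amax = maxOf(amax, abs(x[base + i]))
        val d = amax / 127f
        scales[b] = d
        // An all-zero block gets scale 0 and all-zero codes.
        val inv = if (d != 0f) 1f / d else 0f
        for (i in 0 until Q8_BLOCK) {
            codes[base + i] = (x[base + i] * inv).roundToInt().toByte()
        }
    }
    return Q8Blocks(scales, codes)
}

// Decode-on-read: multiply each code by the scale of its block.
fun decodeQ8(q: Q8Blocks): FloatArray =
    FloatArray(q.codes.size) { i -> q.codes[i] * q.scales[i / Q8_BLOCK] }
```

Round-tripping a block through this pair reproduces each value to within half of that block's scale. The dense FP32 store skips both steps.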

Internal layout per layer:

  • keys: FloatArray(numHeads * maxSeqLen * headDim), shape [numHeads, maxSeqLen, headDim]

  • values: FloatArray(numHeads * maxSeqLen * headDim), shape [numHeads, maxSeqLen, headDim]

Append writes to position currentSeqLen; read returns a contiguous slice.
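The [numHeads, maxSeqLen, headDim] layout means that one element of a layer's buffer can be located with a single offset formula. The following is a minimal sketch of that formula. `flatIndex` is a hypothetical helper for illustration and is not part of the documented API.

```kotlin
// Hypothetical helper: flat index of element (head, pos, d) in a per-layer
// FloatArray laid out as [numHeads, maxSeqLen, headDim] (row-major).
fun flatIndex(head: Int, pos: Int, d: Int, maxSeqLen: Int, headDim: Int): Int =
    (head * maxSeqLen + pos) * headDim + d
```

For example, with maxSeqLen = 4 and headDim = 3, element (head = 1, pos = 2, d = 0) sits at (1 * 4 + 2) * 3 + 0 = 18. Within a single head, consecutive positions occupy adjacent headDim-sized runs. This adjacency is what lets a range of positions be copied per head in one step.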

Constructors

constructor(config: KvCacheConfig)

Properties

open override val currentSeqLen: Int

Current number of tokens stored in the cache.

open override val headDim: Int

Dimension per head.

open override val keyEncoding: TensorEncoding

Encoding used for key storage.

open override val maxSeqLen: Int

Maximum sequence length this cache can hold.

open override val numHeads: Int

Number of KV heads per layer.

open override val numLayers: Int

Number of transformer layers in this cache.

open override val placement: Placement

Placement intent for the cache buffers.

open override val valueEncoding: TensorEncoding

Encoding used for value storage.

Functions

open override fun appendToken(layer: Int, key: FloatArray, value: FloatArray)

Append a single token's K/V projections for one layer.
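The following is a minimal sketch of a dense FP32 append. It makes two assumptions that this page does not confirm. First, each `key` and `value` argument holds numHeads * headDim floats in head-major order. Second, the shared currentSeqLen advances once the last layer (numLayers - 1) has been written for the token. `DenseKvSketch` and `keyAt` are hypothetical names, not the library's implementation.

```kotlin
// Hypothetical sketch of a dense FP32 KV store. The input layout and the
// position-advance policy are assumptions, not the library's actual code.
class DenseKvSketch(
    val numLayers: Int,
    val numHeads: Int,
    val maxSeqLen: Int,
    val headDim: Int,
) {
    // Per-layer buffers laid out as [numHeads, maxSeqLen, headDim].
    private val keys = Array(numLayers) { FloatArray(numHeads * maxSeqLen * headDim) }
    private val values = Array(numLayers) { FloatArray(numHeads * maxSeqLen * headDim) }

    var currentSeqLen = 0
        private set

    fun appendToken(layer: Int, key: FloatArray, value: FloatArray) {
        require(layer in 0 until numLayers) { "layer out of range: $layer" }
        require(key.size == numHeads * headDim && value.size == numHeads * headDim)
        check(currentSeqLen < maxSeqLen) { "cache full" }
        val pos = currentSeqLen
        for (h in 0 until numHeads) {
            // Assumed input layout: head h occupies [h * headDim, (h + 1) * headDim).
            val dst = (h * maxSeqLen + pos) * headDim
            key.copyInto(keys[layer], dst, h * headDim, (h + 1) * headDim)
            value.copyInto(values[layer], dst, h * headDim, (h + 1) * headDim)
        }
        // Assumption: the shared position advances after the last layer is written.
        if (layer == numLayers - 1) currentSeqLen++
    }

    // Hypothetical accessor for inspecting a single stored key element.
    fun keyAt(layer: Int, head: Int, pos: Int, d: Int): Float =
        keys[layer][(head * maxSeqLen + pos) * headDim + d]
}
```

In this sketch, a decode step calls appendToken once per layer for the same token. Only the final call moves the cache to the next position.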

open override fun clear()

Reset the cache, clearing all stored tokens.

open override fun evict(fromPos: Int)

Evict all cached tokens from position fromPos onward.
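Reads are bounded by currentSeqLen, so evict and clear can be modeled as moving the logical length back. Stale slots are then overwritten by later appends. The following sketch assumes this behavior. It does not zero or release buffers, and the real class may do either. `SeqCursor` and `advance` are hypothetical names.

```kotlin
// Hypothetical model of the cache's logical length only. It does not touch
// the K/V buffers, which a real implementation might zero or release.
class SeqCursor(val maxSeqLen: Int) {
    var currentSeqLen = 0
        private set

    // Stand-in for the position bump that an append performs.
    fun advance() {
        check(currentSeqLen < maxSeqLen) { "cache full" }
        currentSeqLen++
    }

    // Drop positions [fromPos, currentSeqLen); later appends overwrite them.
    fun evict(fromPos: Int) {
        require(fromPos in 0..currentSeqLen) { "fromPos out of range: $fromPos" }
        currentSeqLen = fromPos
    }

    // In this sketch, clear() is simply evict(0).
    fun clear() = evict(0)
}
```

Evicting this way costs O(1) regardless of how many tokens are dropped. This suits rollback after rejected speculative tokens, for example.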

open override fun memoryReport(): KvCacheMemoryReport

Memory report for the entire cache.
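For dense FP32 storage, the buffer footprint follows directly from the layout. The total is two arrays (K and V) per layer, each holding numHeads * maxSeqLen * headDim four-byte floats. The following sketch computes that allocated capacity. This page does not say whether memoryReport reports allocated capacity or only the currentSeqLen tokens in use. `denseFp32KvBytes` is a hypothetical helper, and it does not reproduce the fields of KvCacheMemoryReport.

```kotlin
// Hypothetical helper: bytes allocated by dense FP32 K and V buffers across
// all layers. Passing currentSeqLen instead of maxSeqLen gives bytes in use.
fun denseFp32KvBytes(numLayers: Int, numHeads: Int, maxSeqLen: Int, headDim: Int): Long =
    2L * numLayers * numHeads * maxSeqLen * headDim * Float.SIZE_BYTES
```

For example, numLayers = 32, numHeads = 8, maxSeqLen = 4096 and headDim = 128 give 2 × 32 × 8 × 4096 × 128 × 4 = 1,073,741,824 bytes (1 GiB). This is the baseline that the quantized encodings aim to shrink.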

open override fun readKeys(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray

Read cached keys for a layer, dequantized to float.
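The following is a minimal sketch of extracting positions [startPos, endPos) from one layer's [numHeads, maxSeqLen, headDim] buffer into a contiguous array. It assumes the result is head-major with shape [numHeads, endPos - startPos, headDim], which this page does not state. `readSlice` is a hypothetical helper. In the documented readKeys, endPos defaults to currentSeqLen.

```kotlin
// Hypothetical helper: copy positions [startPos, endPos) of every head out of
// a per-layer buffer laid out as [numHeads, maxSeqLen, headDim]. The result
// is contiguous with shape [numHeads, endPos - startPos, headDim] (assumed).
fun readSlice(
    buffer: FloatArray,
    numHeads: Int,
    maxSeqLen: Int,
    headDim: Int,
    startPos: Int,
    endPos: Int,
): FloatArray {
    require(startPos in 0..endPos && endPos <= maxSeqLen) { "bad range [$startPos, $endPos)" }
    val len = endPos - startPos
    val out = FloatArray(numHeads * len * headDim)
    for (h in 0 until numHeads) {
        val src = (h * maxSeqLen + startPos) * headDim
        // Positions within one head are adjacent, so one copy per head suffices.
        buffer.copyInto(out, h * len * headDim, src, src + len * headDim)
    }
    return out
}
```

For example, take numHeads = 2, maxSeqLen = 4, headDim = 1 and a buffer holding 0..7. Reading [1, 3) returns [1, 2, 5, 6], which is positions 1 and 2 of each head. A range that spans heads is not contiguous in the backing array, so the sketch copies rather than returning a view.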

open override fun readKeyStorage(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): TensorStorage

Read raw (possibly compressed) key storage for a layer as TensorStorage.

open override fun readValues(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray

Read cached values for a layer, dequantized to float.

open override fun readValueStorage(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): TensorStorage

Read raw (possibly compressed) value storage for a layer as TensorStorage.