skainet-lang-core/sk.ainet.lang.tensor.storage/KvCacheStore

KvCacheStore

interface KvCacheStore

Dedicated KV-cache storage abstraction for inference.

Unlike generic TensorStorage, a KV cache is append-friendly and role-aware: keys and values may use different encodings and bit budgets. The cache is addressed by (layer, head, position) and supports compressed block storage for quantized formats (Q4_K, Q8_0, TurboQuant, etc.).
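To make (layer, head, position) addressing concrete, here is a minimal sketch of how an uncompressed implementation might map those coordinates to a flat array index. It assumes one buffer per layer (so `layer` selects the buffer) and a row-major [position][head][dim] layout inside it. `KvLayout` and its methods are hypothetical names; the actual layout used by DefaultKvCacheStore or TurboQuantKvCacheStore is not documented on this page.

```kotlin
// Hypothetical flat layout for one layer's keys (or values):
// [position][head][dim], row-major. This is an illustrative assumption,
// not the documented storage layout of KvCacheStore implementations.
data class KvLayout(val numHeads: Int, val headDim: Int, val maxSeqLen: Int) {
    /** Floats occupied by one token (all heads) within one layer. */
    val floatsPerToken: Int get() = numHeads * headDim

    /** Flat index of element `dim` of `head` at `position` in a layer buffer. */
    fun offset(position: Int, head: Int, dim: Int = 0): Int {
        require(position in 0 until maxSeqLen) { "position out of range: $position" }
        require(head in 0 until numHeads) { "head out of range: $head" }
        require(dim in 0 until headDim) { "dim out of range: $dim" }
        return position * floatsPerToken + head * headDim + dim
    }
}
```

With this layout, all heads of one token are contiguous, which makes appending a token a single contiguous write.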

Backends and attention kernels interact with the cache through this interface rather than managing raw tensors directly. This allows:

- Compressed K/V writes on token append
- Tile-level dequantization on read (only the needed range)
- Asymmetric K/V policies (e.g., Q8_0 for keys, 4-bit for values)
- Backend-specific fused dequant+attention paths
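The first two points, compressed writes and range-limited dequantization, can be sketched with a simplified Q8_0-style block format: groups of 32 values share one scale `d = max|x| / 127`, and each value is stored as `round(x / d)` in an int8. GGML's Q8_0 stores the scale as fp16; this sketch keeps it as a Float for brevity. `Q8Blocks`, `quantizeQ8`, and `dequantizeRange` are hypothetical helpers, not part of this API.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

const val Q8_BLOCK = 32

/** Hypothetical Q8_0-style storage: one scale per block of 32 int8 values. */
class Q8Blocks(val scales: FloatArray, val quants: ByteArray, val size: Int)

/** Quantize on write: per-block absmax scaling to the int8 range [-127, 127]. */
fun quantizeQ8(x: FloatArray): Q8Blocks {
    val numBlocks = (x.size + Q8_BLOCK - 1) / Q8_BLOCK
    val scales = FloatArray(numBlocks)
    val quants = ByteArray(numBlocks * Q8_BLOCK) // tail of the last block stays zero
    for (b in 0 until numBlocks) {
        val start = b * Q8_BLOCK
        val end = minOf(start + Q8_BLOCK, x.size)
        var amax = 0f
        for (i in start until end) amax = maxOf(amax, abs(x[i]))
        val d = amax / 127f
        scales[b] = d
        val inv = if (d != 0f) 1f / d else 0f
        for (i in start until end) {
            quants[i] = (x[i] * inv).roundToInt().coerceIn(-127, 127).toByte()
        }
    }
    return Q8Blocks(scales, quants, x.size)
}

/** Dequantize on read, touching only the quants and scales covering [from, to). */
fun dequantizeRange(q: Q8Blocks, from: Int, to: Int): FloatArray {
    require(from in 0..to && to <= q.size) { "invalid range [$from, $to)" }
    val out = FloatArray(to - from)
    for (i in from until to) out[i - from] = q.scales[i / Q8_BLOCK] * q.quants[i]
    return out
}
```

An asymmetric policy would run keys through a path like this while values use a lower-bit block format; the per-element reconstruction error here is at most half a block's scale.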

Inheritors

DefaultKvCacheStore

TurboQuantKvCacheStore

Types

object Companion

Properties

abstract val currentSeqLen: Int

Current number of tokens stored in the cache.

abstract val headDim: Int

Dimension per head.

abstract val keyEncoding: TensorEncoding

Encoding used for key storage.

abstract val maxSeqLen: Int

Maximum sequence length this cache can hold.

abstract val numHeads: Int

Number of KV heads per layer.

abstract val numLayers: Int

Number of transformer layers in this cache.

abstract val placement: Placement

Placement intent for the cache buffers.

abstract val valueEncoding: TensorEncoding

Encoding used for value storage.

Functions

abstract fun appendToken(layer: Int, key: FloatArray, value: FloatArray)

Append a single token's K/V projections for one layer.
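The append/read contract can be illustrated with an uncompressed, in-memory stand-in. This sketch assumes each `key`/`value` array holds `numHeads * headDim` floats laid out as [head][dim], and that a token counts toward `currentSeqLen` only once every layer has appended it (so the shared length is the minimum across layers). Both are assumptions; the page does not specify how `currentSeqLen` advances. `FloatKvCacheSketch` is a hypothetical class and does not implement the full interface.

```kotlin
/** Hypothetical uncompressed stand-in with [position][head][dim] layers. */
class FloatKvCacheSketch(
    val numLayers: Int,
    val numHeads: Int,
    val headDim: Int,
    val maxSeqLen: Int
) {
    private val stride = numHeads * headDim
    private val keys = Array(numLayers) { FloatArray(maxSeqLen * stride) }
    private val values = Array(numLayers) { FloatArray(maxSeqLen * stride) }
    private val lengths = IntArray(numLayers)

    // Assumption: the shared length is the minimum over all layers.
    val currentSeqLen: Int get() = lengths.minOrNull() ?: 0

    fun appendToken(layer: Int, key: FloatArray, value: FloatArray) {
        require(key.size == stride && value.size == stride) { "expected $stride floats" }
        val pos = lengths[layer]
        check(pos < maxSeqLen) { "cache full for layer $layer" }
        key.copyInto(keys[layer], destinationOffset = pos * stride)
        value.copyInto(values[layer], destinationOffset = pos * stride)
        lengths[layer] = pos + 1
    }

    fun readKeys(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray {
        require(startPos in 0..endPos && endPos <= lengths[layer]) { "invalid range" }
        return keys[layer].copyOfRange(startPos * stride, endPos * stride)
    }

    fun readValues(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray {
        require(startPos in 0..endPos && endPos <= lengths[layer]) { "invalid range" }
        return values[layer].copyOfRange(startPos * stride, endPos * stride)
    }
}
```

A quantizing implementation would encode `key` and `value` on append instead of copying them verbatim.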

abstract fun clear()

Reset the cache, clearing all stored tokens.

abstract fun evict(fromPos: Int)

Evict all cached tokens from position fromPos onward.
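Eviction from a position is useful, for example, when rolling back tokens rejected during speculative decoding while keeping a shared prefix. The sketch below shows one plausible bookkeeping model: only the per-layer lengths shrink, and stale entries past the new length are overwritten by later appends. Treating `fromPos` beyond the current length as a no-op, and `clear()` as `evict(0)`, are assumptions of this sketch. `KvLengthTracker` is a hypothetical name.

```kotlin
/**
 * Hypothetical length bookkeeping for evict()/clear(). Stored K/V data past
 * the new length is not zeroed; it is simply reused by subsequent appends.
 */
class KvLengthTracker(numLayers: Int, val maxSeqLen: Int) {
    private val lengths = IntArray(numLayers)

    val currentSeqLen: Int get() = lengths.minOrNull() ?: 0

    fun append(layer: Int) {
        check(lengths[layer] < maxSeqLen) { "cache full for layer $layer" }
        lengths[layer]++
    }

    /** Drop positions fromPos onward in every layer (no-op if already shorter). */
    fun evict(fromPos: Int) {
        require(fromPos >= 0) { "fromPos must be non-negative" }
        for (l in lengths.indices) lengths[l] = minOf(lengths[l], fromPos)
    }

    /** Clearing is equivalent to evicting from position 0. */
    fun clear() = evict(0)
}
```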

abstract fun memoryReport(): KvCacheMemoryReport

Memory report for the entire cache.
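The fields of KvCacheMemoryReport are not listed on this page, but the footprint it describes can be estimated from the cache properties. The sketch below computes a back-of-envelope total for a full cache (`maxSeqLen` tokens), with separate encodings for keys and values. Bit widths for Q8_0 and Q4_K follow GGML's block sizes (34 bytes per 32 values and 144 bytes per 256 values, i.e. 8.5 and 4.5 bits per element). `SketchEncoding` and `kvCacheBytes` are hypothetical, and real implementations may add padding or metadata that this ignores.

```kotlin
import kotlin.math.ceil

/** Hypothetical encodings with their effective storage cost per element. */
enum class SketchEncoding(val bitsPerElement: Double) {
    F32(32.0),
    F16(16.0),
    Q8_0(34 * 8 / 32.0),  // 8.5 bits: fp16 scale + 32 int8 values per block
    Q4_K(144 * 8 / 256.0) // 4.5 bits: 144-byte super-block of 256 values
}

/** Estimated bytes for keys plus values when the cache is full. */
fun kvCacheBytes(
    numLayers: Int, numHeads: Int, headDim: Int, maxSeqLen: Int,
    keyEncoding: SketchEncoding, valueEncoding: SketchEncoding
): Long {
    val elementsPerRole = numLayers.toLong() * numHeads * headDim * maxSeqLen
    val bits = elementsPerRole * (keyEncoding.bitsPerElement + valueEncoding.bitsPerElement)
    return ceil(bits / 8).toLong()
}
```

For 32 layers, 8 KV heads, `headDim = 128`, and 4096 tokens, F16 keys and values need 512 MiB, while Q8_0 keys with Q4_K values need about 208 MiB.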

abstract fun readKeys(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray

Read cached keys for a layer, dequantized to float.

abstract fun readKeyStorage(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): TensorStorage

Read raw (possibly compressed) key storage for a layer as TensorStorage.

abstract fun readValues(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): FloatArray

Read cached values for a layer, dequantized to float.
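An attention kernel that does not use a fused path would typically consume the dequantized arrays from readKeys and readValues directly. This sketch computes scaled dot-product attention for a single query head, assuming the returned arrays use a [position][kvHead][dim] layout. That layout, and the function `attendOneHead`, are assumptions for illustration.

```kotlin
import kotlin.math.exp
import kotlin.math.sqrt

/**
 * Scaled dot-product attention for one query head over keys/values laid out
 * as [position][kvHead][dim] (assumed layout of readKeys/readValues output).
 */
fun attendOneHead(
    query: FloatArray, keys: FloatArray, values: FloatArray,
    kvHead: Int, numKvHeads: Int, headDim: Int
): FloatArray {
    val stride = numKvHeads * headDim
    require(keys.size % stride == 0 && values.size == keys.size) { "shape mismatch" }
    val seqLen = keys.size / stride
    val scale = 1f / sqrt(headDim.toFloat())
    // Attention logits: q . k / sqrt(headDim) for every cached position.
    val scores = FloatArray(seqLen) { pos ->
        val base = pos * stride + kvHead * headDim
        var dot = 0f
        for (d in 0 until headDim) dot += query[d] * keys[base + d]
        dot * scale
    }
    // Numerically stable softmax; an empty cache yields a zero vector.
    val maxScore = scores.maxOrNull() ?: return FloatArray(headDim)
    var sum = 0f
    for (i in scores.indices) {
        scores[i] = exp(scores[i] - maxScore)
        sum += scores[i]
    }
    // Weighted sum of value vectors.
    val out = FloatArray(headDim)
    for (pos in 0 until seqLen) {
        val w = scores[pos] / sum
        val base = pos * stride + kvHead * headDim
        for (d in 0 until headDim) out[d] += w * values[base + d]
    }
    return out
}
```

A fused backend would instead read the compressed blocks via readKeyStorage and readValueStorage and dequantize inside this loop, avoiding the intermediate float arrays.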

abstract fun readValueStorage(layer: Int, startPos: Int = 0, endPos: Int = currentSeqLen): TensorStorage

Read raw (possibly compressed) value storage for a layer as TensorStorage.