CompressedKvAttention

class CompressedKvAttention(cache: KvCacheStore, dequantStrategy: CompressedKvAttention.DequantStrategy = DequantStrategy.FULL_TILE)

Bridge between KvCacheStore and the SDPA execution path.

This abstraction provides the integration point for compressed K/V in the attention runtime. Instead of modifying the core TensorOps interface (which maps to backend-specific fused kernels), this component sits between the model layer and SDPA:

  1. Write path: compresses K/V on token append via storeKeyValue.

  2. Read path: dequantizes only the required tiles via loadKeysForAttention and loadValuesForAttention.

  3. Extension point: backends can override DequantStrategy to fuse decompression with the attention math.

Usage in a transformer layer:

val bridge = CompressedKvAttention(kvCache)
bridge.storeKeyValue(layer, keyProjection, valueProjection)
val keys = bridge.loadKeysForAttention(layer)
val values = bridge.loadValuesForAttention(layer)
// pass keys, values to scaledDotProductAttention

Constructors

constructor(cache: KvCacheStore, dequantStrategy: CompressedKvAttention.DequantStrategy = DequantStrategy.FULL_TILE)

Types

DequantStrategy

Strategy for dequantizing compressed K/V during attention.
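For example, a caller can spell out the default behavior explicitly (kvCache here is assumed to be an existing KvCacheStore, as in the class-level usage example):

```kotlin
// Explicitly selecting the default full-tile dequantization strategy;
// equivalent to CompressedKvAttention(kvCache).
val bridge = CompressedKvAttention(kvCache, CompressedKvAttention.DequantStrategy.FULL_TILE)
```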

Properties


Whether the cache uses compressed (non-Dense) encoding for keys.


Whether the cache uses compressed (non-Dense) encoding for values.

Functions

fun loadKeysForAttention(layer: Int, startPos: Int = 0, endPos: Int = cache.currentSeqLen): FloatArray

Load cached keys for attention, dequantizing as needed.
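As a sketch of the read path, here is both the full-history default and a windowed load (window is a hypothetical local-attention width, not part of this API):

```kotlin
// Default bounds cover the whole cached sequence, [0, cache.currentSeqLen).
val allKeys = bridge.loadKeysForAttention(layer)

// Only the tiles overlapping [start, end) are dequantized, so a sliding
// window avoids touching older compressed tiles entirely.
val end = kvCache.currentSeqLen
val start = maxOf(0, end - window)
val recentKeys = bridge.loadKeysForAttention(layer, startPos = start, endPos = end)
```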

fun loadKeyStorageRaw(layer: Int, startPos: Int = 0, endPos: Int = cache.currentSeqLen): TensorStorage

Load raw TensorStorage for keys, preserving the cache's native encoding.

fun loadValuesForAttention(layer: Int, startPos: Int = 0, endPos: Int = cache.currentSeqLen): FloatArray

Load cached values for attention, dequantizing as needed.

fun loadValueStorageRaw(layer: Int, startPos: Int = 0, endPos: Int = cache.currentSeqLen): TensorStorage

Load raw TensorStorage for values, preserving the cache's native encoding.
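A sketch of how a backend might use the raw accessors together; fusedCompressedSdpa is an assumed backend kernel entry point, not part of this API:

```kotlin
// Keep the cache's native (possibly compressed) encoding end to end and
// let a fused kernel perform dequantization inside the attention math.
val keyStorage = bridge.loadKeyStorageRaw(layer)
val valueStorage = bridge.loadValueStorageRaw(layer)
val attnOut = fusedCompressedSdpa(query, keyStorage, valueStorage)
```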

fun storeKeyValue(layer: Int, key: FloatArray, value: FloatArray)

Store K/V projections for a new token, compressing as configured.
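A decode-step sketch; projectKey and projectValue stand in for the model's K/V projection matmuls and are not part of this API:

```kotlin
// One K/V pair is appended (and compressed, per the configured encoding)
// per layer for each newly generated token.
for (layer in 0 until numLayers) {
    bridge.storeKeyValue(layer, projectKey(layer, hidden), projectValue(layer, hidden))
}
```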