DefaultKvCacheStore
Default KV cache implementation using dense FP32 storage.
This is the reference/baseline implementation that stores K/V as uncompressed float arrays. Quantized implementations (Q8_0, TurboQuant) will override appendToken and readKeys/readValues with encode-on-write / decode-on-read paths.
Internal layout per layer:
keys:
FloatArray(numHeads * maxSeqLen * headDim)— numHeads, maxSeqLen, headDimvalues:
FloatArray(numHeads * maxSeqLen * headDim)— numHeads, maxSeqLen, headDim
Append writes to position currentSeqLen; read returns a contiguous slice.
Properties
Functions
Append a single token's K/V projections for one layer.
Memory report for the entire cache.
Read raw (possibly compressed) key storage for a layer as TensorStorage.
Read cached values for a layer, dequantized to float.
Read raw (possibly compressed) value storage for a layer as TensorStorage.