KvCacheStore
Dedicated KV-cache storage abstraction for inference.
Unlike generic TensorStorage, a KV cache is append-friendly and role-aware: keys and values may use different encodings and bit budgets. The cache is addressed by (layer, head, position) and supports compressed block storage for quantized formats (Q4_K, Q8_0, TurboQuant, etc.).
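To make the bit budgets concrete, here is a rough sketch of the memory arithmetic behind block-compressed storage. It assumes the common ggml-style Q8_0 layout (blocks of 32 int8 codes plus one f16 scale, 34 bytes per block); that layout constant is an assumption for illustration, not something this API guarantees:

```kotlin
// Bytes needed to cache one token's K (or V) projection for one layer.
// The Q8_0 layout (32 int8 quants + one f16 scale = 34 bytes per 32
// values) follows the common ggml convention and is an assumption here.
fun bytesPerTokenF16(dim: Int): Int = dim * 2

fun bytesPerTokenQ8_0(dim: Int): Int {
    require(dim % 32 == 0) { "Q8_0 quantizes in blocks of 32 values" }
    return (dim / 32) * 34
}

fun main() {
    val dim = 4096 // e.g. 32 heads x 128 head dim
    println(bytesPerTokenF16(dim))  // 8192 bytes
    println(bytesPerTokenQ8_0(dim)) // 4352 bytes, roughly 1.9x smaller
}
```

Per token and layer, Q8_0 keys cost a bit more than half of f16; dropping values to a 4-bit format shrinks them further, which is why asymmetric K/V policies are worth supporting.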
Backends and attention kernels interact with the cache through this interface rather than managing raw tensors directly. This allows:
- Compressed K/V writes on token append
- Tile-level dequantization on read (only the needed range)
- Asymmetric K/V policies (e.g., Q8_0 for keys, 4-bit for values)
- Backend-specific fused dequant+attention paths
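The responsibilities above can be sketched as a small interface with a trivial uncompressed backing store. All names here (`KvCacheSketch`, `InMemoryKvCache`, the method signatures) are illustrative assumptions, not the actual API; only the responsibilities come from the description:

```kotlin
// Hypothetical sketch of a KvCacheStore-style surface. Names and
// signatures are assumptions for illustration.
interface KvCacheSketch {
    fun appendToken(layer: Int, keys: FloatArray, values: FloatArray)
    fun readKeys(layer: Int, start: Int, end: Int): FloatArray   // dequantized
    fun readValues(layer: Int, start: Int, end: Int): FloatArray // dequantized
    fun memoryBytes(): Long
}

// Trivial float-only backing store: no compression, but it shows the
// (layer, position)-addressed append/read contract.
class InMemoryKvCache(layers: Int, private val dim: Int) : KvCacheSketch {
    private val k = Array(layers) { mutableListOf<FloatArray>() }
    private val v = Array(layers) { mutableListOf<FloatArray>() }

    override fun appendToken(layer: Int, keys: FloatArray, values: FloatArray) {
        require(keys.size == dim && values.size == dim)
        k[layer].add(keys.copyOf())
        v[layer].add(values.copyOf())
    }

    // Concatenate only the requested position range [start, end).
    override fun readKeys(layer: Int, start: Int, end: Int) = concat(k[layer], start, end)
    override fun readValues(layer: Int, start: Int, end: Int) = concat(v[layer], start, end)

    override fun memoryBytes(): Long =
        k.sumOf { it.size.toLong() } * dim * 4 * 2 // K and V, 4 bytes per float

    private fun concat(rows: List<FloatArray>, start: Int, end: Int): FloatArray {
        val out = FloatArray((end - start) * dim)
        for (p in start until end) rows[p].copyInto(out, (p - start) * dim)
        return out
    }
}

fun main() {
    val cache = InMemoryKvCache(layers = 1, dim = 4)
    cache.appendToken(0, floatArrayOf(1f, 2f, 3f, 4f), floatArrayOf(5f, 6f, 7f, 8f))
    cache.appendToken(0, floatArrayOf(9f, 9f, 9f, 9f), floatArrayOf(0f, 0f, 0f, 0f))
    println(cache.readKeys(0, 1, 2).toList()) // [9.0, 9.0, 9.0, 9.0]
    println(cache.memoryBytes())              // 64
}
```

A real implementation would replace the `FloatArray` rows with compressed blocks and perform quantization inside `appendToken`, which is exactly what the interface hides from backends.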
Functions
- Append a single token's K/V projections for one layer.
- Memory report for the entire cache.
- Read raw (possibly compressed) key storage for a layer as TensorStorage.
- Read cached values for a layer, dequantized to float.
- Read raw (possibly compressed) value storage for a layer as TensorStorage.
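To illustrate the "dequantized to float" reads over compressed block storage, the sketch below uses a simplified Q8_0-style block (32 int8 codes plus one float scale) and decodes only the blocks overlapping the requested position range. The block layout, names, and block size are assumptions for illustration, not this store's actual formats:

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Simplified Q8_0-style block: 32 int8 codes plus one float scale.
// Layout and block size are illustrative assumptions.
const val BLOCK = 32

class Q8Block(val scale: Float, val codes: ByteArray)

fun quantize(block: FloatArray): Q8Block {
    val amax = block.maxOf { abs(it) }
    val scale = if (amax == 0f) 1f else amax / 127f
    return Q8Block(scale, ByteArray(BLOCK) { i ->
        (block[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    })
}

// Tile-level dequantization: decode only the blocks that overlap
// [start, end) instead of the whole tensor.
fun dequantizeRange(blocks: List<Q8Block>, start: Int, end: Int): FloatArray {
    val out = FloatArray(end - start)
    val first = start / BLOCK
    val last = (end - 1) / BLOCK
    for (b in first..last) {
        val blk = blocks[b]
        for (i in 0 until BLOCK) {
            val pos = b * BLOCK + i
            if (pos in start until end) out[pos - start] = blk.codes[i] * blk.scale
        }
    }
    return out
}

fun main() {
    val data = FloatArray(96) { it.toFloat() / 10f } // 3 blocks of 32
    val blocks = (0 until 3).map { b -> quantize(data.copyOfRange(b * BLOCK, (b + 1) * BLOCK)) }
    val slice = dequantizeRange(blocks, 40, 56)      // touches block 1 only
    val maxErr = slice.indices.maxOf { abs(slice[it] - data[40 + it]) }
    println(maxErr < 0.05f) // rounding error is bounded by scale / 2
}
```

Ranged decoding like this is what makes the raw-storage accessors useful: a backend can hand the compressed blocks straight to a fused dequant+attention kernel, while generic callers get the float view for just the positions they need.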