KvCacheConfig
data class KvCacheConfig(val numLayers: Int, val numHeads: Int, val headDim: Int, val maxSeqLen: Int, val keyEncoding: TensorEncoding = TensorEncoding.Dense(4), val valueEncoding: TensorEncoding = TensorEncoding.Dense(4), val placement: Placement = Placement.CPU_HEAP.copy(residency = Residency.PERSISTENT))
Configuration for asymmetric K/V encoding policies.
Keys are often more sensitive to quantization error than values, so the key and value caches can be given different bit budgets. For example:
safe-lowbit: Q8_0 keys + 4-bit values
balanced: 4-bit keys + 4-bit values
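The two policies above can be sketched as follows. This is a minimal, self-contained illustration, not the library's implementation: the types below are hypothetical stand-ins that mirror only part of the documented signature (`placement` is omitted), `TensorEncoding.Q8_0` is an assumed name for the Q8_0 encoding, the argument of `Dense` is assumed to be a bit width, and `estimatePayloadBytes` is a helper invented here to compare footprints.

```kotlin
// Hypothetical stand-ins so the sketch compiles on its own.
// The real TensorEncoding and KvCacheConfig may be shaped differently.
sealed interface TensorEncoding {
    val bitsPerElement: Double

    // Assumption: the Dense argument is the number of bits per element.
    data class Dense(val bits: Int) : TensorEncoding {
        override val bitsPerElement: Double get() = bits.toDouble()
    }

    // Assumption: Q8_0 modeled as blocks of 32 int8 values plus one fp16 scale,
    // i.e. 8 + 16/32 = 8.5 bits per element.
    object Q8_0 : TensorEncoding {
        override val bitsPerElement: Double get() = 8.0 + 16.0 / 32.0
    }
}

// Stand-in for the documented class; the placement parameter is left out.
data class KvCacheConfig(
    val numLayers: Int,
    val numHeads: Int,
    val headDim: Int,
    val maxSeqLen: Int,
    val keyEncoding: TensorEncoding = TensorEncoding.Dense(4),
    val valueEncoding: TensorEncoding = TensorEncoding.Dense(4),
)

// Hypothetical helper: rough payload size of the key and value caches combined,
// ignoring alignment and any metadata not captured by bitsPerElement.
fun estimatePayloadBytes(c: KvCacheConfig): Long {
    val elementsPerTensor = c.numLayers.toLong() * c.numHeads * c.headDim * c.maxSeqLen
    val totalBits = elementsPerTensor *
        (c.keyEncoding.bitsPerElement + c.valueEncoding.bitsPerElement)
    return (totalBits / 8.0).toLong()
}

// safe-lowbit: Q8_0 keys + 4-bit values
val safeLowbit = KvCacheConfig(
    numLayers = 32, numHeads = 32, headDim = 128, maxSeqLen = 4096,
    keyEncoding = TensorEncoding.Q8_0,
    valueEncoding = TensorEncoding.Dense(4),
)

// balanced: 4-bit keys + 4-bit values (the documented defaults)
val balanced = KvCacheConfig(numLayers = 32, numHeads = 32, headDim = 128, maxSeqLen = 4096)

fun main() {
    val mib = 1024L * 1024L
    println("safe-lowbit: ${estimatePayloadBytes(safeLowbit) / mib} MiB")
    println("balanced:    ${estimatePayloadBytes(balanced) / mib} MiB")
}
```

With these illustrative dimensions and under the stated assumptions, the balanced policy needs about 512 MiB and the safe-lowbit policy about 800 MiB, which makes the quality-versus-memory trade-off between the two concrete.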
Constructors
constructor(numLayers: Int, numHeads: Int, headDim: Int, maxSeqLen: Int, keyEncoding: TensorEncoding = TensorEncoding.Dense(4), valueEncoding: TensorEncoding = TensorEncoding.Dense(4), placement: Placement = Placement.CPU_HEAP.copy(residency = Residency.PERSISTENT))