KvCacheConfig

data class KvCacheConfig(val numLayers: Int, val numHeads: Int, val headDim: Int, val maxSeqLen: Int, val keyEncoding: TensorEncoding = TensorEncoding.Dense(4), val valueEncoding: TensorEncoding = TensorEncoding.Dense(4), val placement: Placement = Placement.CPU_HEAP.copy(residency = Residency.PERSISTENT))

Configuration for asymmetric K/V encoding policies.

Keys are often more sensitive to quantization error than values, so it can make sense to give keys a larger bit budget than values. For example:

  • safe-lowbit: Q8_0 keys + 4-bit values

  • balanced: 4-bit keys + 4-bit values
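To make the trade-off between the two presets concrete, the sketch below estimates the cache footprint for each. It is not part of the library: `CacheShape` and `kvCacheBytes` are hypothetical helpers that only mirror the shape fields of `KvCacheConfig`. It assumes that Q8_0 follows the GGML block layout (32 int8 values plus one 16-bit scale, about 8.5 bits per element) and that "4-bit" means exactly 4 bits per element with no scale overhead.

```kotlin
// Hypothetical helper, not a library type: mirrors the shape fields of KvCacheConfig.
data class CacheShape(val numLayers: Int, val numHeads: Int, val headDim: Int, val maxSeqLen: Int) {
    // Element count of the key tensor alone; the value tensor holds the same number.
    val elementsPerTensor: Long
        get() = numLayers.toLong() * numHeads * headDim * maxSeqLen
}

// Approximate cache size in bytes, given the average bits per element for keys and values.
fun kvCacheBytes(shape: CacheShape, keyBits: Double, valueBits: Double): Long {
    val n = shape.elementsPerTensor
    return (n * (keyBits + valueBits) / 8.0).toLong()
}

fun main() {
    // Illustrative shape only; not taken from the source.
    val shape = CacheShape(numLayers = 32, numHeads = 32, headDim = 128, maxSeqLen = 4096)

    // Assumption: GGML-style Q8_0, 34 bytes per 32 elements = 8.5 bits per element.
    val q8Bits = 8.5
    // Assumption: plain 4 bits per element, ignoring any per-block scale.
    val fourBits = 4.0

    println("safe-lowbit: ${kvCacheBytes(shape, q8Bits, fourBits)} bytes")
    println("balanced:    ${kvCacheBytes(shape, fourBits, fourBits)} bytes")
}
```

Under these assumptions the safe-lowbit preset costs roughly 1.56 times the memory of the balanced preset, which is the price paid for keeping keys at higher precision.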

Constructors

constructor(numLayers: Int, numHeads: Int, headDim: Int, maxSeqLen: Int, keyEncoding: TensorEncoding = TensorEncoding.Dense(4), valueEncoding: TensorEncoding = TensorEncoding.Dense(4), placement: Placement = Placement.CPU_HEAP.copy(residency = Residency.PERSISTENT))
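A minimal usage sketch of the constructor follows. So that it compiles on its own, it declares stand-ins for `TensorEncoding`, `Residency`, `Placement`, and `KvCacheConfig` that only mimic the shapes implied by the signature above. The field names `width` and `device`, the `TRANSIENT` constant, and the contents of `CPU_HEAP` are assumptions, and the real library types may differ.

```kotlin
// Stand-in declarations (assumptions), shaped after the documented signature only.
sealed interface TensorEncoding {
    // The meaning of the Int argument is not stated in the source; "width" is a placeholder name.
    data class Dense(val width: Int) : TensorEncoding
}

enum class Residency { TRANSIENT, PERSISTENT } // TRANSIENT is hypothetical

data class Placement(val device: String, val residency: Residency) {
    companion object {
        // Hypothetical contents; only the name CPU_HEAP appears in the source.
        val CPU_HEAP = Placement(device = "cpu-heap", residency = Residency.TRANSIENT)
    }
}

data class KvCacheConfig(
    val numLayers: Int,
    val numHeads: Int,
    val headDim: Int,
    val maxSeqLen: Int,
    val keyEncoding: TensorEncoding = TensorEncoding.Dense(4),
    val valueEncoding: TensorEncoding = TensorEncoding.Dense(4),
    val placement: Placement = Placement.CPU_HEAP.copy(residency = Residency.PERSISTENT),
)

fun main() {
    // Only the four shape parameters are required; encodings and placement fall back to defaults.
    val defaults = KvCacheConfig(numLayers = 32, numHeads = 32, headDim = 128, maxSeqLen = 4096)

    // Because this is a data class, copy() derives a variant without repeating every field.
    val longer = defaults.copy(maxSeqLen = 8192)

    println(defaults.keyEncoding == defaults.valueEncoding)
    println(longer.placement.residency)
}
```

Named arguments keep the call readable given four `Int` parameters in a row, where positional arguments would be easy to swap by mistake.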

Types

object Companion

Properties

val headDim: Int
val keyEncoding: TensorEncoding
val maxSeqLen: Int
val numHeads: Int
val numLayers: Int
val placement: Placement
val valueEncoding: TensorEncoding