KvCache
@Target(allowedTargets = [AnnotationTarget.PROPERTY, AnnotationTarget.VALUE_PARAMETER, AnnotationTarget.FIELD])
Configures TurboQuant KV-cache compression for an attention layer.
Apply this annotation to an attention layer property to declare that layer's KV-cache compression settings. The runtime reads the annotation to configure the KvCacheStore and CompressedKvAttention for each annotated layer.
Example:
@KvCache(preset = "balanced")
val selfAttention: MultiHeadAttention
@KvCache(keyBits = 8, valueBits = 4)
val crossAttention: MultiHeadAttention
@KvCache(preset = "safe-lowbit", maxSeqLen = 4096)
val longContextAttention: MultiHeadAttention
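
The page does not say how the runtime discovers these annotations. As a rough illustration only, the sketch below shows one way declared settings could be collected with plain Java reflection. Everything in it is an assumption made for the example: the stand-in KvCache declaration and its default values, the placeholder MultiHeadAttention and DecoderBlock classes, the hypothetical kvCacheSettings helper, and the @field: use-site target (chosen here so the annotation lands on the backing field, where Java reflection can see it without the kotlin-reflect library).

```kotlin
// Hypothetical stand-in for the real annotation. The parameter names are taken
// from the examples above; the default values are assumptions.
@Target(AnnotationTarget.PROPERTY, AnnotationTarget.VALUE_PARAMETER, AnnotationTarget.FIELD)
@Retention(AnnotationRetention.RUNTIME)
annotation class KvCache(
    val preset: String = "",
    val keyBits: Int = 0,
    val valueBits: Int = 0,
    val maxSeqLen: Int = 0,
)

// Placeholder layer type for this sketch.
class MultiHeadAttention

class DecoderBlock {
    // @field: places the annotation on the backing field instead of the property.
    @field:KvCache(preset = "balanced")
    val selfAttention = MultiHeadAttention()

    @field:KvCache(keyBits = 8, valueBits = 4)
    val crossAttention = MultiHeadAttention()

    // Not annotated, so the scan below skips it.
    val plainAttention = MultiHeadAttention()
}

// Collects the KvCache settings declared on a class, keyed by field name.
fun kvCacheSettings(type: Class<*>): Map<String, KvCache> =
    type.declaredFields
        .mapNotNull { field ->
            field.getAnnotation(KvCache::class.java)?.let { field.name to it }
        }
        .toMap()

fun main() {
    for ((name, cfg) in kvCacheSettings(DecoderBlock::class.java)) {
        println("$name: preset='${cfg.preset}', keyBits=${cfg.keyBits}, valueBits=${cfg.valueBits}")
    }
}
```

A scan like this returns only the annotated layers, so a caller could leave unannotated attention layers uncompressed. Reading annotations placed directly on properties (without @field:) would instead need Kotlin's own reflection library.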