KvCacheBypass
@Target(allowedTargets = [AnnotationTarget.PROPERTY, AnnotationTarget.VALUE_PARAMETER, AnnotationTarget.FIELD])
Disables TurboQuant compression for a specific layer.
When applied alongside a model-level KvCache annotation, this annotation overrides the model-wide compression setting for the annotated layer. Use it on individual layers that are sensitive to quantization (e.g., early layers or cross-attention).
Example:
@KvCacheBypass
val firstLayerAttention: MultiHeadAttention // stays FP32
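The override rule described above (model-level compression on, individual layers opted out) can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `KvCacheSketch`, `KvCacheBypassSketch`, `AttentionStub`, `TinyModel`, and `compressionPlan` are stand-ins invented for illustration and are not the library's actual declarations or its resolution logic. The stand-in bypass annotation is restricted to `AnnotationTarget.FIELD` so that plain Java reflection can see it without the `kotlin-reflect` dependency; the real annotation also allows `PROPERTY` and `VALUE_PARAMETER`.

```kotlin
// Hypothetical stand-in for a model-level annotation that enables compression.
@Target(AnnotationTarget.CLASS)
@Retention(AnnotationRetention.RUNTIME)
annotation class KvCacheSketch

// Hypothetical stand-in for the per-layer bypass marker.
// FIELD-only, so an unqualified use on a property lands on the backing field.
@Target(AnnotationTarget.FIELD)
@Retention(AnnotationRetention.RUNTIME)
annotation class KvCacheBypassSketch

// Placeholder layer type (assumption; stands in for a real attention module).
class AttentionStub(val name: String)

@KvCacheSketch
class TinyModel {
    @KvCacheBypassSketch
    val firstLayerAttention = AttentionStub("layer0") // opted out of compression

    val secondLayerAttention = AttentionStub("layer1") // follows the model-level setting
}

// Maps each declared field name to whether compression would apply to it:
// true only if the class is marked and the field carries no bypass marker.
fun compressionPlan(model: Any): Map<String, Boolean> {
    val cls = model.javaClass
    val modelLevel = cls.isAnnotationPresent(KvCacheSketch::class.java)
    return cls.declaredFields.associate { field ->
        val bypassed = field.isAnnotationPresent(KvCacheBypassSketch::class.java)
        field.name to (modelLevel && !bypassed)
    }
}

fun main() {
    val plan = compressionPlan(TinyModel())
    println("firstLayerAttention compressed: ${plan["firstLayerAttention"]}")
    println("secondLayerAttention compressed: ${plan["secondLayerAttention"]}")
}
```

In this sketch the bypass marker always wins over the model-level setting, which mirrors the behavior the description attributes to KvCacheBypass; how the actual library discovers and applies the annotation may differ.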