
KvCacheBypass

@Target(allowedTargets = [AnnotationTarget.PROPERTY, AnnotationTarget.VALUE_PARAMETER, AnnotationTarget.FIELD])

Disables TurboQuant compression for the annotated layer.

When applied alongside a model-level KvCache annotation, this annotation overrides the model-wide compression setting for individual layers that are sensitive to quantization (e.g., early layers or cross-attention layers).

Example:

@KvCacheBypass
val firstLayerAttention: MultiHeadAttention  // stays FP32
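
This page does not describe how the library discovers the annotation, so the following is only a minimal sketch of one way a per-layer opt-out marker could be detected on the JVM. Every name in it (KvCacheBypassSketch, AttentionStub, DecoderSketch, bypassedFields) is a hypothetical stand-in declared locally and not part of the SKaiNET API. It also assumes the explicit `field:` use-site target, so that plain Java reflection can see the marker without kotlin-reflect.

```kotlin
// Hypothetical stand-in for the real annotation, declared locally so the
// sketch compiles on its own. Kotlin annotations default to RUNTIME retention.
@Target(AnnotationTarget.PROPERTY, AnnotationTarget.VALUE_PARAMETER, AnnotationTarget.FIELD)
annotation class KvCacheBypassSketch

// Hypothetical placeholder for an attention layer type.
class AttentionStub(val name: String)

class DecoderSketch {
    // The `field:` use-site target places the annotation on the JVM backing
    // field, where java.lang.reflect can find it.
    @field:KvCacheBypassSketch
    val firstLayerAttention = AttentionStub("layer0")

    // No marker: this layer would follow the model-wide setting.
    val middleLayerAttention = AttentionStub("layer1")
}

// Collects the names of the backing fields that carry the opt-out marker.
fun bypassedFields(type: Class<*>): Set<String> =
    type.declaredFields
        .filter { it.isAnnotationPresent(KvCacheBypassSketch::class.java) }
        .map { it.name }
        .toSet()

fun main() {
    println(bypassedFields(DecoderSketch::class.java)) // prints [firstLayerAttention]
}
```

Without the `field:` prefix, Kotlin would attach the annotation to the property itself, which Java field reflection does not see; reading it there would need kotlin-reflect or a compile-time processor.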