LayerScale
class LayerScale<T : DType, V>(val dim: Int, val name: String = "LayerScale", initScale: Tensor<T, V>? = null) : Module<T, V>, ModuleParameters<T, V>
Layer Scale: element-wise multiplication by a learnable per-channel scalar.
Introduced in "Going deeper with Image Transformers" (CaiT). Used in vision transformers and audio codec decoders (Voxtral).
Parameters
dim
Number of channels
name
Module name
initScale
Initial scale tensor of shape [dim], typically filled with a small value (e.g. 0.01)
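The per-channel scaling described above can be illustrated with a minimal sketch on plain arrays. This is not the library's implementation and does not use its Tensor API. The function name `layerScaleSketch`, the `FloatArray` representation, and the assumption that channels form the innermost (last) axis of a row-major buffer are all hypothetical choices made for illustration.

```kotlin
// Hypothetical helper, not part of the library: multiplies each element of a
// flattened, row-major [rows, dim] buffer by the scale of its channel.
// Assumption: channels are the innermost axis, so element i is in channel i % dim.
fun layerScaleSketch(x: FloatArray, gamma: FloatArray): FloatArray {
    val dim = gamma.size
    require(dim > 0) { "gamma must hold at least one channel scale" }
    require(x.size % dim == 0) { "input length must be a multiple of dim" }
    // y[i] = x[i] * gamma[channel(i)]; gamma is the learnable parameter in a real layer.
    return FloatArray(x.size) { i -> x[i] * gamma[i % dim] }
}
```

For example, with dim = 2, input [1, 2, 3, 4] and scale [0.5, 2], the sketch returns [0.5, 4, 1.5, 8]. With a small initial scale such as 0.01, the output starts close to zero, so a residual branch wrapped in this scaling contributes little at the beginning of training.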