AdamOptimizer
constructor(lr: Double = 0.001, beta1: Double = 0.9, beta2: Double = 0.999, epsilon: Double = 1.0E-8, weightDecay: Double = 0.0, decoupledWeightDecay: Boolean = true, amsgrad: Boolean = false)
Parameters
lr
Learning rate (default: 0.001)
beta1
Exponential decay rate for the first moment estimates (default: 0.9)
beta2
Exponential decay rate for the second moment estimates (default: 0.999)
epsilon
Small constant added to the denominator for numerical stability (default: 1e-8)
weightDecay
Weight decay coefficient (default: 0.0)
decoupledWeightDecay
If true, uses AdamW-style decoupled weight decay, where the decay is applied directly to the weights instead of being added to the gradient (default: true)
amsgrad
If true, uses the AMSGrad variant, which maintains the running maximum of the second moment estimates v_t and uses it in the denominator of the update (default: false)
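To show how the parameters above interact, here is a minimal self-contained sketch of the update rule they control. This is an illustration, not the library's implementation: the class name `AdamSketch`, its `step` method, and the flat `DoubleArray` parameter layout are all assumptions made for this example.

```kotlin
import kotlin.math.max
import kotlin.math.pow
import kotlin.math.sqrt

// Hypothetical sketch of the Adam update rule; not the library's actual class.
class AdamSketch(
    val lr: Double = 0.001,
    val beta1: Double = 0.9,
    val beta2: Double = 0.999,
    val epsilon: Double = 1e-8,
    val weightDecay: Double = 0.0,
    val decoupledWeightDecay: Boolean = true,
    val amsgrad: Boolean = false,
) {
    private var t = 0                       // step counter for bias correction
    private var m: DoubleArray? = null      // first moment estimates
    private var v: DoubleArray? = null      // second moment estimates
    private var vMax: DoubleArray? = null   // running max of v_t (AMSGrad)

    fun step(params: DoubleArray, grads: DoubleArray) {
        t += 1
        val m = this.m ?: DoubleArray(params.size).also { this.m = it }
        val v = this.v ?: DoubleArray(params.size).also { this.v = it }
        val vMax = this.vMax ?: DoubleArray(params.size).also { this.vMax = it }
        for (i in params.indices) {
            var g = grads[i]
            // Coupled (classic L2) weight decay is folded into the gradient.
            if (weightDecay != 0.0 && !decoupledWeightDecay) g += weightDecay * params[i]
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            val mHat = m[i] / (1 - beta1.pow(t))   // bias-corrected first moment
            var vHat = v[i] / (1 - beta2.pow(t))   // bias-corrected second moment
            if (amsgrad) {
                // AMSGrad: never let the effective second moment decrease.
                vMax[i] = max(vMax[i], vHat)
                vHat = vMax[i]
            }
            params[i] -= lr * mHat / (sqrt(vHat) + epsilon)
            // Decoupled (AdamW-style) decay shrinks weights outside the adaptive step.
            if (weightDecay != 0.0 && decoupledWeightDecay) params[i] -= lr * weightDecay * params[i]
        }
    }
}

fun main() {
    // One step with the defaults: the first update is roughly lr * sign(gradient).
    val opt = AdamSketch()
    val params = doubleArrayOf(1.0, -2.0)
    opt.step(params, doubleArrayOf(0.5, -0.5))
    println(params.toList())
}
```

Note the asymmetry between the two decay modes: with `decoupledWeightDecay = true` the decay term never passes through the adaptive denominator, so its effective strength does not depend on the gradient history.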