AdamOptimizer

class AdamOptimizer @JvmOverloads constructor(lr: Double = 0.001, beta1: Double = 0.9, beta2: Double = 0.999, epsilon: Double = 1.0E-8, weightDecay: Double = 0.0, decoupledWeightDecay: Boolean = true, amsgrad: Boolean = false) : Optimizer

Adam optimizer (Adaptive Moment Estimation).

Implements the Adam algorithm from "Adam: A Method for Stochastic Optimization" (Kingma & Ba, 2014) with optional decoupled weight decay (AdamW).

The update rule is:

m_t = β1 * m_{t-1} + (1 - β1) * g_t
v_t = β2 * v_{t-1} + (1 - β2) * g_t^2
m_hat = m_t / (1 - β1^t)
v_hat = v_t / (1 - β2^t)
θ_t = θ_{t-1} - lr * m_hat / (sqrt(v_hat) + ε)

When decoupledWeightDecay is true (default), weight decay is applied directly to the parameters (AdamW style) rather than added to the gradient (L2 regularization).
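The update rule and the two weight-decay modes can be sketched as a self-contained scalar step. `adamStep` below is an illustrative stand-in, not this class's internal method; the optimizer applies the same arithmetic elementwise to tensors (AMSGrad omitted here):

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// One scalar Adam/AdamW step following the update rule above.
// Returns the new parameter value and the updated moment estimates.
fun adamStep(
    theta: Double, grad: Double, m: Double, v: Double, t: Int,
    lr: Double = 0.001, beta1: Double = 0.9, beta2: Double = 0.999,
    epsilon: Double = 1e-8, weightDecay: Double = 0.0,
    decoupled: Boolean = true
): Triple<Double, Double, Double> {
    // L2-style decay folds the penalty into the gradient before the moments
    val g = if (!decoupled) grad + weightDecay * theta else grad
    val mNew = beta1 * m + (1 - beta1) * g          // first moment estimate
    val vNew = beta2 * v + (1 - beta2) * g * g      // second moment estimate
    val mHat = mNew / (1 - beta1.pow(t))            // bias correction
    val vHat = vNew / (1 - beta2.pow(t))
    var thetaNew = theta - lr * mHat / (sqrt(vHat) + epsilon)
    // AdamW-style decay shrinks the parameter directly, outside the Adam step
    if (decoupled) thetaNew -= lr * weightDecay * theta
    return Triple(thetaNew, mNew, vNew)
}
```

Note that on the first step (t = 1) with a nonzero scalar gradient, the bias-corrected ratio m_hat / sqrt(v_hat) is ±1, so the parameter moves by approximately lr regardless of the gradient's magnitude.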

Parameters

lr

Learning rate (default: 0.001)

beta1

Exponential decay rate for the first moment estimates (default: 0.9)

beta2

Exponential decay rate for the second moment estimates (default: 0.999)

epsilon

Small constant for numerical stability (default: 1e-8)

weightDecay

Weight decay coefficient (default: 0.0)

decoupledWeightDecay

If true, uses AdamW-style decoupled weight decay (default: true)

amsgrad

If true, uses the AMSGrad variant that maintains the maximum of all v_t (default: false)
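The effect of AMSGrad can be seen in a scalar sketch: because the update divides by the running maximum of v_t, the effective step size can never grow back when v_t later shrinks. The helper names below are illustrative only, not members of this class, and bias correction is omitted for brevity:

```kotlin
import kotlin.math.sqrt

// Plain Adam denominator: uses only the current second-moment estimate.
fun adamDenom(v: Double, epsilon: Double = 1e-8): Double = sqrt(v) + epsilon

// AMSGrad denominator: uses the maximum v seen so far, so the
// denominator is monotone non-decreasing across steps.
fun amsgradDenom(v: Double, vMax: Double, epsilon: Double = 1e-8): Pair<Double, Double> {
    val newMax = maxOf(vMax, v)
    return Pair(sqrt(newMax) + epsilon, newMax)
}
```

For example, if v spikes to 4.0 and later decays to 1.0, plain Adam's denominator drops back toward 1.0 (allowing larger steps), while AMSGrad's stays near 2.0.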

Constructors

constructor(lr: Double = 0.001, beta1: Double = 0.9, beta2: Double = 0.999, epsilon: Double = 1.0E-8, weightDecay: Double = 0.0, decoupledWeightDecay: Boolean = true, amsgrad: Boolean = false)

Functions

open override fun addParameter(param: ModuleParameter<*, *>, applyWeightDecay: Boolean)

Register a raw module parameter to be optimized.

open override fun addParameter(param: Parameter, applyWeightDecay: Boolean)

Register a parameter to be optimized.

fun reset()

Resets the optimizer state (moment estimates and step counter). Useful when starting training from scratch with the same optimizer instance.

open override fun step()

Perform one optimization step, updating all registered parameters (reassigning their tensor values where in-place mutation is not possible).

open override fun zeroGrad()

Zero accumulated gradients on all registered parameters.