AccelerateCpuOps

CPU operations accelerated by Apple's Accelerate framework. Overrides hot-path operations (matmul, elementwise, reductions) with hardware-optimized routines that leverage ARM NEON and AMX.

Falls back to DefaultCpuOpsBase for non-FP32 dtypes, non-contiguous layouts, and complex broadcasting cases.
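The fast-path/fallback split described above can be sketched as a dispatch pattern. This is an illustrative sketch only: the page does not show the real dispatch logic, so the names below (`FakeTensor`, `isFp32`, `isContiguous`, `AddKernel`) are hypothetical stand-ins, not part of the actual API.

```kotlin
// Hypothetical sketch of the accelerate-or-fallback dispatch pattern:
// take the hardware-optimized path only for plain contiguous FP32 inputs,
// otherwise delegate to the generic implementation.

fun interface AddKernel {
    fun add(a: FloatArray, b: FloatArray): FloatArray
}

// Stand-in for a tensor; the real Tensor<T, V> type is not shown here.
class FakeTensor(
    val data: FloatArray,
    val isFp32: Boolean = true,
    val isContiguous: Boolean = true,
)

class AcceleratedAdd(private val fallback: AddKernel) {
    fun add(a: FakeTensor, b: FakeTensor): FloatArray =
        if (a.isFp32 && b.isFp32 && a.isContiguous && b.isContiguous) {
            // Fast path: stand-in for an Accelerate routine such as vDSP_vadd.
            FloatArray(a.data.size) { i -> a.data[i] + b.data[i] }
        } else {
            // Fallback path: stand-in for DefaultCpuOpsBase.
            fallback.add(a.data, b.data)
        }
}
```

Usage: construct with a generic kernel and call `add`; both paths must produce identical results, the fast path is only an optimization.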

Constructors

constructor(dataFactory: TensorDataFactory)

Functions

open override fun <T : DType, V> add(a: Tensor<T, V>, b: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> divide(a: Tensor<T, V>, b: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> matmul(a: Tensor<T, V>, b: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> mean(tensor: Tensor<T, V>, dim: Int?): Tensor<T, V>
open override fun <T : DType, V> multiply(a: Tensor<T, V>, b: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> relu(tensor: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> silu(tensor: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> subtract(a: Tensor<T, V>, b: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> sum(tensor: Tensor<T, V>, dim: Int?): Tensor<T, V>
open override fun <T : DType, V> transpose(tensor: Tensor<T, V>): Tensor<T, V>
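For reference, the two activation functions in the list above have simple scalar definitions that any accelerated kernel must match: ReLU is max(0, x), and SiLU (also called swish) is x · sigmoid(x). A minimal sketch on a plain FloatArray, independent of the Tensor API:

```kotlin
import kotlin.math.exp

// Reference (non-accelerated) semantics for relu and silu on raw FP32 data.
// The class above would route these through Accelerate; these definitions
// state what each accelerated kernel must compute.

fun reluRef(x: FloatArray): FloatArray =
    FloatArray(x.size) { i -> maxOf(0f, x[i]) }

// SiLU(x) = x * sigmoid(x) = x / (1 + e^(-x))
fun siluRef(x: FloatArray): FloatArray =
    FloatArray(x.size) { i -> x[i] / (1f + exp(-x[i])) }
```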