AccelerateCpuOps

CPU operations accelerated by Apple's Accelerate framework. Overrides hot-path operations (matmul, elementwise, reductions) with hardware-optimized routines that leverage ARM NEON and AMX.

Falls back to DefaultCpuOpsBase for non-FP32 dtypes, non-contiguous layouts, and complex broadcasting cases.
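The fast-path/fallback split described above can be sketched as a dispatch pattern. This is an illustrative sketch only: the page does not show the real dispatch logic, so the names below (`FakeTensor`, `isFp32`, `isContiguous`, `AddKernel`) are hypothetical stand-ins, not part of the actual API.

```kotlin
// Hypothetical sketch of the accelerate-or-fallback dispatch pattern:
// take the hardware-optimized path only for plain contiguous FP32 inputs,
// otherwise delegate to the generic implementation.

fun interface AddKernel {
    fun add(a: FloatArray, b: FloatArray): FloatArray
}

// Stand-in for a tensor; the real Tensor<T, V> type is not shown here.
class FakeTensor(
    val data: FloatArray,
    val isFp32: Boolean = true,
    val isContiguous: Boolean = true,
)

class AcceleratedAdd(private val fallback: AddKernel) {
    fun add(a: FakeTensor, b: FakeTensor): FloatArray =
        if (a.isFp32 && b.isFp32 && a.isContiguous && b.isContiguous) {
            // Fast path: stand-in for an Accelerate routine such as vDSP_vadd.
            FloatArray(a.data.size) { i -> a.data[i] + b.data[i] }
        } else {
            // Fallback path: stand-in for DefaultCpuOpsBase.
            fallback.add(a.data, b.data)
        }
}
```

Usage: construct with a generic kernel and call `add`; both paths must produce identical results, the fast path is only an optimization.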

Constructors

constructor(dataFactory: TensorDataFactory)

Functions

open override fun <T : DType, V> add(a: Tensor<T, V>, b: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> divide(a: Tensor<T, V>, b: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> matmul(a: Tensor<T, V>, b: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> mean(tensor: Tensor<T, V>, dim: Int?): Tensor<T, V>
open override fun <T : DType, V> multiply(a: Tensor<T, V>, b: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> relu(tensor: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> silu(tensor: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> subtract(a: Tensor<T, V>, b: Tensor<T, V>): Tensor<T, V>
open override fun <T : DType, V> sum(tensor: Tensor<T, V>, dim: Int?): Tensor<T, V>
open override fun <T : DType, V> transpose(tensor: Tensor<T, V>): Tensor<T, V>
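For reference, the two activation functions in the list above have simple scalar definitions that any accelerated kernel must match: ReLU is max(0, x), and SiLU (also called swish) is x · sigmoid(x). A minimal sketch on a plain FloatArray, independent of the Tensor API:

```kotlin
import kotlin.math.exp

// Reference (non-accelerated) semantics for relu and silu on raw FP32 data.
// The class above would route these through Accelerate; these definitions
// state what each accelerated kernel must compute.

fun reluRef(x: FloatArray): FloatArray =
    FloatArray(x.size) { i -> maxOf(0f, x[i]) }

// SiLU(x) = x * sigmoid(x) = x / (1 + e^(-x))
fun siluRef(x: FloatArray): FloatArray =
    FloatArray(x.size) { i -> x[i] / (1f + exp(-x[i])) }
```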