matmulAutoDispatch
fun matmulAutoDispatch(input: Tensor<FP32, Float>, weight: Tensor<*, *>, ctx: ExecutionContext): Tensor<FP32, Float>
Performs matmul with automatic dispatch based on the weight type. Uses a quantization-optimized path when the weights are quantized; otherwise falls back to standard floating-point matmul.
Supported weight types:
Q8_0TensorData: uses the Q8_0 fused matmul
Q4_KTensorData: uses the Q4_K fused matmul
TernaryTensorData: uses the ternary addition-only matmul
FP32: uses standard floating-point matmul
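The dispatch idea above can be sketched as follows. This is a minimal, hypothetical illustration with stand-in weight types and a single dot product; the real Tensor, ExecutionContext, and quantized data layouts are not reproduced here. The Q8_0-style branch shows why a fused path helps: it multiplies raw quantized bytes directly and applies the scale once, instead of dequantizing the weights first.

```kotlin
// Stand-in weight representations (assumed names, not the library's API).
sealed interface WeightData
data class FP32Weights(val values: FloatArray) : WeightData            // plain floats
data class Q8_0Weights(val scale: Float, val q: ByteArray) : WeightData // 8-bit ints + one scale

// Dispatch on the weight type, mirroring matmulAutoDispatch's strategy
// for a single row (a dot product) rather than a full matmul.
fun dot(input: FloatArray, weight: WeightData): Float = when (weight) {
    // Standard floating-point path.
    is FP32Weights ->
        input.indices.sumOf { (input[it] * weight.values[it]).toDouble() }.toFloat()
    // Fused quantized path: accumulate against raw bytes, scale once at the end.
    is Q8_0Weights ->
        weight.scale * input.indices.sumOf { (input[it] * weight.q[it]).toDouble() }.toFloat()
}
```

The sealed `when` makes the dispatch exhaustive, so adding a new weight type (e.g. a ternary format) forces a corresponding branch at compile time.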
Return
Output tensor
Parameters
input
FP32 input tensor
weight
Weight tensor (quantized or FP32)
ctx
Execution context for the operation