matmulAutoDispatch
fun matmulAutoDispatch(input: Tensor<FP32, Float>, weight: Tensor<*, *>, ctx: ExecutionContext): Tensor<FP32, Float>
Performs matmul with automatic dispatch based on the weight type. Uses a quantization-optimized path when the weights are quantized; otherwise falls back to standard floating-point matmul.
Supported weight types:
Q8_0TensorData: uses the Q8_0 fused matmul
Q4_KTensorData: uses the Q4_K fused matmul
TernaryTensorData: uses the ternary addition-only matmul
FP32: uses standard floating-point matmul
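The dispatch idea above can be sketched as follows. This is a minimal, hypothetical illustration with stand-in weight types and a single dot product; the real Tensor, ExecutionContext, and quantized data layouts are not reproduced here. The Q8_0-style branch shows why a fused path helps: it multiplies raw quantized bytes directly and applies the scale once, instead of dequantizing the weights first.

```kotlin
// Stand-in weight representations (assumed names, not the library's API).
sealed interface WeightData
data class FP32Weights(val values: FloatArray) : WeightData            // plain floats
data class Q8_0Weights(val scale: Float, val q: ByteArray) : WeightData // 8-bit ints + one scale

// Dispatch on the weight type, mirroring matmulAutoDispatch's strategy
// for a single row (a dot product) rather than a full matmul.
fun dot(input: FloatArray, weight: WeightData): Float = when (weight) {
    // Standard floating-point path.
    is FP32Weights ->
        input.indices.sumOf { (input[it] * weight.values[it]).toDouble() }.toFloat()
    // Fused quantized path: accumulate against raw bytes, scale once at the end.
    is Q8_0Weights ->
        weight.scale * input.indices.sumOf { (input[it] * weight.q[it]).toDouble() }.toFloat()
}
```

The sealed `when` makes the dispatch exhaustive, so adding a new weight type (e.g. a ternary format) forces a corresponding branch at compile time.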
Return
Output tensor
Parameters
input
FP32 input tensor
weight
Weight tensor (quantized or FP32)
ctx
Execution context for the operation