matmulAutoDispatch

Performs a matrix multiplication with automatic dispatch based on the weight tensor's type. A quantization-optimized path is used when the weights are quantized; otherwise it falls back to the standard FP32 matmul.

Supported weight types:

  • Q8_0TensorData: Uses Q8_0 fused matmul

  • Q4_KTensorData: Uses Q4_K fused matmul

  • TernaryTensorData: Uses ternary addition-only matmul

  • FP32: Standard floating-point matmul
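
The dispatch above can be sketched as an exhaustive `when` over the weight's type. This is an illustrative sketch only: the `TensorData` subtypes mirror the names listed above, but their members, the `selectMatmulKernel` helper, and the kernel labels are assumptions, not the library's actual API.

```kotlin
// Hypothetical stand-ins for the weight types listed above; the real
// classes carry quantized block data, omitted here for brevity.
sealed interface TensorData
class FP32TensorData : TensorData
class Q8_0TensorData : TensorData
class Q4_KTensorData : TensorData
class TernaryTensorData : TensorData

// Sketch of the dispatch rule: each quantized weight type selects its
// fused kernel; FP32 weights fall through to the standard matmul path.
fun selectMatmulKernel(weight: TensorData): String = when (weight) {
    is Q8_0TensorData    -> "q8_0-fused"
    is Q4_KTensorData    -> "q4_k-fused"
    is TernaryTensorData -> "ternary-addition-only"
    is FP32TensorData    -> "fp32-standard"
}

fun main() {
    println(selectMatmulKernel(Q4_KTensorData()))  // q4_k-fused
    println(selectMatmulKernel(FP32TensorData()))  // fp32-standard
}
```

Because `TensorData` is sealed, the `when` is exhaustive at compile time, so adding a new quantization format forces every dispatch site to handle it.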

Return

Output tensor

Parameters

input

FP32 input tensor

weight

Weight tensor (quantized or FP32)

ctx

The ExecutionContext in which the operation runs