QuantizedTensorFactory

Factory for creating quantized tensor data from raw GGUF bytes.

This factory converts raw quantized bytes (loaded with RAW_BYTES policy) into specialized tensor data types that enable direct quantized matmul operations without full FP32 dequantization.

Usage:

// Load weights with RAW_BYTES policy
val weights = loader.loadToMap<Int8, Byte>(ctx)

// Convert Q8_0 tensor to quantized format
val quantType = weights.quantTypes["blk.0.attn_q.weight"]
val rawTensor = weights.tensors["blk.0.attn_q.weight"]
if (quantType == GGMLQuantizationType.Q8_0) {
    val q8Data = QuantizedTensorFactory.toQ8_0(rawTensor)
    // Use q8Data with QuantizedMatmul.matmulQ8_0()
}

Properties

Quantization types that support direct quantized matmul (without dequantization).

Functions

Check if a quantization type supports direct quantized matmul.

fun toQ4_K(rawTensor: Tensor<Int8, Byte>): Q4_KTensorData

Convert a raw byte tensor to Q4_KTensorData using the tensor's shape.

fun toQ4_K(rawTensor: Tensor<Int8, Byte>, logicalShape: Shape): Q4_KTensorData

Convert a raw byte tensor to Q4_KTensorData.
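For context, the GGML reference format stores Q4_K weights in 256-element super-blocks of 144 bytes each (4 bytes of fp16 scales, 12 bytes of packed 6-bit sub-block scales, and 128 bytes of 4-bit quants), so the raw byte tensor's size must agree with the logical shape. A minimal sketch of that size relationship, assuming the standard GGML layout (the helper `expectedQ4_KBytes` is illustrative and not part of this API):

```kotlin
// Q4_K layout constants from the GGML reference implementation:
// each super-block covers 256 weights and occupies 144 bytes.
const val QK_K = 256
const val Q4_K_BLOCK_BYTES = 144L

// Expected raw byte size for a Q4_K tensor with `numElements` weights.
// Illustrative helper; not part of QuantizedTensorFactory.
fun expectedQ4_KBytes(numElements: Long): Long {
    require(numElements % QK_K == 0L) { "Q4_K requires a multiple of $QK_K elements" }
    return (numElements / QK_K) * Q4_K_BLOCK_BYTES
}
```

For example, a 4096-element row occupies 16 super-blocks, i.e. 2304 raw bytes.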

fun toQ8_0(rawTensor: Tensor<Int8, Byte>): Q8_0TensorData

Convert a raw byte tensor to Q8_0TensorData using the tensor's shape.

fun toQ8_0(rawTensor: Tensor<Int8, Byte>, logicalShape: Shape): Q8_0TensorData

Convert a raw byte tensor to Q8_0TensorData.
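Q8_0 is simpler: in the GGML reference format each 32-element block is 34 bytes (a 2-byte fp16 scale followed by 32 int8 quants). A sketch of the corresponding size relationship, again assuming the standard layout (the helper `expectedQ8_0Bytes` is illustrative and not part of this API):

```kotlin
// Q8_0 layout constants from the GGML reference implementation:
// each block covers 32 weights and occupies 34 bytes (fp16 scale + 32 x int8).
const val QK8_0 = 32
const val Q8_0_BLOCK_BYTES = 34L

// Expected raw byte size for a Q8_0 tensor with `numElements` weights.
// Illustrative helper; not part of QuantizedTensorFactory.
fun expectedQ8_0Bytes(numElements: Long): Long {
    require(numElements % QK8_0 == 0L) { "Q8_0 requires a multiple of $QK8_0 elements" }
    return (numElements / QK8_0) * Q8_0_BLOCK_BYTES
}
```

A 4096-element row is therefore 128 blocks, i.e. 4352 raw bytes.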