QuantizedTensorFactory
Factory for creating quantized tensor data from raw GGUF bytes.
This factory converts raw quantized bytes (loaded with the RAW_BYTES policy) into specialized tensor data types that support quantized matmul operations directly, without first dequantizing the full tensor to FP32.
Usage:
// Load weights with RAW_BYTES policy
val weights = loader.loadToMap<Int8, Byte>(ctx)
// Convert Q8_0 tensor to quantized format
val quantType = weights.quantTypes["blk.0.attn_q.weight"]
val rawTensor = weights.tensors["blk.0.attn_q.weight"]
if (quantType == GGMLQuantizationType.Q8_0) {
    val q8Data = QuantizedTensorFactory.toQ8_0(rawTensor)
    // Use q8Data with QuantizedMatmul.matmulQ8_0()
}
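To illustrate what "without full FP32 dequantization" can mean for Q8_0, here is a minimal standalone sketch. It is not the implementation of QuantizedTensorFactory or QuantizedMatmul. It assumes only the GGML Q8_0 block layout (32 weights per block, stored as a 2-byte little-endian FP16 scale followed by 32 signed 8-bit values). The names halfToFloat and dotQ8_0 are hypothetical helpers introduced for this example.

```kotlin
// Q8_0 layout as defined by GGML: each block covers 32 weights and is stored
// as a 2-byte little-endian FP16 scale followed by 32 signed 8-bit quants.
const val Q8_0_BLOCK_SIZE = 32
const val Q8_0_BYTES_PER_BLOCK = 2 + Q8_0_BLOCK_SIZE

/** Converts an IEEE 754 half-precision bit pattern to a Float (hypothetical helper). */
fun halfToFloat(bits: Int): Float {
    val sign = (bits ushr 15) and 0x1
    val exp = (bits ushr 10) and 0x1F
    val mant = bits and 0x3FF
    val magnitude: Float = when (exp) {
        0 -> mant * (1.0f / (1 shl 24)) // zero and subnormals: mant * 2^-24
        31 -> if (mant == 0) Float.POSITIVE_INFINITY else Float.NaN
        else -> Float.fromBits(((exp + 112) shl 23) or (mant shl 13)) // rebias 15 -> 127
    }
    return if (sign == 1) -magnitude else magnitude
}

/**
 * Dot product of one Q8_0-encoded row with an FP32 vector (hypothetical helper).
 * The int8 quants are multiplied in place and the block scale is applied once
 * per block, so no dequantized copy of the row is ever materialized.
 */
fun dotQ8_0(raw: ByteArray, x: FloatArray): Float {
    require(x.size % Q8_0_BLOCK_SIZE == 0) { "vector length must be a multiple of 32" }
    val blocks = x.size / Q8_0_BLOCK_SIZE
    require(raw.size == blocks * Q8_0_BYTES_PER_BLOCK) { "unexpected Q8_0 byte count" }

    var sum = 0.0f
    for (b in 0 until blocks) {
        val base = b * Q8_0_BYTES_PER_BLOCK
        // Read the little-endian FP16 scale that prefixes the block.
        val scaleBits = (raw[base].toInt() and 0xFF) or
            ((raw[base + 1].toInt() and 0xFF) shl 8)
        val scale = halfToFloat(scaleBits)

        var blockSum = 0.0f
        for (i in 0 until Q8_0_BLOCK_SIZE) {
            // Kotlin's Byte is signed, which matches the int8 quants.
            blockSum += raw[base + 2 + i] * x[b * Q8_0_BLOCK_SIZE + i]
        }
        sum += scale * blockSum
    }
    return sum
}

fun main() {
    // One block: scale 0.5 (FP16 bit pattern 0x3800), quants 1..32.
    val raw = ByteArray(Q8_0_BYTES_PER_BLOCK)
    raw[0] = 0x00.toByte()
    raw[1] = 0x38.toByte()
    for (i in 0 until Q8_0_BLOCK_SIZE) raw[2 + i] = (i + 1).toByte()

    val x = FloatArray(Q8_0_BLOCK_SIZE) { 1.0f }
    println(dotQ8_0(raw, x)) // prints 264.0, i.e. 0.5 * (1 + 2 + ... + 32)
}
```

Applying the scale once per block rather than once per weight is what keeps the inner loop cheap. A production kernel would typically also vectorize the inner loop, which this sketch leaves out.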