QuantizedTensorFactory

Factory for creating quantized tensor data from raw GGUF bytes.

This factory converts raw quantized bytes (loaded with RAW_BYTES policy) into specialized tensor data types that enable direct quantized matmul operations without full FP32 dequantization.

Usage:

// Load weights with RAW_BYTES policy
val weights = loader.loadToMap<Int8, Byte>(ctx)

// Convert Q8_0 tensor to quantized format
val quantType = weights.quantTypes["blk.0.attn_q.weight"]
val rawTensor = weights.tensors["blk.0.attn_q.weight"]
if (quantType == GGMLQuantizationType.Q8_0) {
    val q8Data = QuantizedTensorFactory.toQ8_0(rawTensor)
    // Use q8Data with QuantizedMatmul.matmulQ8_0()
}

Properties

Quantization types that support direct quantized matmul (without dequantization).

Functions

Check if a quantization type supports direct quantized matmul.

fun toQ4_K(rawTensor: Tensor<Int8, Byte>): Q4_KTensorData

Convert a raw byte tensor to Q4_KTensorData using the tensor's shape.

fun toQ4_K(rawTensor: Tensor<Int8, Byte>, logicalShape: Shape): Q4_KTensorData

Convert a raw byte tensor to Q4_KTensorData.
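For context, the GGML reference format stores Q4_K weights in 256-element super-blocks of 144 bytes each (4 bytes of fp16 scales, 12 bytes of packed 6-bit sub-block scales, and 128 bytes of 4-bit quants), so the raw byte tensor's size must agree with the logical shape. A minimal sketch of that size relationship, assuming the standard GGML layout (the helper `expectedQ4_KBytes` is illustrative and not part of this API):

```kotlin
// Q4_K layout constants from the GGML reference implementation:
// each super-block covers 256 weights and occupies 144 bytes.
const val QK_K = 256
const val Q4_K_BLOCK_BYTES = 144L

// Expected raw byte size for a Q4_K tensor with `numElements` weights.
// Illustrative helper; not part of QuantizedTensorFactory.
fun expectedQ4_KBytes(numElements: Long): Long {
    require(numElements % QK_K == 0L) { "Q4_K requires a multiple of $QK_K elements" }
    return (numElements / QK_K) * Q4_K_BLOCK_BYTES
}
```

For example, a 4096-element row occupies 16 super-blocks, i.e. 2304 raw bytes.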

fun toQ8_0(rawTensor: Tensor<Int8, Byte>): Q8_0TensorData

Convert a raw byte tensor to Q8_0TensorData using the tensor's shape.

fun toQ8_0(rawTensor: Tensor<Int8, Byte>, logicalShape: Shape): Q8_0TensorData

Convert a raw byte tensor to Q8_0TensorData.
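Q8_0 is simpler: in the GGML reference format each 32-element block is 34 bytes (a 2-byte fp16 scale followed by 32 int8 quants). A sketch of the corresponding size relationship, again assuming the standard layout (the helper `expectedQ8_0Bytes` is illustrative and not part of this API):

```kotlin
// Q8_0 layout constants from the GGML reference implementation:
// each block covers 32 weights and occupies 34 bytes (fp16 scale + 32 x int8).
const val QK8_0 = 32
const val Q8_0_BLOCK_BYTES = 34L

// Expected raw byte size for a Q8_0 tensor with `numElements` weights.
// Illustrative helper; not part of QuantizedTensorFactory.
fun expectedQ8_0Bytes(numElements: Long): Long {
    require(numElements % QK8_0 == 0L) { "Q8_0 requires a multiple of $QK8_0 elements" }
    return (numElements / QK8_0) * Q8_0_BLOCK_BYTES
}
```

A 4096-element row is therefore 128 blocks, i.e. 4352 raw bytes.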