Q4_KTensorData
Tensor data interface for Q4_K quantized format.
Q4_K block format (256 elements per block, 144 bytes per block):
  2 bytes: f16 d (main scale)
  2 bytes: f16 dMin (minimum scale)
  12 bytes: packed scales (8 sub-blocks × 12 bits each, a 6-bit scale index plus a 6-bit min index = 96 bits = 12 bytes)
  128 bytes: 4-bit quantized codes (256 elements / 2 = 128 bytes)
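The sizes above fix how much storage a tensor needs. The following is a minimal sketch of that arithmetic, not part of the documented interface: the function names `q4kBlockCount` and `q4kByteSize` are hypothetical, and rounding a partial final block up to a whole block is an assumption the source does not confirm.

```kotlin
// Hypothetical helpers illustrating the block-size arithmetic.
// Assumption: a tensor whose element count is not a multiple of 256
// is padded up to a whole block.

/** Number of Q4_K blocks needed to hold [elementCount] elements (256 per block). */
fun q4kBlockCount(elementCount: Int): Long {
    require(elementCount >= 0) { "elementCount must be non-negative" }
    // Ceiling division, done in Long to avoid Int overflow near Int.MAX_VALUE.
    return (elementCount.toLong() + 255L) / 256L
}

/** Storage size in bytes: 2 (d) + 2 (dMin) + 12 (packed scales) + 128 (codes) = 144 per block. */
fun q4kByteSize(elementCount: Int): Long = q4kBlockCount(elementCount) * 144L
```

For example, a 4096-element tensor occupies 16 blocks, or 2304 bytes, which works out to 4.5 bits per element.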
Each sub-block (32 elements) has:
  6-bit scale index (0..63)
  6-bit min index (0..63)
  scale = d * (scaleIdx / 63)
  min = dMin * (minIdx / 63)
Dequantization: output[i] = code[i] * scale + min
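The formulas above can be sketched for a single sub-block as follows. This is an illustration under stated assumptions rather than the library's implementation: the function names are hypothetical, the f16 values `d` and `dMin` are assumed to be already converted to `Float`, and the 4-bit codes and 6-bit indices are assumed to be already unpacked, since the bit-level packing order is not described here.

```kotlin
// Hypothetical helpers; inputs are assumed to be already unpacked from the block bytes.

/** scale = d * (scaleIdx / 63), with scaleIdx a 6-bit index in 0..63. */
fun subBlockScale(d: Float, scaleIdx: Int): Float {
    require(scaleIdx in 0..63) { "scaleIdx must be a 6-bit value" }
    return d * (scaleIdx / 63f)
}

/** min = dMin * (minIdx / 63), with minIdx a 6-bit index in 0..63. */
fun subBlockMin(dMin: Float, minIdx: Int): Float {
    require(minIdx in 0..63) { "minIdx must be a 6-bit value" }
    return dMin * (minIdx / 63f)
}

/** Dequantizes one 32-element sub-block: output[i] = code[i] * scale + min. */
fun dequantizeSubBlock(
    codes: IntArray,
    d: Float,
    dMin: Float,
    scaleIdx: Int,
    minIdx: Int,
): FloatArray {
    require(codes.size == 32) { "a sub-block holds 32 elements" }
    val scale = subBlockScale(d, scaleIdx)
    val min = subBlockMin(dMin, minIdx)
    return FloatArray(codes.size) { i ->
        require(codes[i] in 0..15) { "codes must be 4-bit values" }
        codes[i] * scale + min
    }
}
```

With `scaleIdx = 63` the sub-block scale equals `d`, and with `scaleIdx = 0` every element dequantizes to the sub-block minimum.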
Inheritors
Properties
Functions
Get the minimum scale factor (dMin) for a block.
Get the minimum value for a specific sub-block within a block.
Get the scale for a specific sub-block within a block.
Dequantize Q4_K tensor data to a FloatArray: output[i] = code[i] * scale + min
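The accessors above are addressed by block and by sub-block within a block. As a rough sketch of how a flat element index maps onto those coordinates, assuming elements are laid out contiguously block by block and sub-block by sub-block (the type `Q4KPosition` and the function `locateElement` are hypothetical, not part of this interface):

```kotlin
/** Hypothetical coordinate triple for one element of a Q4_K tensor. */
data class Q4KPosition(val blockIdx: Int, val subBlockIdx: Int, val offsetInSubBlock: Int)

/** Maps a flat element index to its block (256 elements) and sub-block (32 elements). */
fun locateElement(elementIdx: Int): Q4KPosition {
    require(elementIdx >= 0) { "elementIdx must be non-negative" }
    val withinBlock = elementIdx % 256
    return Q4KPosition(
        blockIdx = elementIdx / 256,
        subBlockIdx = withinBlock / 32,       // 0..7
        offsetInSubBlock = withinBlock % 32,  // 0..31
    )
}
```

Element 300, for instance, falls in block 1, sub-block 1, at offset 12, so its scale and minimum would be looked up with block index 1 and sub-block index 1.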