Q8_0TensorData

Tensor data interface for Q8_0 quantized format.

Q8_0 block format (32 elements per block, 34 bytes per block):

  • 2 bytes: f16 scale

  • 32 bytes: int8 quantized codes

Dequantization: output[i] = code[i] * scale
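The block layout and dequantization rule above can be sketched directly against a raw byte array. This is a minimal illustration, not the library's implementation: `halfToFloat` and `decodeBlock` are hypothetical helpers, and the little-endian byte order of the f16 scale is an assumption (it matches the common GGML convention but is not stated on this page).

```kotlin
import kotlin.math.pow

val BLOCK_ELEMENTS = 32   // codes per block
val BLOCK_BYTES = 34      // 2-byte f16 scale + 32 int8 codes

// Hypothetical helper: convert IEEE 754 half-precision bits to a Float.
fun halfToFloat(bits: Int): Float {
    val sign = (bits shr 15) and 0x1
    val exp = (bits shr 10) and 0x1F
    val frac = bits and 0x3FF
    val magnitude = when (exp) {
        0 -> frac * 2.0f.pow(-24)                       // zero and subnormals
        31 -> if (frac == 0) Float.POSITIVE_INFINITY else Float.NaN
        else -> (1.0f + frac / 1024.0f) * 2.0f.pow(exp - 15)
    }
    return if (sign == 1) -magnitude else magnitude
}

// Hypothetical helper: decode one block as output[i] = code[i] * scale.
fun decodeBlock(packed: ByteArray, blockIdx: Int): FloatArray {
    val base = blockIdx * BLOCK_BYTES
    // Assumption: the scale is stored little-endian (low byte first).
    val lo = packed[base].toInt() and 0xFF
    val hi = packed[base + 1].toInt() and 0xFF
    val scale = halfToFloat((hi shl 8) or lo)
    // Codes are signed bytes starting 2 bytes into the block.
    return FloatArray(BLOCK_ELEMENTS) { i -> packed[base + 2 + i].toFloat() * scale }
}
```

For example, a block whose scale bytes encode 0.5 (f16 bits 0x3800) and whose first code is 4 decodes to 2.0 at index 0.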

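This interface enables quantized matmul operations directly on the packed blocks, without dequantizing the whole tensor first. At 34 bytes per 32 elements, Q8_0 uses about 1.06 bytes per element, compared with 4 bytes per element for 32-bit floats, so weights stay roughly 3.8x smaller in memory during inference, and the inner products can be accumulated on the 8-bit codes.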
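One way to see how a matmul kernel can skip full dequantization is a block-wise dot product: multiply the int8 codes in integer arithmetic, then apply the two scales once per block. The sketch below assumes both operands are Q8_0 encoded. `Q8BlockView` is a hypothetical stand-in that mirrors the accessors listed on this page, and `ArrayQ8Blocks` and `dotQ8` are illustrative names, not part of the library.

```kotlin
// Hypothetical stand-in mirroring the accessors documented on this page.
interface Q8BlockView {
    val blockCount: Int
    fun getBlockScale(blockIdx: Int): Float
    fun getCode(blockIdx: Int, elementIdx: Int): Byte
}

// Hypothetical array-backed implementation, used only for the example.
class ArrayQ8Blocks(
    private val scales: FloatArray,
    private val codes: ByteArray
) : Q8BlockView {
    override val blockCount: Int get() = scales.size
    override fun getBlockScale(blockIdx: Int): Float = scales[blockIdx]
    override fun getCode(blockIdx: Int, elementIdx: Int): Byte =
        codes[blockIdx * 32 + elementIdx]
}

// Dot product of two Q8_0 vectors without building a FloatArray.
fun dotQ8(a: Q8BlockView, b: Q8BlockView): Float {
    require(a.blockCount == b.blockCount) { "block counts differ" }
    var total = 0.0f
    for (blk in 0 until a.blockCount) {
        // Integer accumulator: 32 products of int8 values fit easily in an Int.
        var acc = 0
        for (i in 0 until 32) {
            acc += a.getCode(blk, i) * b.getCode(blk, i)
        }
        // Apply both block scales once per block instead of once per element.
        total += acc * a.getBlockScale(blk) * b.getBlockScale(blk)
    }
    return total
}
```

A matmul built this way would call such a routine once per output element, over one row and one column of blocks.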

Types

object Companion

Properties

abstract val blockCount: Int

Number of Q8_0 blocks in the tensor.

abstract val packedData: ByteArray

Raw packed data containing all blocks.
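The relationship between `blockCount`, `packedData`, and element positions can be sketched with a little index arithmetic. This assumes blocks are stored back to back with no padding, which follows from the 34-byte block format described above but is not stated explicitly. The helper names are hypothetical.

```kotlin
val Q8_BLOCK_ELEMENTS = 32
val Q8_BLOCK_BYTES = 34

// Assumption: packedData holds exactly blockCount contiguous 34-byte blocks.
fun expectedPackedSize(blockCount: Int): Int = blockCount * Q8_BLOCK_BYTES

// Byte offset of the code for a flat element index within the packed data.
fun codeByteOffset(flatIndex: Int): Int {
    val blockIdx = flatIndex / Q8_BLOCK_ELEMENTS
    val elementIdx = flatIndex % Q8_BLOCK_ELEMENTS
    // Skip the 2-byte scale at the start of the block.
    return blockIdx * Q8_BLOCK_BYTES + 2 + elementIdx
}
```

Under that assumption, a tensor of 128 elements has 4 blocks and 136 bytes of packed data.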

Functions

abstract fun getBlockScale(blockIdx: Int): Float

Get the scale factor for a specific block.

abstract fun getCode(blockIdx: Int, elementIdx: Int): Byte

Get the quantized code at position elementIdx (0..31) within the block blockIdx.

Dequantize Q8_0 tensor data to a FloatArray: output[i] = code[i] * scale
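A full dequantization pass can be expressed purely in terms of the per-block accessors. The sketch below is an illustration rather than the library's own code: `dequantizeQ8` is a hypothetical name, and it takes the accessors as function parameters so the example stays self-contained.

```kotlin
// Hypothetical helper: expand every block with output[i] = code[i] * scale.
fun dequantizeQ8(
    blockCount: Int,
    scaleOf: (Int) -> Float,      // scale for a block index
    codeOf: (Int, Int) -> Byte    // code for (block index, element index 0..31)
): FloatArray {
    val out = FloatArray(blockCount * 32)
    for (blk in 0 until blockCount) {
        val scale = scaleOf(blk)  // read the scale once per block
        for (i in 0 until 32) {
            out[blk * 32 + i] = codeOf(blk, i).toFloat() * scale
        }
    }
    return out
}
```

Given an instance `data` of this interface, the call would presumably look like `dequantizeQ8(data.blockCount, data::getBlockScale, data::getCode)`.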