StreamingGgufParametersLoader

class StreamingGgufParametersLoader(sourceProvider: () -> RandomAccessSource, onProgress: (current: Long, total: Long, message: String?) -> Unit = { _, _, _ -> }) : ParametersLoader

Streaming GGUF parameters loader — the recommended path for loading GGUF models.

Unlike GgufParametersLoader (which uses the legacy GGUFReader and rejects quantized types), this loader:

  • Uses StreamingGGUFReader for memory-efficient parsing

  • Supports quantized types (Q4_K, Q8_0) as packed TensorData

  • Loads tensor data on-demand without heap-loading the full file

  • Preserves quantized layout through the loading pipeline
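On-demand loading comes down to reading only the byte range of the tensor that is requested, instead of pulling the whole file into the heap. The following is a minimal sketch that assumes a plain local file. It uses `java.io.RandomAccessFile` in place of the library's `RandomAccessSource`, whose methods are not documented here. `readTensorBytes` is a hypothetical helper. In a real GGUF file, the absolute offset would be derived from the tensor-info entry, which records offsets relative to the start of the aligned tensor-data section.

```kotlin
import java.io.File
import java.io.RandomAccessFile

// Hypothetical helper (not part of this library): reads only the bytes in
// [offset, offset + length) from a local file. RandomAccessFile stands in
// for the library's RandomAccessSource, which is an assumption here.
fun readTensorBytes(file: File, offset: Long, length: Int): ByteArray {
    require(offset >= 0 && length >= 0) { "offset and length must be non-negative" }
    RandomAccessFile(file, "r").use { raf ->
        require(offset + length <= raf.length()) { "range extends past end of file" }
        val buf = ByteArray(length)
        raf.seek(offset)   // jump straight to the tensor's data
        raf.readFully(buf) // read exactly `length` bytes, nothing more
        return buf
    }
}
```

Only `length` bytes are allocated per call, so peak memory scales with the largest single tensor rather than with the file size.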

For F32 and I32 tensors, data is returned as standard dense arrays. For quantized tensors, data is returned as packed block storage (e.g., Q4_KBlockTensorData, Q8_0BlockTensorData).
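To show what "packed block storage" contains, here is a minimal sketch of the Q8_0 layout used by GGUF (inherited from ggml). Each block stores one little-endian fp16 scale followed by 32 signed 8-bit quants, so it takes 34 bytes per 32 weights. The names `halfToFloat` and `dequantizeQ8_0` are hypothetical and not part of this library. `Q8_0BlockTensorData` keeps the data packed and may organize its storage differently.

```kotlin
import kotlin.math.pow

// Q8_0 layout as defined by ggml: one fp16 scale followed by 32 signed 8-bit quants.
val QK8_0 = 32
val Q8_0_BLOCK_BYTES = 2 + QK8_0 // 34 bytes per 32 weights

// Decodes an IEEE 754 half-precision value stored in the low 16 bits of `h`.
fun halfToFloat(h: Int): Float {
    val sign = (h ushr 15) and 0x1
    val exp = (h ushr 10) and 0x1F
    val mant = h and 0x3FF
    val magnitude = when (exp) {
        0 -> mant * 2.0f.pow(-24) // zero or subnormal
        31 -> if (mant == 0) Float.POSITIVE_INFINITY else Float.NaN
        else -> (1.0f + mant / 1024.0f) * 2.0f.pow(exp - 15)
    }
    return if (sign == 1) -magnitude else magnitude
}

// Hypothetical helper: expands packed Q8_0 blocks into dense floats (value = scale * quant).
fun dequantizeQ8_0(packed: ByteArray): FloatArray {
    require(packed.size % Q8_0_BLOCK_BYTES == 0) { "not a whole number of Q8_0 blocks" }
    val nBlocks = packed.size / Q8_0_BLOCK_BYTES
    val out = FloatArray(nBlocks * QK8_0)
    for (b in 0 until nBlocks) {
        val base = b * Q8_0_BLOCK_BYTES
        // The fp16 scale occupies the first two bytes of the block, little-endian.
        val bits = (packed[base].toInt() and 0xFF) or ((packed[base + 1].toInt() and 0xFF) shl 8)
        val scale = halfToFloat(bits)
        for (i in 0 until QK8_0) {
            // Kotlin's Byte is signed, so toInt() yields the int8 quant directly.
            out[b * QK8_0 + i] = scale * packed[base + 2 + i].toInt()
        }
    }
    return out
}
```

Keeping the blocks packed is what makes this layout worthwhile. A 4096 × 4096 weight matrix has 16,777,216 elements, which is 67,108,864 bytes as F32 but 524,288 Q8_0 blocks × 34 bytes = 17,825,792 bytes when left packed.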

Constructors

Link copied to clipboard
constructor(sourceProvider: () -> RandomAccessSource, onProgress: (current: Long, total: Long, message: String?) -> Unit = { _, _, _ -> })
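The `onProgress` parameter accepts any function of type `(current: Long, total: Long, message: String?) -> Unit`, and its default is a no-op. The sketch below shows one callback matching that signature, which prints a percentage. `formatProgress` and `printProgress` are hypothetical names. The sketch does not assume anything about which units `current` and `total` count.

```kotlin
// Hypothetical formatter for progress reports; guards against total == 0.
fun formatProgress(current: Long, total: Long, message: String?): String {
    val percent = if (total > 0) current * 100 / total else 0L
    val suffix = if (message != null) " $message" else ""
    return "[$percent%] $current/$total$suffix"
}

// A callback whose type matches the documented onProgress parameter.
val printProgress: (current: Long, total: Long, message: String?) -> Unit =
    { current, total, message -> println(formatProgress(current, total, message)) }
```

It could then be passed as `onProgress = printProgress` alongside a `sourceProvider` lambda that opens the model file, for example `StreamingGgufParametersLoader(sourceProvider = { openModelSource() }, onProgress = printProgress)`, where `openModelSource()` is a hypothetical function returning a `RandomAccessSource`.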

Functions

Link copied to clipboard
open suspend override fun <T : DType, V> load(ctx: ExecutionContext, dtype: KClass<T>, onTensorLoaded: (String, Tensor<T, V>) -> Unit)
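`load` does not return a collection. It reports each tensor through `onTensorLoaded` as a name/tensor pair, and because it is a `suspend` function it must be called from a coroutine. The sketch below shows a generic callback factory that gathers the results into a map keyed by tensor name. `collectingCallback` is a hypothetical helper, and `X` stands in for `Tensor<T, V>`.

```kotlin
// Hypothetical helper: returns a callback that stores each reported tensor by name.
// X stands in for the library's Tensor<T, V> type.
fun <X> collectingCallback(into: MutableMap<String, X>): (String, X) -> Unit =
    { name, tensor -> into[name] = tensor }
```

Inside a coroutine, a caller could create `val tensors = mutableMapOf<String, Tensor<T, V>>()` and then invoke `loader.load(ctx, dtype, collectingCallback(tensors))`. A callback that processes each tensor and discards it, rather than collecting every tensor, keeps memory closer to the streaming behavior described above.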