StreamingGGUFReader

class StreamingGGUFReader : AutoCloseable

Streaming GGUF reader that parses metadata without loading the entire file.

Memory usage is proportional to the size of the metadata (~1 MB), not the size of the file (which can exceed 100 GB). Individual tensors are loaded on demand via loadTensor or loadTensorData.

This enables parsing of very large model files (70B+ parameters, 100+ GB) without requiring the entire file to fit in memory.

Usage:

StreamingGGUFReader.open(source).use { reader ->
    // Access metadata immediately - only ~1MB loaded
    println("Tensors: ${reader.tensorCount}")
    println("Architecture: ${reader.fields["general.architecture"]}")

    // Load specific tensor when needed
    val weights = reader.loadTensor("model.embed_tokens.weight")
}

Types

object Companion

Properties

var alignment: Int

Data alignment (default 32 bytes)

var dataOffset: Long

Byte offset where tensor data section begins

val fields: LinkedHashMap<String, Any?>

Parsed metadata fields (key-value pairs from file header)

var kvCount: ULong

Number of key-value metadata entries

var tensorCount: ULong

Total number of tensors in the file

val tensors: List<StreamingTensorInfo>

Parsed tensor metadata (without actual tensor data)

var version: UInt

GGUF format version (2 or 3)
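
As an illustration of how alignment and dataOffset relate: in the GGUF layout the tensor data section starts at an offset padded up to a multiple of the alignment. The sketch below shows that rounding step in isolation. The helper name alignUp is hypothetical and is not part of this class; how the reader computes dataOffset internally is an assumption here.

```kotlin
// Hypothetical helper (not part of StreamingGGUFReader):
// round `offset` up to the next multiple of `alignment`.
fun alignUp(offset: Long, alignment: Int): Long {
    require(alignment > 0) { "alignment must be positive" }
    val a = alignment.toLong()
    return (offset + a - 1) / a * a
}

fun main() {
    // With the default alignment of 32 bytes:
    println(alignUp(100L, 32)) // 128
    println(alignUp(128L, 32)) // 128 (already aligned, unchanged)
}
```

Under that assumption, a header that ends at byte 100 would be followed by 28 bytes of padding before the first tensor's data.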

Functions

open override fun close()

fun loadTensor(name: String): ByteArray

Load tensor data by name.

fun loadTensorData(tensor: StreamingTensorInfo): ByteArray

Load tensor data for a specific tensor.

fun loadTensorData(tensor: StreamingTensorInfo, buffer: ByteArray, offset: Int = 0): Int

Load tensor data into an existing buffer. Useful for avoiding allocations when processing multiple tensors.
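
The buffer-reuse pattern behind the last overload can be sketched as follows. To keep the example runnable without the library, it uses a hypothetical stand-in interface, TensorLoader, with the same parameter shape as loadTensorData(tensor, buffer, offset). It also assumes that the returned Int is the number of bytes written, which this page does not state. The names TensorLoader and processAll are illustrative only.

```kotlin
// Hypothetical stand-in with the same shape as
// loadTensorData(tensor, buffer, offset): Int.
// Assumption: the return value is the number of bytes written.
fun interface TensorLoader<T> {
    fun loadInto(tensor: T, buffer: ByteArray, offset: Int): Int
}

// Allocate one scratch buffer and reuse it for every tensor.
fun <T> processAll(
    tensors: List<T>,
    scratchSize: Int,
    loader: TensorLoader<T>,
    consume: (tensor: T, data: ByteArray, length: Int) -> Unit
) {
    val scratch = ByteArray(scratchSize)
    for (tensor in tensors) {
        val written = loader.loadInto(tensor, scratch, 0)
        // Only the first `written` bytes of `scratch` are valid for this tensor.
        consume(tensor, scratch, written)
    }
}

fun main() {
    // Fake "tensors": each string's bytes stand in for tensor data.
    val fakeTensors = listOf("abc", "hello")
    val fakeLoader = TensorLoader<String> { t, buf, off ->
        val bytes = t.encodeToByteArray()
        bytes.copyInto(buf, off)
        bytes.size
    }
    processAll(fakeTensors, 16, fakeLoader) { t, _, n ->
        println("$t: $n bytes")
    }
}
```

With the real reader, the loader would presumably be a lambda that forwards to reader.loadTensorData(tensor, buffer, offset), and the scratch buffer would need to be at least as large as the largest tensor being processed.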