StreamingGGUFReader

Streaming GGUF reader that parses metadata without loading the entire file.

Memory usage is proportional to the size of the metadata (about 1 MB), not the size of the file (which can exceed 100 GB). Individual tensors can be loaded on demand via loadTensor or loadTensorData.

This enables parsing of very large model files (70B+ parameters, 100+ GB) without requiring the entire file to fit in memory.

Usage:

StreamingGGUFReader.open(source).use { reader ->
    // Access metadata immediately - only ~1 MB loaded
    println("Tensors: ${reader.tensorCount}")
    println("Architecture: ${reader.fields["general.architecture"]}")

    // Load specific tensor when needed
    val weights = reader.loadTensor("model.embed_tokens.weight")
}

Types

object Companion

Properties

Data alignment (default 32 bytes)

Byte offset where tensor data section begins

Parsed metadata fields (key-value pairs from file header)

Number of key-value metadata entries

Total number of tensors in the file

Parsed tensor metadata (without actual tensor data)

GGUF format version (2 or 3)
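
The version, tensor count, metadata entry count, and alignment described above all come from the start of a GGUF file. The following is a minimal sketch of how those values could be read, based on the public GGUF v2/v3 layout (little-endian magic, version, tensor count, key-value count) and its default alignment of 32 bytes. It is not this library's implementation; the names GgufHeader, parseHeader, and alignOffset are hypothetical and exist only for illustration.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical holder for the fixed-size header fields (not a type from this library).
data class GgufHeader(val version: Int, val tensorCount: Long, val kvCount: Long)

// Parse the 24-byte fixed header used by GGUF v2 and v3:
// uint32 magic, uint32 version, uint64 tensor count, uint64 metadata KV count.
fun parseHeader(bytes: ByteArray): GgufHeader {
    require(bytes.size >= 24) { "GGUF header needs at least 24 bytes" }
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)

    // The ASCII bytes "GGUF" read as a little-endian uint32.
    val magic = buf.int
    require(magic == 0x46554747) { "Not a GGUF file" }

    val version = buf.int
    require(version == 2 || version == 3) { "Unsupported GGUF version: $version" }

    val tensorCount = buf.long
    val kvCount = buf.long
    return GgufHeader(version, tensorCount, kvCount)
}

// Round an offset up to the next multiple of the alignment.
// The tensor data section starts at the aligned end of the tensor info table.
fun alignOffset(offset: Long, alignment: Long = 32): Long =
    offset + (alignment - offset % alignment) % alignment
```

Only these 24 bytes plus the metadata and tensor info tables need to be read before tensors can be located, which is why memory use tracks metadata size rather than file size.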

Functions

open override fun close()

Load tensor data by name.

Load tensor data for a specific tensor.

fun loadTensorData(tensor: StreamingTensorInfo, buffer: ByteArray, offset: Int = 0): Int

Load tensor data into an existing buffer. Useful for avoiding allocations when processing multiple tensors.
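
The buffer-reuse pattern can be sketched as follows. Because the real reader needs a GGUF file, this example uses hypothetical stand-ins (TensorInfo, InMemorySource, checksumAll) that only mirror the shape of the loadTensorData(tensor, buffer, offset) overload. It also assumes the returned Int is the number of bytes written, which this page does not state.

```kotlin
// Hypothetical stand-in for StreamingTensorInfo; the real type's fields are not shown on this page.
data class TensorInfo(val name: String, val nBytes: Int)

// Hypothetical in-memory source that mimics the buffer-filling overload.
class InMemorySource(private val blobs: Map<String, ByteArray>) {
    // Copies the tensor's bytes into buffer starting at offset.
    // Assumption: the return value is the number of bytes written.
    fun loadTensorData(tensor: TensorInfo, buffer: ByteArray, offset: Int = 0): Int {
        val src = blobs.getValue(tensor.name)
        require(buffer.size - offset >= src.size) { "Buffer too small for ${tensor.name}" }
        src.copyInto(buffer, destinationOffset = offset)
        return src.size
    }
}

// Compute a simple byte sum per tensor using a single scratch buffer,
// sized once for the largest tensor instead of allocating per tensor.
fun checksumAll(source: InMemorySource, tensors: List<TensorInfo>): Map<String, Long> {
    val scratch = ByteArray(tensors.maxOfOrNull { it.nBytes } ?: 0)
    return tensors.associate { tensor ->
        val bytesRead = source.loadTensorData(tensor, scratch)
        var sum = 0L
        // Only the first bytesRead bytes belong to this tensor; the rest is stale data.
        for (i in 0 until bytesRead) sum += scratch[i].toLong() and 0xFFL
        tensor.name to sum
    }
}
```

The key detail is to process only the bytes reported by the return value, since a reused buffer still holds leftover data from the previous, possibly larger, tensor.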