StreamingGGUFReader
Streaming GGUF reader that parses metadata without loading the entire file.
Memory usage is proportional to metadata size (about 1 MB), not file size, so very large model files (70B+ parameters, 100+ GB) can be parsed without the whole file fitting in memory.
Individual tensors can be loaded on demand via loadTensor or loadTensorData.
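To illustrate why metadata parsing can stay small, here is a minimal sketch of reading only the fixed-size GGUF header. It is not the implementation of StreamingGGUFReader: `GgufHeader` and `parseGgufHeader` are hypothetical names introduced for this example, and the sketch assumes a little-endian file using the GGUF v2 or later layout (4-byte magic, 32-bit version, 64-bit tensor count, 64-bit metadata key/value count).

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical container for the fixed-size header fields (GGUF v2+ layout assumed).
data class GgufHeader(val version: Int, val tensorCount: Long, val metadataKvCount: Long)

// Hypothetical helper, not part of StreamingGGUFReader:
// parses the first 24 bytes of a GGUF file and nothing else.
fun parseGgufHeader(bytes: ByteArray): GgufHeader {
    require(bytes.size >= 24) { "GGUF header needs at least 24 bytes, got ${bytes.size}" }
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)

    // The ASCII bytes "GGUF" read as a little-endian 32-bit integer.
    val magic = buf.int
    require(magic == 0x46554747) { "Not a GGUF file (bad magic)" }

    val version = buf.int           // format version
    val tensorCount = buf.long      // number of tensor descriptors that follow the metadata
    val metadataKvCount = buf.long  // number of metadata key/value pairs
    return GgufHeader(version, tensorCount, metadataKvCount)
}
```

Only 24 bytes are needed to learn how many metadata entries and tensor descriptors follow, which is what makes a streaming parse possible regardless of total file size.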
Usage:
StreamingGGUFReader.open(source).use { reader ->
    // Access metadata immediately - only ~1 MB loaded
    println("Tensors: ${reader.tensorCount}")
    println("Architecture: ${reader.fields["general.architecture"]}")

    // Load a specific tensor when needed
    val weights = reader.loadTensor("model.embed_tokens.weight")
}
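How loadTensor locates and reads tensor bytes is not described on this page. Conceptually, on-demand loading amounts to seeking to a known offset and reading only that byte range. The sketch below shows the general technique with the JDK's RandomAccessFile; `readRange` is a hypothetical helper for illustration, not part of this library.

```kotlin
import java.io.File
import java.io.RandomAccessFile

// Hypothetical helper: reads `length` bytes starting at `offset`,
// without reading any other part of the file into memory.
fun readRange(file: File, offset: Long, length: Int): ByteArray =
    RandomAccessFile(file, "r").use { raf ->
        raf.seek(offset)
        // readFully throws EOFException if fewer than `length` bytes remain.
        ByteArray(length).also { raf.readFully(it) }
    }
```

With an offset and byte size taken from a tensor's descriptor, a call such as `readRange(modelFile, tensorOffset, tensorSize)` would allocate memory for that tensor alone; `modelFile`, `tensorOffset`, and `tensorSize` are placeholder names here.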