MmapLlamaLoader
Memory-mapped GGUF loader that provides zero-copy tensor access.
This loader memory-maps the GGUF file and creates tensor views that reference the mapped memory directly, avoiding data copies. This is particularly efficient for large models because:
- Memory is loaded on demand by the OS (lazy loading)
- Multiple processes can share the same mapped pages
- No explicit memory allocation is needed for tensor data
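A minimal sketch of the zero-copy idea, independent of MmapLlamaLoader's internals (the `mapFloats` helper and its parameters are illustrative, not part of the loader's API): map the file once with `FileChannel.map` and expose a region of it as a `FloatBuffer` view, so reads go straight to the OS-mapped pages.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.nio.FloatBuffer
import java.nio.channels.FileChannel
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// Map a file read-only and view a region of it as floats without copying.
// GGUF stores tensor data little-endian, so the view is ordered accordingly.
fun mapFloats(path: Path, byteOffset: Long, count: Int): FloatBuffer {
    FileChannel.open(path, StandardOpenOption.READ).use { ch ->
        val mapped = ch.map(FileChannel.MapMode.READ_ONLY, byteOffset, count.toLong() * 4)
        // The mapping stays valid after the channel closes; the buffer
        // shares the mapped pages rather than owning a heap copy.
        return mapped.order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer()
    }
}

fun main() {
    // Write a few floats to a temp file, then read them back through the map.
    val tmp = Files.createTempFile("mmap-demo", ".bin")
    val bytes = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
    bytes.putFloat(1.0f).putFloat(2.5f).putFloat(-3.0f)
    Files.write(tmp, bytes.array())
    val view = mapFloats(tmp, 0, 3)
    println(listOf(view.get(0), view.get(1), view.get(2)))  // prints [1.0, 2.5, -3.0]
    Files.deleteIfExists(tmp)
}
```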
Limitations:
- JVM only (uses java.nio.MappedByteBuffer)
- Currently supports only F32 tensors (no quantized tensor support yet)
- Quantized models would require dequantization, which defeats the zero-copy benefit
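To see why quantization defeats zero-copy, consider a simplified Q8_0-style block: one scale followed by 32 signed 8-bit weights (the real GGUF Q8_0 block uses an f16 scale; an f32 scale is assumed here to keep the sketch self-contained). Decoding must allocate a fresh array, because the output floats do not exist anywhere in the mapped file.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Decode one simplified quantized block: f32 scale + 32 signed bytes.
// Unlike the F32 path, this necessarily allocates and fills a new FloatArray.
fun dequantizeBlock(raw: ByteBuffer): FloatArray {
    val buf = raw.order(ByteOrder.LITTLE_ENDIAN)
    val scale = buf.getFloat()
    return FloatArray(32) { scale * buf.get().toFloat() }
}

fun main() {
    val block = ByteBuffer.allocate(36).order(ByteOrder.LITTLE_ENDIAN)
    block.putFloat(0.5f)                              // scale
    repeat(32) { i -> block.put((i - 16).toByte()) }  // quantized weights
    block.flip()
    val weights = dequantizeBlock(block)
    println(weights[0])  // prints -8.0  (0.5 * -16)
}
```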
Usage:
val loader = MmapLlamaLoader(path)
val weights = loader.loadToMap<FP32, Float>(ctx)
// Use weights...
loader.close() // Release mmap when done
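If MmapLlamaLoader implements AutoCloseable (an assumption; the usage above calls close() manually), Kotlin's `use` extension releases the mapping even when loading throws. A sketch with a stand-in resource:

```kotlin
// Stand-in for a loader-like resource; MmapLlamaLoader itself is only
// assumed to implement AutoCloseable, which this demo does not verify.
class FakeLoader : AutoCloseable {
    var closed = false
        private set
    fun load(): List<Float> = listOf(1.0f, 2.0f)
    override fun close() { closed = true }
}

fun main() {
    val loader = FakeLoader()
    // use {} guarantees close() runs even if load() throws.
    val weights = loader.use { it.load() }
    println(weights)        // prints [1.0, 2.0]
    println(loader.closed)  // prints true
}
```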
Parameters
filePath
path to the GGUF model file