MmapLlamaLoader

Memory-mapped GGUF loader that provides zero-copy tensor access.

This loader memory-maps the GGUF file and creates tensor views that reference the mapped memory directly, avoiding data copies. This is particularly efficient for large models because:

  • Memory is loaded on-demand by the OS (lazy loading)

  • Multiple processes can share the same mapped pages

  • No explicit memory allocation for tensor data
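The properties above come from the JVM's file-mapping facility. A minimal sketch of mapping a file read-only with java.nio (the helper name is illustrative, not part of this API):

```kotlin
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// Map an entire file read-only. No bytes are read here; the OS faults
// pages in lazily on first access, and read-only mappings of the same
// file can be shared between processes.
fun mapReadOnly(path: Path): MappedByteBuffer =
    FileChannel.open(path, StandardOpenOption.READ).use { ch ->
        ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
    }
```

The mapping stays valid after the channel is closed; it is released when the buffer is garbage-collected or explicitly unmapped.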

Limitations:

  • JVM only (uses java.nio.MappedByteBuffer)

  • Currently only supports F32 tensors (no quantized tensor support yet)

  • Quantized models require dequantization, which defeats the zero-copy benefit
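The F32 restriction follows from how a zero-copy view is made: an F32 region of the mapped file can be reinterpreted in place, while quantized blocks must be decoded into newly allocated memory. A hedged sketch of such a view (the offset handling is illustrative, not the loader's actual layout logic):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.nio.FloatBuffer

// Reinterpret a byte range of a (mapped) buffer as float32 values without
// copying. GGUF stores tensor data little-endian; F32 is the only layout
// viewable this way directly, which is why quantized formats force a
// dequantization pass into fresh memory instead.
fun f32View(mapped: ByteBuffer, byteOffset: Int, floatCount: Int): FloatBuffer {
    val dup = mapped.duplicate() // independent position/limit, shared storage
    dup.position(byteOffset)
    dup.limit(byteOffset + floatCount * 4)
    return dup.slice().order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer()
}
```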

Usage:

val loader = MmapLlamaLoader(path)
val weights = loader.loadToMap<FP32, Float>(ctx)
// Use weights...
loader.close() // Release mmap when done
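Since the loader exposes close(), it can presumably be wrapped in Kotlin's use {} so the mapping is released even if an exception is thrown mid-load. A sketch with a stand-in class, since whether MmapLlamaLoader implements AutoCloseable is an assumption here:

```kotlin
// Stand-in with the same close() contract; the real MmapLlamaLoader is
// assumed, not confirmed, to implement AutoCloseable.
class StubLoader : AutoCloseable {
    var closed = false
        private set
    fun load(): String = "weights"
    override fun close() { closed = true }
}

// use {} closes the loader on both normal exit and exception.
fun loadWeights(): String = StubLoader().use { it.load() }
```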

Parameters

filePath

path to the GGUF model file

Constructors

constructor(filePath: Path)

Functions

open override fun close()
inline fun <T : DType, V> loadToMap(ctx: ExecutionContext): LlamaWeights<T, V>

fun <T : DType, V> loadToMap(ctx: ExecutionContext, dtype: KClass<T>): LlamaWeights<T, V>

Load model weights as mmap-backed tensors.
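The two overloads above follow the standard Kotlin idiom of a reified inline function forwarding its type argument to a KClass-based implementation, so the API is callable both with angle brackets and with an explicit class. A self-contained sketch of that pattern; the names below are illustrative stand-ins, not the loader's internals:

```kotlin
import kotlin.reflect.KClass

// Minimal stand-ins for the DType hierarchy in the signatures above.
open class DType
class FP32 : DType()

// Non-reified entry point: usable from Java or when the type is dynamic.
fun <T : DType> dtypeName(dtype: KClass<T>): String =
    dtype.simpleName ?: "unknown"

// Reified overload: lets callers write dtypeName<FP32>() instead of
// passing FP32::class explicitly.
inline fun <reified T : DType> dtypeName(): String = dtypeName(T::class)
```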