MmapLlamaLoader

Memory-mapped GGUF loader that provides zero-copy tensor access.

This loader memory-maps the GGUF file and creates tensor views that reference the mapped memory directly, avoiding data copies. This is particularly efficient for large models because:

  • Memory is loaded on-demand by the OS (lazy loading)

  • Multiple processes can share the same mapped pages

  • No explicit memory allocation for tensor data
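The properties above come from the JVM's file-mapping facility. A minimal sketch of mapping a file read-only with java.nio (the helper name is illustrative, not part of this API):

```kotlin
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// Map an entire file read-only. No bytes are read here; the OS faults
// pages in lazily on first access, and read-only mappings of the same
// file can be shared between processes.
fun mapReadOnly(path: Path): MappedByteBuffer =
    FileChannel.open(path, StandardOpenOption.READ).use { ch ->
        ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
    }
```

The mapping stays valid after the channel is closed; it is released when the buffer is garbage-collected or explicitly unmapped.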

Limitations:

  • JVM only (uses java.nio.MappedByteBuffer)

  • Currently only supports F32 tensors (no quantized tensor support yet)

  • Quantized models require dequantization, which defeats the zero-copy benefit
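The F32 restriction follows from how a zero-copy view is made: an F32 region of the mapped file can be reinterpreted in place, while quantized blocks must be decoded into newly allocated memory. A hedged sketch of such a view (the offset handling is illustrative, not the loader's actual layout logic):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.nio.FloatBuffer

// Reinterpret a byte range of a (mapped) buffer as float32 values without
// copying. GGUF stores tensor data little-endian; F32 is the only layout
// viewable this way directly, which is why quantized formats force a
// dequantization pass into fresh memory instead.
fun f32View(mapped: ByteBuffer, byteOffset: Int, floatCount: Int): FloatBuffer {
    val dup = mapped.duplicate() // independent position/limit, shared storage
    dup.position(byteOffset)
    dup.limit(byteOffset + floatCount * 4)
    return dup.slice().order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer()
}
```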

Usage:

val loader = MmapLlamaLoader(path)
val weights = loader.loadToMap<FP32, Float>(ctx)
// Use weights...
loader.close() // Release mmap when done
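Since the loader exposes close(), it can presumably be wrapped in Kotlin's use {} so the mapping is released even if an exception is thrown mid-load. A sketch with a stand-in class, since whether MmapLlamaLoader implements AutoCloseable is an assumption here:

```kotlin
// Stand-in with the same close() contract; the real MmapLlamaLoader is
// assumed, not confirmed, to implement AutoCloseable.
class StubLoader : AutoCloseable {
    var closed = false
        private set
    fun load(): String = "weights"
    override fun close() { closed = true }
}

// use {} closes the loader on both normal exit and exception.
fun loadWeights(): String = StubLoader().use { it.load() }
```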

Parameters

filePath

path to the GGUF model file

Constructors

constructor(filePath: Path)

Functions

open override fun close()
inline fun <T : DType, V> loadToMap(ctx: ExecutionContext): LlamaWeights<T, V>

fun <T : DType, V> loadToMap(ctx: ExecutionContext, dtype: KClass<T>): LlamaWeights<T, V>

Load model weights as mmap-backed tensors.
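The two overloads above follow the standard Kotlin idiom of a reified inline function forwarding its type argument to a KClass-based implementation, so the API is callable both with angle brackets and with an explicit class. A self-contained sketch of that pattern; the names below are illustrative stand-ins, not the loader's internals:

```kotlin
import kotlin.reflect.KClass

// Minimal stand-ins for the DType hierarchy in the signatures above.
open class DType
class FP32 : DType()

// Non-reified entry point: usable from Java or when the type is dynamic.
fun <T : DType> dtypeName(dtype: KClass<T>): String =
    dtype.simpleName ?: "unknown"

// Reified overload: lets callers write dtypeName<FP32>() instead of
// passing FP32::class explicitly.
inline fun <reified T : DType> dtypeName(): String = dtypeName(T::class)
```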