skainet-io-gguf/sk.ainet.io.gguf.llama/LlamaWeightLoader

LlamaWeightLoader

Adapter that loads LLaMA weights from GGUF files and emits them in the canonical GGUF tensor naming scheme. Validation covers metadata presence and basic shape consistency for the tensors we materialize.
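The two validation concerns named above, metadata presence and basic shape consistency, can be sketched as follows. This is a minimal illustration, not the loader's actual validation code: `TensorInfoSketch` and `validateSketch` are hypothetical names, and the specific keys and tensor checked here (`llama.embedding_length`, `llama.block_count`, `token_embd.weight`) are taken from the general GGUF naming convention as an assumption about what such a check might cover.

```kotlin
// Hypothetical stand-in: this page does not show the loader's real tensor-info type.
data class TensorInfoSketch(val name: String, val shape: List<Long>)

// Returns human-readable problems; an empty list means both checks passed.
fun validateSketch(metadata: Map<String, Any>, tensors: List<TensorInfoSketch>): List<String> {
    val problems = mutableListOf<String>()

    // 1. Metadata presence: every required key must exist.
    //    The key names are an assumption based on the GGUF "llama.*" convention.
    val requiredKeys = listOf("llama.embedding_length", "llama.block_count")
    for (key in requiredKeys) {
        if (key !in metadata) problems += "missing metadata key: $key"
    }

    // 2. Basic shape consistency: one dimension of the token embedding
    //    should equal the embedding length declared in the metadata.
    val embeddingLength = (metadata["llama.embedding_length"] as? Number)?.toLong()
    val tokenEmbedding = tensors.firstOrNull { it.name == "token_embd.weight" }
    if (tokenEmbedding == null) {
        problems += "missing tensor: token_embd.weight"
    } else if (embeddingLength != null && embeddingLength !in tokenEmbedding.shape) {
        problems += "token_embd.weight shape ${tokenEmbedding.shape} " +
            "does not contain embedding length $embeddingLength"
    }
    return problems
}
```

Collecting problems into a list rather than throwing on the first failure is a design choice of this sketch; the real loader may report errors differently.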

Constructors

LlamaWeightLoader

constructor(sourceProvider: () -> Source, loadTensorData: Boolean = true, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES)

Primary constructor for sequential Source-based loading. Loads the entire file into memory, which makes it suitable for models under 2 GB.

constructor(randomAccessProvider: () -> RandomAccessSource, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES)

Secondary constructor for streaming RandomAccessSource-based loading. Parses only the metadata up front (about 1 MB of memory) and loads tensors on demand. Suitable for models of any size, including files of 100 GB or more.
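The size guidance for the two constructors can be turned into a small selection rule. The sketch below is illustrative only: `LoadStrategySketch`, `sequentialLimitBytes`, and `chooseStrategy` are hypothetical names that are not part of the library, and treating "2 GB" as 2 GiB is an assumption.

```kotlin
// Hypothetical helper types; not part of LlamaWeightLoader.
enum class LoadStrategySketch { SEQUENTIAL, STREAMING }

// Threshold from the constructor descriptions. Binary gigabytes are an assumption.
val sequentialLimitBytes: Long = 2L * 1024 * 1024 * 1024

// Picks which constructor family fits a file of the given size:
//   SEQUENTIAL -> the constructor taking sourceProvider (whole file in memory)
//   STREAMING  -> the constructor taking randomAccessProvider (tensors on demand)
fun chooseStrategy(fileSizeBytes: Long): LoadStrategySketch {
    require(fileSizeBytes >= 0) { "file size must be non-negative" }
    return if (fileSizeBytes < sequentialLimitBytes) {
        LoadStrategySketch.SEQUENTIAL
    } else {
        LoadStrategySketch.STREAMING
    }
}
```

For example, a 500 MB file would map to the sequential constructor, while a 100 GB file would map to the streaming one.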

Types

Dequant

object Dequant

Backward-compatible companion that delegates to the shared DequantOps, so existing callers (e.g. LlamaWeightLoader.dequantF16(raw)) continue to work.
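To give a sense of what an F16 dequantization step involves, here is a self-contained sketch that converts raw little-endian IEEE 754 half-precision bytes to floats. It is not the DequantOps implementation. The names `halfBitsToFloat` and `f16BytesToFloats` are hypothetical, and the assumptions that `raw` is a `ByteArray` and that the data is little-endian are not confirmed by this page.

```kotlin
// Converts one IEEE 754 half-precision bit pattern (low 16 bits of h) to a Float.
fun halfBitsToFloat(h: Int): Float {
    val sign = (h ushr 15) and 0x1
    val exp = (h ushr 10) and 0x1F
    val mant = h and 0x3FF
    val bits = when {
        // Signed zero.
        exp == 0 && mant == 0 -> sign shl 31
        // Subnormal half: renormalize the mantissa into a normal float.
        exp == 0 -> {
            var e = 127 - 15 + 1
            var m = mant
            while ((m and 0x400) == 0) {
                m = m shl 1
                e -= 1
            }
            (sign shl 31) or (e shl 23) or ((m and 0x3FF) shl 13)
        }
        // Infinity or NaN: all-ones exponent, mantissa carried over.
        exp == 0x1F -> (sign shl 31) or (0xFF shl 23) or (mant shl 13)
        // Normal number: rebias the exponent from 15 to 127.
        else -> (sign shl 31) or ((exp + 112) shl 23) or (mant shl 13)
    }
    return Float.fromBits(bits)
}

// Decodes a buffer of little-endian F16 values (2 bytes each) into floats.
fun f16BytesToFloats(raw: ByteArray): FloatArray {
    require(raw.size % 2 == 0) { "F16 buffer length must be even, got ${raw.size}" }
    return FloatArray(raw.size / 2) { i ->
        val lo = raw[2 * i].toInt() and 0xFF
        val hi = raw[2 * i + 1].toInt() and 0xFF
        halfBitsToFloat((hi shl 8) or lo)
    }
}
```

As a usage example, the six bytes `00 3C 00 C0 00 38` decode to the three values 1.0, -2.0, and 0.5.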

Functions

load

inline suspend fun <T : DType, V> load(ctx: ExecutionContext, noinline onTensorLoaded: (String, Tensor<T, V>) -> Unit): LlamaModelMetadata

suspend fun <T : DType, V> load(ctx: ExecutionContext, dtype: KClass<T>, onTensorLoaded: (String, Tensor<T, V>) -> Unit): LlamaModelMetadata

Load weights and invoke onTensorLoaded for each required tensor. Returns parsed metadata.
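Each loading function comes as a pair: an inline overload without a `dtype` argument and an overload that takes `dtype: KClass<T>`. A common Kotlin way to build such a pair is to have the inline overload use a reified type parameter and delegate to the `KClass` overload. The sketch below shows that pattern together with the callback contract (one call per tensor, metadata returned at the end). All types here are hypothetical stand-ins, the functions are non-suspending to keep the sketch dependency-free, and whether the library's inline overloads are actually reified is an assumption.

```kotlin
import kotlin.reflect.KClass

// Hypothetical stand-ins for the library's DType hierarchy and metadata type.
sealed interface DTypeSketch
object FP32Sketch : DTypeSketch
data class MetadataSketch(val tensorCount: Int, val dtypeName: String?)

class CallbackLoaderSketch(private val tensors: Map<String, FloatArray>) {

    // Explicit overload: the caller passes the element type as a KClass.
    fun <T : DTypeSketch> load(
        dtype: KClass<T>,
        onTensorLoaded: (String, FloatArray) -> Unit
    ): MetadataSketch {
        // Invoke the callback once per tensor, in iteration order.
        for ((name, data) in tensors) onTensorLoaded(name, data)
        // Return the metadata only after every tensor has been delivered.
        return MetadataSketch(tensors.size, dtype.simpleName)
    }

    // Convenience overload: the reified T supplies the KClass automatically.
    inline fun <reified T : DTypeSketch> load(
        noinline onTensorLoaded: (String, FloatArray) -> Unit
    ): MetadataSketch = load(T::class, onTensorLoaded)
}
```

With this shape, a call such as `loader.load<FP32Sketch> { name, data -> ... }` and a call such as `loader.load(FP32Sketch::class) { name, data -> ... }` behave identically.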

loadStreaming

inline suspend fun <T : DType, V> loadStreaming(ctx: ExecutionContext, noinline onTensorLoaded: (String, Tensor<T, V>) -> Unit): LlamaModelMetadata

suspend fun <T : DType, V> loadStreaming(ctx: ExecutionContext, dtype: KClass<T>, onTensorLoaded: (String, Tensor<T, V>) -> Unit): LlamaModelMetadata

Load weights using the streaming API: only the metadata is parsed up front, and tensors are loaded on demand. Requires the randomAccessProvider constructor.
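The idea behind on-demand loading can be sketched as follows: a small index of tensor locations is held in memory, and the bytes of each tensor are read from a random-access source only when that tensor is delivered to the callback. Everything in this sketch is a hypothetical stand-in (`TensorEntrySketch`, `StreamingSketch`, the `readAt` function); it does not reflect the real RandomAccessSource interface, and it is non-suspending for simplicity.

```kotlin
// Hypothetical index entry: where a tensor's bytes live inside the file.
data class TensorEntrySketch(val offset: Int, val length: Int)

class StreamingSketch(
    // Small structure parsed once up front (the "metadata only" part).
    private val index: Map<String, TensorEntrySketch>,
    // Random-access read: returns `length` bytes starting at `offset`.
    private val readAt: (offset: Int, length: Int) -> ByteArray
) {
    // Total bytes actually fetched so far, to make the on-demand behaviour visible.
    var bytesRead: Long = 0
        private set

    fun loadStreaming(onTensorLoaded: (String, ByteArray) -> Unit) {
        for ((name, entry) in index) {
            // Only this tensor's bytes are fetched; the rest of the file stays untouched.
            val data = readAt(entry.offset, entry.length)
            bytesRead += data.size
            onTensorLoaded(name, data)
        }
    }
}
```

Because each buffer is handed to the callback and then dropped by the loader, peak memory in this sketch is bounded by the largest single tensor rather than by the file size, provided the callback does not retain the buffers.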

loadToMap

inline suspend fun <T : DType, V> loadToMap(ctx: ExecutionContext): LlamaWeights<T, V>

suspend fun <T : DType, V> loadToMap(ctx: ExecutionContext, dtype: KClass<T>): LlamaWeights<T, V>

Convenience helper that collects tensors into a map alongside metadata.
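Conceptually, a map-collecting helper like this is a thin wrapper around the callback-based loader: it passes a callback that stores each tensor under its name and then pairs the resulting map with the returned metadata. The sketch below illustrates that relationship. `WeightsSketch` and `loadToMapSketch` are hypothetical names, the fields of the real `LlamaWeights` type are not shown on this page, and the function is non-suspending for simplicity.

```kotlin
// Hypothetical stand-in for LlamaWeights: metadata plus tensors keyed by name.
data class WeightsSketch<M>(val metadata: M, val tensors: Map<String, FloatArray>)

// `load` models a callback-based loader: it calls onTensorLoaded once per tensor
// and returns the parsed metadata when it is done.
fun <M> loadToMapSketch(
    load: (onTensorLoaded: (String, FloatArray) -> Unit) -> M
): WeightsSketch<M> {
    // LinkedHashMap preserves the order in which tensors were delivered.
    val tensors = LinkedHashMap<String, FloatArray>()
    val metadata = load { name, tensor -> tensors[name] = tensor }
    return WeightsSketch(metadata, tensors)
}
```

Note that collecting into a map keeps every tensor resident at once, so this convenience trades memory for ease of access.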

loadToMapStreaming

inline suspend fun <T : DType, V> loadToMapStreaming(ctx: ExecutionContext): LlamaWeights<T, V>

suspend fun <T : DType, V> loadToMapStreaming(ctx: ExecutionContext, dtype: KClass<T>): LlamaWeights<T, V>

Load weights into a map using the streaming API. Requires the randomAccessProvider constructor.