LlamaWeightLoader

Adapter that loads LLaMA weights from GGUF files and emits them under the canonical GGUF tensor names. Validation checks that the required metadata is present and performs basic shape-consistency checks on the tensors it materializes.
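
The page does not list the exact checks. As a rough sketch of a metadata-driven presence check, the helper below builds the tensor names a LLaMA-style GGUF file conventionally carries (llama.cpp's naming: token_embd.weight, blk.N.attn_q.weight, output.weight, and so on) from a block count and reports which ones are missing. The helper names and the tensor list are illustrative assumptions, not this loader's internals. Some checkpoints also omit output.weight and reuse token_embd.weight, so a real check may need to treat it as optional.

```kotlin
// Hypothetical list: the per-block tensor suffixes a LLaMA-style GGUF file
// conventionally uses (llama.cpp naming). Not this loader's actual list.
val perBlockSuffixes = listOf(
    "attn_norm.weight", "attn_q.weight", "attn_k.weight", "attn_v.weight",
    "attn_output.weight", "ffn_norm.weight", "ffn_gate.weight",
    "ffn_up.weight", "ffn_down.weight",
)

// Builds the full list of expected tensor names for a model with `blockCount` layers.
fun expectedTensorNames(blockCount: Int): List<String> {
    require(blockCount > 0) { "blockCount must be positive, was $blockCount" }
    val names = mutableListOf("token_embd.weight")
    for (i in 0 until blockCount) {
        perBlockSuffixes.mapTo(names) { suffix -> "blk.$i.$suffix" }
    }
    names += listOf("output_norm.weight", "output.weight")
    return names
}

// Presence check: returns the expected names that are absent from the file's tensor index.
fun missingTensors(blockCount: Int, present: Set<String>): List<String> =
    expectedTensorNames(blockCount).filterNot { it in present }
```

For a two-block model whose index lacks blk.1.ffn_down.weight, missingTensors(2, present) returns a list containing just that name.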

Constructors

constructor(sourceProvider: () -> Source, loadTensorData: Boolean = true, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES)

Primary constructor for sequential, Source-based loading. Loads the entire file into memory, so it is suitable for models under 2 GB.

constructor(randomAccessProvider: () -> RandomAccessSource, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES)

Secondary constructor for streaming, RandomAccessSource-based loading. Parses only the metadata (about 1 MB of memory) and loads tensors on demand, so it is suitable for models of any size (100+ GB).
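
A minimal sketch of choosing between the two constructors by file size. LoadMode, SEQUENTIAL_LIMIT_BYTES and chooseLoadMode are hypothetical names, and reading the 2 GB guidance as 2 GiB is an assumption. One plausible reason for a limit near that size on the JVM is that a ByteArray is indexed by Int and so cannot hold 2^31 or more bytes, but the page does not state the reason.

```kotlin
// Which of the two constructors to use for a given file size.
enum class LoadMode { SEQUENTIAL_SOURCE, RANDOM_ACCESS_STREAMING }

// Assumed threshold: the 2 GB guidance read as 2 GiB (2^31 bytes).
val SEQUENTIAL_LIMIT_BYTES: Long = 2L * 1024 * 1024 * 1024

fun chooseLoadMode(fileSizeBytes: Long): LoadMode {
    require(fileSizeBytes >= 0) { "file size must be non-negative, was $fileSizeBytes" }
    return if (fileSizeBytes < SEQUENTIAL_LIMIT_BYTES) {
        LoadMode.SEQUENTIAL_SOURCE       // whole file is read into memory
    } else {
        LoadMode.RANDOM_ACCESS_STREAMING // metadata up front, tensors on demand
    }
}
```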

Types

object Dequant

Backward-compatible companion object that delegates to the shared DequantOps. Existing callers (for example, LlamaWeightLoader.dequantF16(raw)) continue to work.
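
Only the call shape dequantF16(raw) appears here. The sketch below assumes raw is a ByteArray of little-endian IEEE 754 half-precision values (GGUF's F16 layout) and that the result is a FloatArray; both that signature and the helper halfToFloat are illustrative, not the DequantOps implementation.

```kotlin
import kotlin.math.pow

// Decodes one IEEE 754 half-precision bit pattern (16 bits in an Int) to a Float.
fun halfToFloat(bits: Int): Float {
    val sign = (bits ushr 15) and 0x1
    val exponent = (bits ushr 10) and 0x1F
    val mantissa = bits and 0x3FF
    val magnitude = when (exponent) {
        0 -> mantissa * 2.0f.pow(-24)                                   // zero or subnormal
        0x1F -> if (mantissa == 0) Float.POSITIVE_INFINITY else Float.NaN // inf or NaN
        else -> (1.0f + mantissa / 1024.0f) * 2.0f.pow(exponent - 15)   // normal
    }
    return if (sign == 1) -magnitude else magnitude
}

// Hypothetical stand-in for dequantF16(raw): two bytes per value, low byte first.
fun dequantF16(raw: ByteArray): FloatArray {
    require(raw.size % 2 == 0) { "F16 data must have an even byte count, was ${raw.size}" }
    return FloatArray(raw.size / 2) { i ->
        val lo = raw[2 * i].toInt() and 0xFF
        val hi = raw[2 * i + 1].toInt() and 0xFF
        halfToFloat((hi shl 8) or lo)
    }
}
```

For example, the bytes 0x00, 0x3C decode to 1.0f, since 0x3C00 is the half-precision encoding of 1.0.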

Functions

inline suspend fun <T : DType, V> load(ctx: ExecutionContext, noinline onTensorLoaded: (String, Tensor<T, V>) -> Unit): LlamaModelMetadata

suspend fun <T : DType, V> load(ctx: ExecutionContext, dtype: KClass<T>, onTensorLoaded: (String, Tensor<T, V>) -> Unit): LlamaModelMetadata

Load weights and invoke onTensorLoaded for each required tensor. Returns the parsed metadata.
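
The paired signatures follow a common Kotlin pattern: an inline overload with a reified type parameter forwards T::class to the explicit KClass overload, and its lambda must be marked noinline because it is passed on as a value rather than inlined. The sketch below shows that pattern with hypothetical stand-ins (DType, FP32, FakeTensor, FakeMetadata, and a stubbed two-entry index). It drops ExecutionContext and suspend so it runs without a coroutine scope, and the page does not confirm that the real overloads are wired exactly this way.

```kotlin
import kotlin.reflect.KClass

// Hypothetical stand-ins for the library's DType, Tensor and metadata types.
interface DType
object FP32 : DType
data class FakeTensor<T : DType>(val dtype: KClass<T>, val shape: List<Int>)
data class FakeMetadata(val embeddingLength: Int)

// Explicit-KClass overload: walks a stubbed tensor index, invokes the
// callback once per tensor, then returns the metadata.
fun <T : DType> load(
    dtype: KClass<T>,
    onTensorLoaded: (String, FakeTensor<T>) -> Unit,
): FakeMetadata {
    val index = linkedMapOf(
        "token_embd.weight" to listOf(8, 32),
        "output_norm.weight" to listOf(8),
    )
    for ((name, shape) in index) onTensorLoaded(name, FakeTensor(dtype, shape))
    return FakeMetadata(embeddingLength = 8)
}

// Reified overload: forwards T::class. The lambda is noinline because it is
// passed to another function as a value instead of being inlined here.
inline fun <reified T : DType> load(
    noinline onTensorLoaded: (String, FakeTensor<T>) -> Unit,
): FakeMetadata = load(T::class, onTensorLoaded)
```

A caller can then write load<FP32> { name, tensor -> ... } instead of passing FP32::class explicitly.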

inline suspend fun <T : DType, V> loadStreaming(ctx: ExecutionContext, noinline onTensorLoaded: (String, Tensor<T, V>) -> Unit): LlamaModelMetadata

suspend fun <T : DType, V> loadStreaming(ctx: ExecutionContext, dtype: KClass<T>, onTensorLoaded: (String, Tensor<T, V>) -> Unit): LlamaModelMetadata

Load weights using the streaming API: parses only the metadata and loads tensors on demand. Requires the randomAccessProvider constructor.
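
On-demand loading amounts to seeking to each tensor's byte range and reading only that range. The sketch below uses java.io.RandomAccessFile in place of the library's RandomAccessSource and takes a hand-built index of hypothetical TensorEntry records. In a real GGUF file the offsets come from the parsed header and are relative to the start of the tensor-data section, which this sketch ignores.

```kotlin
import java.io.File
import java.io.RandomAccessFile

// Hypothetical index entry: where a tensor's bytes live in the file.
data class TensorEntry(val name: String, val offset: Long, val byteLength: Int)

// Reads tensors one at a time: only the current tensor's bytes are allocated
// per step (unless the callback retains them), so peak memory tracks the
// largest tensor rather than the file size.
fun streamTensors(
    file: File,
    index: List<TensorEntry>,
    onTensorLoaded: (String, ByteArray) -> Unit,
) {
    RandomAccessFile(file, "r").use { raf ->
        for (entry in index) {
            require(entry.offset + entry.byteLength <= raf.length()) {
                "tensor ${entry.name} extends past the end of the file"
            }
            val buf = ByteArray(entry.byteLength)
            raf.seek(entry.offset)
            raf.readFully(buf) // reads exactly byteLength bytes or throws EOFException
            onTensorLoaded(entry.name, buf)
        }
    }
}
```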

inline suspend fun <T : DType, V> loadToMap(ctx: ExecutionContext): LlamaWeights<T, V>

suspend fun <T : DType, V> loadToMap(ctx: ExecutionContext, dtype: KClass<T>): LlamaWeights<T, V>

Convenience helper that collects the loaded tensors into a map and returns it together with the metadata.
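
A minimal sketch of such a wrapper over the callback API, using hypothetical stand-ins (Metadata, Weights, CallbackLoader) in place of LlamaModelMetadata, LlamaWeights and the real loader. It collects each emitted tensor into an insertion-ordered map and pairs the map with the returned metadata. Rejecting duplicate names is a choice made in this sketch, not documented behavior.

```kotlin
// Hypothetical stand-ins for LlamaModelMetadata and LlamaWeights.
data class Metadata(val blockCount: Int)
data class Weights<W>(val metadata: Metadata, val tensors: Map<String, W>)

// A callback-style loader: emits each tensor, then returns the metadata.
fun interface CallbackLoader<W> {
    fun load(onTensorLoaded: (String, W) -> Unit): Metadata
}

// Collects every emitted tensor into an insertion-ordered map and bundles it
// with the metadata returned by the underlying load call.
fun <W> CallbackLoader<W>.loadToMap(): Weights<W> {
    val tensors = LinkedHashMap<String, W>()
    val metadata = load { name, tensor ->
        check(name !in tensors) { "duplicate tensor name: $name" }
        tensors[name] = tensor
    }
    return Weights(metadata, tensors)
}
```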

inline suspend fun <T : DType, V> loadToMapStreaming(ctx: ExecutionContext): LlamaWeights<T, V>

suspend fun <T : DType, V> loadToMapStreaming(ctx: ExecutionContext, dtype: KClass<T>): LlamaWeights<T, V>

Load weights into a map using the streaming API. Requires the randomAccessProvider constructor.
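
The page does not say what happens when a streaming method is called on a loader built with the Source constructor. One fail-fast design is sketched below with hypothetical types: Provider records which constructor was used, and loadToMapStreaming throws IllegalStateException with a clear message when random access is unavailable. Strings stand in for the Source and RandomAccessSource values.

```kotlin
// Hypothetical record of which constructor built the loader.
sealed interface Provider {
    class Sequential(val open: () -> String) : Provider
    class RandomAccess(val open: () -> String) : Provider
}

class LoaderSketch(private val provider: Provider) {
    // Streaming entry points need random access; fail fast otherwise.
    fun loadToMapStreaming(): String {
        val ra = provider as? Provider.RandomAccess
            ?: error("loadToMapStreaming requires the randomAccessProvider constructor")
        return "streamed from ${ra.open()}"
    }
}
```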