Gemma3nWeightLoader

Adapter that loads Gemma 3n weights from GGUF files.

Key differences from LlamaWeightLoader:

Architecture validation: accepts the "gemma3n", "gemma3", and "gemma" prefixes
Variable intermediate (FFN) sizes per layer
Per-layer embedding support
Hybrid attention metadata extraction

Constructors

constructor(sourceProvider: () -> Source, loadTensorData: Boolean = true, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES)

Primary constructor for sequential Source-based loading. Loads the entire file into memory, so it is suitable for models under 2 GB.

constructor(randomAccessProvider: () -> RandomAccessSource, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES)

Secondary constructor for streaming RandomAccessSource-based loading. Parses metadata only (about 1 MB of memory) and loads tensors on demand. Suitable for models of any size, including 100+ GB.
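
The choice between the two constructors can be reduced to a file-size check. The sketch below is illustrative only: LoadStrategy, SEQUENTIAL_LIMIT_BYTES, and chooseStrategy are hypothetical names that are not part of the library, and the 2 GB limit is assumed to mean 2 × 1024³ bytes.

```kotlin
// Hypothetical helper types; not part of the library.
enum class LoadStrategy { SEQUENTIAL, STREAMING }

// Assumption: "2 GB" from the constructor documentation is read as 2 * 1024^3 bytes.
const val SEQUENTIAL_LIMIT_BYTES: Long = 2L * 1024 * 1024 * 1024

// Files below the limit can be read fully into memory (sourceProvider constructor);
// anything larger should use on-demand loading (randomAccessProvider constructor).
fun chooseStrategy(fileSizeBytes: Long): LoadStrategy =
    if (fileSizeBytes < SEQUENTIAL_LIMIT_BYTES) LoadStrategy.SEQUENTIAL else LoadStrategy.STREAMING

fun main() {
    println(chooseStrategy(500L * 1024 * 1024))        // SEQUENTIAL
    println(chooseStrategy(8L * 1024 * 1024 * 1024))   // STREAMING
}
```

A caller would then construct the loader with sourceProvider for SEQUENTIAL and with randomAccessProvider for STREAMING.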

Functions

load

inline suspend fun <T : DType, V> load(ctx: ExecutionContext, noinline onTensorLoaded: (String, Tensor<T, V>) -> Unit): Gemma3nModelMetadata

suspend fun <T : DType, V> load(ctx: ExecutionContext, dtype: KClass<T>, onTensorLoaded: (String, Tensor<T, V>) -> Unit): Gemma3nModelMetadata

Load weights and invoke onTensorLoaded for each required tensor. Returns parsed metadata.
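
One way to use a callback of this shape is to sort tensors by layer as they arrive. The sketch below does not call the library: the tensor payload is replaced by a FloatArray stand-in, the loader is simulated with a plain list, and the "blk.<n>." naming is an assumption based on common GGUF conventions rather than anything stated in this reference.

```kotlin
// Assumption: per-layer tensors are named "blk.<n>.<rest>", as in common GGUF files.
val layerPattern = Regex("""^blk\.(\d+)\.""")

// Returns the layer index encoded in the tensor name, or null for non-layer tensors.
fun layerIndexOf(tensorName: String): Int? =
    layerPattern.find(tensorName)?.groupValues?.get(1)?.toInt()

fun main() {
    val perLayer = mutableMapOf<Int, MutableList<String>>()
    val global = mutableListOf<String>()

    // Same shape as onTensorLoaded, with FloatArray standing in for Tensor<T, V>.
    val onTensorLoaded: (String, FloatArray) -> Unit = { name, _ ->
        val layer = layerIndexOf(name)
        if (layer != null) perLayer.getOrPut(layer) { mutableListOf() }.add(name) else global.add(name)
    }

    // Simulate the loader invoking the callback once per tensor.
    listOf("token_embd.weight", "blk.0.attn_q.weight", "blk.0.ffn_up.weight", "blk.1.attn_q.weight")
        .forEach { onTensorLoaded(it, FloatArray(0)) }

    println(global)         // [token_embd.weight]
    println(perLayer.keys)  // [0, 1]
}
```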

loadStreaming

inline suspend fun <T : DType, V> loadStreaming(ctx: ExecutionContext, noinline onTensorLoaded: (String, Tensor<T, V>) -> Unit): Gemma3nModelMetadata

suspend fun <T : DType, V> loadStreaming(ctx: ExecutionContext, dtype: KClass<T>, onTensorLoaded: (String, Tensor<T, V>) -> Unit): Gemma3nModelMetadata

Load weights using the streaming API, which parses metadata only and loads tensors on demand. Requires an instance created with the randomAccessProvider constructor.
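
The idea behind on-demand loading can be sketched as an index of tensor locations plus a random-access read per request. This is not the library's implementation: TensorEntry and readTensorBytes are hypothetical, the file layout is a toy stand-in rather than real GGUF, and the code assumes Kotlin/JVM with java.io.RandomAccessFile.

```kotlin
import java.io.File
import java.io.RandomAccessFile

// Hypothetical index entry: where one tensor's bytes live inside the file.
data class TensorEntry(val name: String, val offset: Long, val byteSize: Int)

// Reads only the requested tensor's bytes; the rest of the file is never loaded.
fun readTensorBytes(file: File, entry: TensorEntry): ByteArray =
    RandomAccessFile(file, "r").use { raf ->
        val buffer = ByteArray(entry.byteSize)
        raf.seek(entry.offset)
        raf.readFully(buffer)
        buffer
    }

fun main() {
    // Toy "weights file": two tensors of 4 and 8 bytes stored back to back.
    val file = File.createTempFile("tensors", ".bin").apply { deleteOnExit() }
    file.writeBytes(ByteArray(12) { it.toByte() })

    // The index plays the role of the parsed metadata: small, and kept in memory.
    val index = listOf(
        TensorEntry("a", offset = 0, byteSize = 4),
        TensorEntry("b", offset = 4, byteSize = 8),
    )

    // Only tensor "b" is read from disk.
    val b = readTensorBytes(file, index[1])
    println(b.joinToString())  // 4, 5, 6, 7, 8, 9, 10, 11
}
```

Memory use in this scheme is bounded by the index plus the largest tensor requested at one time, which is why file size stops being the limiting factor.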

loadToMap

inline suspend fun <T : DType, V> loadToMap(ctx: ExecutionContext): Gemma3nWeights<T, V>

suspend fun <T : DType, V> loadToMap(ctx: ExecutionContext, dtype: KClass<T>): Gemma3nWeights<T, V>

Convenience helper that collects tensors into a map alongside metadata.
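
Conceptually, a map-collecting helper like this can be viewed as a thin wrapper over the callback-based load. The sketch below shows that relationship with stand-in types only: Metadata, Weights, loadWithCallback, and loadIntoMap are hypothetical, and the real Gemma3nModelMetadata and Gemma3nWeights may be structured differently.

```kotlin
// Stand-ins for illustration; the real metadata and weights types are not documented here.
data class Metadata(val architecture: String)
data class Weights(val metadata: Metadata, val tensors: Map<String, FloatArray>)

// Stand-in for a callback-style loader: reports each tensor, then returns metadata.
fun loadWithCallback(onTensorLoaded: (String, FloatArray) -> Unit): Metadata {
    onTensorLoaded("token_embd.weight", floatArrayOf(0.1f, 0.2f))
    onTensorLoaded("output_norm.weight", floatArrayOf(1.0f))
    return Metadata(architecture = "gemma3n")
}

// Collects every reported tensor into a map and pairs it with the returned metadata.
fun loadIntoMap(): Weights {
    val tensors = mutableMapOf<String, FloatArray>()
    val metadata = loadWithCallback { name, tensor -> tensors[name] = tensor }
    return Weights(metadata, tensors)
}

fun main() {
    val weights = loadIntoMap()
    println(weights.metadata.architecture)  // gemma3n
    println(weights.tensors.keys)           // [token_embd.weight, output_norm.weight]
}
```

The callback form suits callers that want to place each tensor directly into a model as it arrives; the map form trades some memory for simpler lookup by name afterwards.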

loadToMapStreaming

inline suspend fun <T : DType, V> loadToMapStreaming(ctx: ExecutionContext): Gemma3nWeights<T, V>

suspend fun <T : DType, V> loadToMapStreaming(ctx: ExecutionContext, dtype: KClass<T>): Gemma3nWeights<T, V>

Load weights into a map using the streaming API. Requires an instance created with the randomAccessProvider constructor.