Gemma3nWeightLoader

Adapter that loads Gemma 3n weights from GGUF files.

Key differences from LlamaWeightLoader:

  • Architecture validation: accepts the "gemma3n", "gemma3", and "gemma" prefixes

  • Variable intermediate (FFN) sizes per layer

  • Per-layer embedding support

  • Hybrid attention metadata extraction
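The architecture validation listed above can be pictured with a minimal sketch. It assumes the check is a plain prefix match against the GGUF `general.architecture` value; the loader's actual logic is not shown on this page, and `ACCEPTED_PREFIXES` and `isAcceptedArchitecture` are hypothetical names introduced only for illustration.

```kotlin
// Hypothetical helper, not part of the documented API.
// Assumption: validation is a prefix match on the architecture string.
val ACCEPTED_PREFIXES = listOf("gemma3n", "gemma3", "gemma")

fun isAcceptedArchitecture(architecture: String): Boolean =
    ACCEPTED_PREFIXES.any { prefix -> architecture.startsWith(prefix) }
```

Under this assumption, "gemma" is itself a prefix of the other two entries, so the check is equivalent to `architecture.startsWith("gemma")`. The longer prefixes would only matter if the loader branches on the specific variant.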

Constructors

constructor(sourceProvider: () -> Source, loadTensorData: Boolean = true, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES)

Primary constructor for sequential Source-based loading. Loads the entire file into memory, so it is suitable for models under 2 GB.

constructor(randomAccessProvider: () -> RandomAccessSource, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES)

Secondary constructor for streaming RandomAccessSource-based loading. Parses only the metadata (about 1 MB of memory) and loads tensors on demand. Suitable for models of any size, including files of 100 GB or more.
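As a rough guide to choosing between the two constructors, the sketch below picks a loading mode from the file size. It is not part of the library: `LoadingMode`, `SEQUENTIAL_LIMIT_BYTES`, and `chooseLoadingMode` are hypothetical names, and the 2 GB guideline is interpreted here as 2 GiB, which is an assumption.

```kotlin
// Hypothetical names for illustration; not part of the documented API.
enum class LoadingMode { SEQUENTIAL, STREAMING }

// Assumption: the "under 2 GB" guideline is read as 2 GiB.
const val SEQUENTIAL_LIMIT_BYTES: Long = 2L * 1024 * 1024 * 1024

// Files below the limit can be read fully into memory (Source-based constructor);
// anything larger should use the RandomAccessSource-based constructor.
fun chooseLoadingMode(fileSizeBytes: Long): LoadingMode =
    if (fileSizeBytes < SEQUENTIAL_LIMIT_BYTES) LoadingMode.SEQUENTIAL
    else LoadingMode.STREAMING
```

A caller would then construct the loader with `sourceProvider` for `SEQUENTIAL` or with `randomAccessProvider` for `STREAMING`.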

Functions

inline suspend fun <T : DType, V> load(ctx: ExecutionContext, noinline onTensorLoaded: (String, Tensor<T, V>) -> Unit): Gemma3nModelMetadata

suspend fun <T : DType, V> load(ctx: ExecutionContext, dtype: KClass<T>, onTensorLoaded: (String, Tensor<T, V>) -> Unit): Gemma3nModelMetadata

Load weights and invoke onTensorLoaded for each required tensor. Returns parsed metadata.

inline suspend fun <T : DType, V> loadStreaming(ctx: ExecutionContext, noinline onTensorLoaded: (String, Tensor<T, V>) -> Unit): Gemma3nModelMetadata

suspend fun <T : DType, V> loadStreaming(ctx: ExecutionContext, dtype: KClass<T>, onTensorLoaded: (String, Tensor<T, V>) -> Unit): Gemma3nModelMetadata

Load weights using the streaming API: only the metadata is parsed up front, and tensors are loaded on demand. Requires an instance created with the randomAccessProvider constructor.

inline suspend fun <T : DType, V> loadToMap(ctx: ExecutionContext): Gemma3nWeights<T, V>

suspend fun <T : DType, V> loadToMap(ctx: ExecutionContext, dtype: KClass<T>): Gemma3nWeights<T, V>

Convenience helper that collects tensors into a map alongside metadata.
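The relationship between the callback-based load and this map-collecting helper can be sketched generically. The code below is an illustration only, not the library's implementation: `Weights` and `collectToMap` are hypothetical stand-ins, and the sketch is non-suspending for brevity, whereas the documented functions are suspend functions.

```kotlin
// Hypothetical stand-in for a "metadata plus tensor map" result type.
data class Weights<M, T>(val metadata: M, val tensors: Map<String, T>)

// Wraps any callback-style loader: each (name, tensor) pair reported through
// the callback is stored in a map, and the loader's return value is kept as metadata.
fun <M, T> collectToMap(load: ((String, T) -> Unit) -> M): Weights<M, T> {
    val tensors = LinkedHashMap<String, T>() // preserves load order
    val metadata = load { name, tensor -> tensors[name] = tensor }
    return Weights(metadata, tensors)
}
```

With the real loader, the `load` argument would correspond to a call such as `load(ctx, onTensorLoaded)`. The callback form is preferable when tensors should be consumed one at a time instead of being held in memory together.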

inline suspend fun <T : DType, V> loadToMapStreaming(ctx: ExecutionContext): Gemma3nWeights<T, V>

suspend fun <T : DType, V> loadToMapStreaming(ctx: ExecutionContext, dtype: KClass<T>): Gemma3nWeights<T, V>

Load weights into a map using the streaming API. Requires an instance created with the randomAccessProvider constructor.