LlamaWeightLoader
Adapter that loads LLaMA weights from GGUF files and emits them in the canonical GGUF tensor naming scheme. Validation covers metadata presence and basic shape consistency for the tensors we materialize.
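As a rough illustration of the naming scheme, the sketch below lists the tensor names conventionally used for LLaMA-family models in GGUF files. The function name llamaTensorNames and the exact set of names are assumptions made for this example; the tensors this loader actually requires may differ.

```kotlin
// Illustrative only: these names follow the GGUF convention for LLaMA-family
// models. Which of them LlamaWeightLoader requires is an assumption.
fun llamaTensorNames(layerCount: Int): List<String> {
    require(layerCount >= 0) { "layerCount must be non-negative" }
    // Per-block tensors: attention projections, feed-forward weights, and norms.
    val perLayer = listOf(
        "attn_norm", "attn_q", "attn_k", "attn_v", "attn_output",
        "ffn_norm", "ffn_gate", "ffn_up", "ffn_down",
    )
    return buildList {
        add("token_embd.weight")
        for (layer in 0 until layerCount) {
            for (suffix in perLayer) add("blk.$layer.$suffix.weight")
        }
        add("output_norm.weight")
        add("output.weight")
    }
}
```

Calling llamaTensorNames(2) yields one embedding name, nine names for each of the two blocks, and two output names.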
Constructors
Primary constructor for sequential Source-based loading. Loads the entire file into memory, so it is suitable for models under 2 GB.
Secondary constructor for streaming RandomAccessSource-based loading. Parses only the metadata up front (~1 MB of memory) and loads tensors on demand. Suitable for models of any size, including 100+ GB.
Types
Backward-compatible companion object that delegates to the shared DequantOps. Existing callers (e.g. LlamaWeightLoader.dequantF16(raw)) continue to work.
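For readers unfamiliar with F16 dequantization, the following is a minimal sketch of what such a routine typically does: it reads little-endian IEEE 754 half-precision values and widens them to Float. The names halfToFloat and dequantF16Sketch are hypothetical, and this is not the DequantOps implementation, whose signature and behavior may differ.

```kotlin
// Hypothetical stand-in for an F16 dequantization routine, not the library source.
// Converts one IEEE 754 half-precision bit pattern (low 16 bits of `bits`) to Float.
fun halfToFloat(bits: Int): Float {
    val sign = (bits and 0x8000) shl 16          // move the sign to bit 31
    val exponent = (bits ushr 10) and 0x1F       // 5-bit exponent, bias 15
    val mantissa = bits and 0x3FF                // 10-bit fraction
    return when {
        // Infinity or NaN: all exponent bits set in the float as well.
        exponent == 0x1F -> Float.fromBits(sign or 0x7F800000 or (mantissa shl 13))
        // Normal number: rebias the exponent from 15 to 127 (difference 112).
        exponent != 0 -> Float.fromBits(sign or ((exponent + 112) shl 23) or (mantissa shl 13))
        // Zero or subnormal: the value is mantissa * 2^-24, which is exact in Float.
        else -> {
            val magnitude = mantissa * (1.0f / 16777216f)
            if (sign != 0) -magnitude else magnitude
        }
    }
}

// Assumes `raw` holds consecutive little-endian F16 values.
fun dequantF16Sketch(raw: ByteArray): FloatArray {
    require(raw.size % 2 == 0) { "F16 data must have an even byte count" }
    return FloatArray(raw.size / 2) { i ->
        val lo = raw[2 * i].toInt() and 0xFF
        val hi = raw[2 * i + 1].toInt() and 0xFF
        halfToFloat((hi shl 8) or lo)
    }
}
```

For example, the bytes 0x00 0x3C decode to 1.0 and the bytes 0x00 0xC0 decode to -2.0.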
Functions
Load weights and invoke onTensorLoaded for each required tensor. Returns parsed metadata.
Load weights using the streaming API: parses metadata only and loads tensors on demand. Requires an instance created with the randomAccessProvider constructor.
Convenience helper that collects tensors into a map alongside metadata.
Load weights into a map using the streaming API. Requires an instance created with the randomAccessProvider constructor.
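The relationship between the callback-based functions and the map-based helpers can be sketched with a toy loader. Everything here (ToyLoader, ModelMetadata, and both method signatures) is hypothetical and only illustrates the pattern; the real LlamaWeightLoader signatures are not shown on this page and may differ.

```kotlin
// Hypothetical types for illustration; not the library's API.
data class ModelMetadata(val architecture: String, val tensorCount: Int)

class ToyLoader(private val tensors: Map<String, FloatArray>) {
    // Callback style: hand each tensor to the caller as soon as it is available,
    // then return the metadata. The caller decides whether to keep the data.
    fun loadWeights(onTensorLoaded: (name: String, data: FloatArray) -> Unit): ModelMetadata {
        for ((name, data) in tensors) onTensorLoaded(name, data)
        return ModelMetadata(architecture = "llama", tensorCount = tensors.size)
    }

    // Map style: a thin wrapper that collects every callback into a map
    // and returns it alongside the metadata.
    fun loadToMap(): Pair<ModelMetadata, Map<String, FloatArray>> {
        val collected = LinkedHashMap<String, FloatArray>()
        val metadata = loadWeights { name, data -> collected[name] = data }
        return metadata to collected
    }
}
```

The callback form lets a caller process or discard each tensor without holding all of them at once, while the map form trades that flexibility for convenience.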