GgufModelMetadata
Parsed model metadata from a GGUF file.
This class extracts common metadata fields from the GGUF key-value store in a structured, type-safe manner. It supports various model types, including LLMs, vision models, and object detectors.
Usage:

    StreamingGGUFReader.open(source).use { reader ->
        val metadata = GgufModelMetadata.from(reader)
        println("Architecture: ${metadata.architecture}")
        println("Classes: ${metadata.classNames?.size ?: "none"}")
    }

Constructors
Properties
Model architecture identifier (e.g., "llama", "yolov3-tiny", "bert").
BOS token id from tokenizer.ggml.bos_token_id, if present.
Class names for classification/detection models.
Context length (maximum sequence length) for language models.
Hidden size / embedding dimension.
EOS token id from tokenizer.ggml.eos_token_id, if present.
Number of layers.
Number of classes (may differ from classNames.size if names are not provided).
Merge list from tokenizer.ggml.merges, each entry formatted as "first second" (space-separated). Ordered by priority: index 0 is the highest-priority merge.
Tokenizer model identifier (tokenizer.ggml.model), e.g. "gpt2", "llama", "bert". Used by TokenizerFactory to dispatch to the right tokenizer implementation regardless of file format.
Full vocabulary as stored in tokenizer.ggml.tokens (index = token id).
Per-token type codes from tokenizer.ggml.token_type. GGUF convention: 1 = normal, 2 = unknown, 3 = control/special, 4 = user-defined, 5 = unused, 6 = byte.
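The merge-priority and token-type conventions above can be sketched as follows. This is a minimal illustration, not part of the class itself: it assumes the caller has already read the merge list and token-type codes from the metadata, and the names `GgufTokenType` and `applyBestMerge` are invented for this example.

```kotlin
// Illustrative helpers; not part of GgufModelMetadata.

/** GGUF token-type codes as listed for tokenizer.ggml.token_type. */
enum class GgufTokenType(val code: Int) {
    NORMAL(1), UNKNOWN(2), CONTROL(3), USER_DEFINED(4), UNUSED(5), BYTE(6);

    companion object {
        fun fromCode(code: Int): GgufTokenType =
            values().firstOrNull { it.code == code }
                ?: error("Unknown GGUF token type code: $code")
    }
}

/**
 * Applies one round of BPE using the merge list's priority order:
 * the merge with the lowest index in [merges] wins. Each merge entry
 * is the space-separated pair "first second", as stored in GGUF.
 */
fun applyBestMerge(pieces: List<String>, merges: List<String>): List<String> {
    val rank = merges.withIndex().associate { (i, m) -> m to i }
    var best = -1
    var bestRank = Int.MAX_VALUE
    for (i in 0 until pieces.size - 1) {
        val r = rank["${pieces[i]} ${pieces[i + 1]}"] ?: continue
        if (r < bestRank) { bestRank = r; best = i }
    }
    if (best == -1) return pieces // no applicable merge left
    return pieces.subList(0, best) + (pieces[best] + pieces[best + 1]) +
        pieces.subList(best + 2, pieces.size)
}
```

A full BPE tokenizer would call applyBestMerge in a loop until no merge applies; the example keeps only the single-step core to show how index order in tokenizer.ggml.merges encodes priority.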