GgufModelMetadata

data class GgufModelMetadata(
    val architecture: String?,
    val name: String?,
    val author: String?,
    val license: String?,
    val version: String?,
    val url: String?,
    val classNames: List<String>?,
    val numClasses: Int?,
    val inputSize: Int?,
    val contextLength: Int?,
    val embeddingLength: Int?,
    val headCount: Int?,
    val layerCount: Int?,
    val vocabSize: Int?,
    val tokenizerModel: String? = null,
    val tokenizerTokens: List<String>? = null,
    val tokenizerMerges: List<String>? = null,
    val tokenizerTokenTypes: List<Int>? = null,
    val bosTokenId: Int? = null,
    val eosTokenId: Int? = null,
    val rawFields: Map<String, Any?>
)

Parsed model metadata from a GGUF file.

This class extracts common metadata fields from the GGUF key-value store in a structured, type-safe manner. It supports a range of model types, including LLMs, vision models, and object detectors.

Usage:

StreamingGGUFReader.open(source).use { reader ->
    val metadata = GgufModelMetadata.from(reader)
    println("Architecture: ${metadata.architecture}")
    println("Classes: ${metadata.classNames?.size ?: "none"}")
}

Constructors

constructor(
    architecture: String?,
    name: String?,
    author: String?,
    license: String?,
    version: String?,
    url: String?,
    classNames: List<String>?,
    numClasses: Int?,
    inputSize: Int?,
    contextLength: Int?,
    embeddingLength: Int?,
    headCount: Int?,
    layerCount: Int?,
    vocabSize: Int?,
    tokenizerModel: String? = null,
    tokenizerTokens: List<String>? = null,
    tokenizerMerges: List<String>? = null,
    tokenizerTokenTypes: List<Int>? = null,
    bosTokenId: Int? = null,
    eosTokenId: Int? = null,
    rawFields: Map<String, Any?>
)

Types

object Companion

Properties

val architecture: String?

Model architecture identifier (e.g., "llama", "yolov3-tiny", "bert")

val author: String?

Author/organization

val bosTokenId: Int?

BOS token id from tokenizer.ggml.bos_token_id, if present.

val classNames: List<String>?

Class names for classification/detection models

val contextLength: Int?

Context length for language models

val embeddingLength: Int?

Hidden size / embedding dimension

val eosTokenId: Int?

EOS token id from tokenizer.ggml.eos_token_id, if present.
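Because the token ids index into tokenizerTokens, the string form of a special token can be looked up directly when both fields are present. A minimal sketch (the helper name and sample vocab are illustrative, not part of this API):

```kotlin
// Resolve a special token id (e.g. eosTokenId) to its string form using the
// vocab list, returning null when either field is absent or the id is out of range.
fun specialTokenString(tokens: List<String>?, tokenId: Int?): String? =
    if (tokens != null && tokenId != null) tokens.getOrNull(tokenId) else null

fun main() {
    val vocab = listOf("<s>", "</s>", "the")   // toy vocab; index = token id
    println(specialTokenString(vocab, 1))      // </s>
    println(specialTokenString(null, 1))       // null
}
```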

val headCount: Int?

Number of attention heads

val inputSize: Int?

Input size for vision models

val layerCount: Int?

Number of layers

val license: String?

License information

val name: String?

Model name/description

val numClasses: Int?

Number of classes (may differ from classNames.size if names not provided)

val rawFields: Map<String, Any?>

All raw metadata fields for custom access
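Keys that have no dedicated property can be read straight from rawFields with a safe cast. A brief sketch; the key name "mymodel.confidence_threshold" is a made-up example, not a real GGUF key:

```kotlin
// rawFields is a plain Map<String, Any?>, so custom keys are read with a safe
// cast (`as?`), which yields null on a missing key or type mismatch instead of throwing.
fun confidenceThreshold(rawFields: Map<String, Any?>): Float? =
    rawFields["mymodel.confidence_threshold"] as? Float

fun main() {
    val fields: Map<String, Any?> = mapOf("mymodel.confidence_threshold" to 0.25f)
    println(confidenceThreshold(fields))     // 0.25
    println(confidenceThreshold(emptyMap())) // null
}
```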

val tokenizerMerges: List<String>?

Merge list from tokenizer.ggml.merges, each entry formatted as "first second" (space-separated). Entries are in priority order: index 0 is the highest-priority merge.
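A BPE implementation typically converts this list into a rank table, where lower rank means higher merge priority. A minimal sketch of that conversion (the helper name and toy merges are illustrative):

```kotlin
// Split each "first second" entry on its single space and record its index as
// the merge rank; index 0 (the first entry) is the highest-priority merge.
fun mergeRanks(merges: List<String>): Map<Pair<String, String>, Int> =
    merges.withIndex().associate { (rank, entry) ->
        val space = entry.indexOf(' ')
        (entry.substring(0, space) to entry.substring(space + 1)) to rank
    }

fun main() {
    val ranks = mergeRanks(listOf("t h", "th e"))
    println(ranks["t" to "h"])  // 0 (highest priority)
    println(ranks["th" to "e"]) // 1
}
```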

val tokenizerModel: String?

Tokenizer model identifier (tokenizer.ggml.model), e.g. "gpt2", "llama", "bert". Used by TokenizerFactory to dispatch to the right tokenizer implementation regardless of file format.
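The kind of dispatch TokenizerFactory performs can be sketched as a simple mapping from the identifier to a tokenizer family; the actual factory API is not shown in this document, so the function below is illustrative only (the family names follow common GGUF convention):

```kotlin
// Illustrative mapping from tokenizer.ggml.model values to tokenizer families.
// "gpt2" conventionally denotes byte-level BPE, "llama" a SentencePiece-style
// BPE, and "bert" WordPiece; anything else falls through to "unknown".
fun tokenizerFamily(model: String?): String = when (model) {
    "gpt2" -> "byte-level BPE"
    "llama" -> "SentencePiece BPE"
    "bert" -> "WordPiece"
    else -> "unknown"
}

fun main() {
    println(tokenizerFamily("gpt2")) // byte-level BPE
    println(tokenizerFamily(null))   // unknown
}
```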

val tokenizerTokens: List<String>?

Full vocab as stored in tokenizer.ggml.tokens (index = token id).

val tokenizerTokenTypes: List<Int>?

Per-token type codes from tokenizer.ggml.token_type. GGUF convention: 1 = normal, 2 = unknown, 3 = control/special, 4 = user-defined, 5 = unused, 6 = byte.
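The code-to-name mapping above can be captured in a small helper, which is handy when logging or filtering special tokens (the function name is illustrative, not part of this API):

```kotlin
// Human-readable names for the GGUF token-type codes listed above.
fun tokenTypeName(code: Int): String = when (code) {
    1 -> "normal"
    2 -> "unknown"
    3 -> "control"
    4 -> "user-defined"
    5 -> "unused"
    6 -> "byte"
    else -> "unrecognized($code)"
}

fun main() {
    println(tokenTypeName(3))  // control
    println(tokenTypeName(99)) // unrecognized(99)
}
```

For example, filtering `types.indices.filter { types[it] == 3 }` over a token-type list yields the ids of all control/special tokens.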

val url: String?

Source URL or repository

val version: String?

Model version

val vocabSize: Int?

Vocabulary size, derived from the tokenizer tokens list when present.