GgufModelMetadata

data class GgufModelMetadata(
    val architecture: String?,
    val name: String?,
    val author: String?,
    val license: String?,
    val version: String?,
    val url: String?,
    val classNames: List<String>?,
    val numClasses: Int?,
    val inputSize: Int?,
    val contextLength: Int?,
    val embeddingLength: Int?,
    val headCount: Int?,
    val layerCount: Int?,
    val vocabSize: Int?,
    val tokenizerModel: String? = null,
    val tokenizerTokens: List<String>? = null,
    val tokenizerMerges: List<String>? = null,
    val tokenizerTokenTypes: List<Int>? = null,
    val bosTokenId: Int? = null,
    val eosTokenId: Int? = null,
    val rawFields: Map<String, Any?>
)

Parsed model metadata from a GGUF file.

This class extracts common metadata fields from the GGUF key-value store in a structured, type-safe manner. It supports a range of model types, including LLMs, vision models, and object detectors.

Usage:

StreamingGGUFReader.open(source).use { reader ->
    val metadata = GgufModelMetadata.from(reader)
    println("Architecture: ${metadata.architecture}")
    println("Classes: ${metadata.classNames?.size ?: "none"}")
}

Constructors

constructor(
    architecture: String?,
    name: String?,
    author: String?,
    license: String?,
    version: String?,
    url: String?,
    classNames: List<String>?,
    numClasses: Int?,
    inputSize: Int?,
    contextLength: Int?,
    embeddingLength: Int?,
    headCount: Int?,
    layerCount: Int?,
    vocabSize: Int?,
    tokenizerModel: String? = null,
    tokenizerTokens: List<String>? = null,
    tokenizerMerges: List<String>? = null,
    tokenizerTokenTypes: List<Int>? = null,
    bosTokenId: Int? = null,
    eosTokenId: Int? = null,
    rawFields: Map<String, Any?>
)

Types

object Companion

Properties

val architecture: String?

Model architecture identifier (e.g., "llama", "yolov3-tiny", "bert")

val author: String?

Author/organization

val bosTokenId: Int?

BOS token id from tokenizer.ggml.bos_token_id, if present.

val classNames: List<String>?

Class names for classification/detection models

val contextLength: Int?

Context length for language models

val embeddingLength: Int?

Hidden size / embedding dimension

val eosTokenId: Int?

EOS token id from tokenizer.ggml.eos_token_id, if present.
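Because the token ids index into tokenizerTokens, the string form of a special token can be looked up directly when both fields are present. A minimal sketch (the helper name and sample vocab are illustrative, not part of this API):

```kotlin
// Resolve a special token id (e.g. eosTokenId) to its string form using the
// vocab list, returning null when either field is absent or the id is out of range.
fun specialTokenString(tokens: List<String>?, tokenId: Int?): String? =
    if (tokens != null && tokenId != null) tokens.getOrNull(tokenId) else null

fun main() {
    val vocab = listOf("<s>", "</s>", "the")   // toy vocab; index = token id
    println(specialTokenString(vocab, 1))      // </s>
    println(specialTokenString(null, 1))       // null
}
```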

val headCount: Int?

Number of attention heads

val inputSize: Int?

Input size for vision models

val layerCount: Int?

Number of layers

val license: String?

License information

val name: String?

Model name/description

val numClasses: Int?

Number of classes (may differ from classNames.size if names not provided)

val rawFields: Map<String, Any?>

All raw metadata fields for custom access
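Keys that have no dedicated property can be read straight from rawFields with a safe cast. A brief sketch; the key name "mymodel.confidence_threshold" is a made-up example, not a real GGUF key:

```kotlin
// rawFields is a plain Map<String, Any?>, so custom keys are read with a safe
// cast (`as?`), which yields null on a missing key or type mismatch instead of throwing.
fun confidenceThreshold(rawFields: Map<String, Any?>): Float? =
    rawFields["mymodel.confidence_threshold"] as? Float

fun main() {
    val fields: Map<String, Any?> = mapOf("mymodel.confidence_threshold" to 0.25f)
    println(confidenceThreshold(fields))     // 0.25
    println(confidenceThreshold(emptyMap())) // null
}
```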

val tokenizerMerges: List<String>?

Merge list from tokenizer.ggml.merges, each entry formatted as "first second" (space-separated). Entries are in priority order: index 0 is the highest-priority merge.
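A BPE implementation typically converts this list into a rank table, where lower rank means higher merge priority. A minimal sketch of that conversion (the helper name and toy merges are illustrative):

```kotlin
// Split each "first second" entry on its single space and record its index as
// the merge rank; index 0 (the first entry) is the highest-priority merge.
fun mergeRanks(merges: List<String>): Map<Pair<String, String>, Int> =
    merges.withIndex().associate { (rank, entry) ->
        val space = entry.indexOf(' ')
        (entry.substring(0, space) to entry.substring(space + 1)) to rank
    }

fun main() {
    val ranks = mergeRanks(listOf("t h", "th e"))
    println(ranks["t" to "h"])  // 0 (highest priority)
    println(ranks["th" to "e"]) // 1
}
```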

val tokenizerModel: String?

Tokenizer model identifier (tokenizer.ggml.model), e.g. "gpt2", "llama", "bert". Used by TokenizerFactory to dispatch to the right tokenizer implementation regardless of file format.
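The kind of dispatch TokenizerFactory performs can be sketched as a simple mapping from the identifier to a tokenizer family; the actual factory API is not shown in this document, so the function below is illustrative only (the family names follow common GGUF convention):

```kotlin
// Illustrative mapping from tokenizer.ggml.model values to tokenizer families.
// "gpt2" conventionally denotes byte-level BPE, "llama" a SentencePiece-style
// BPE, and "bert" WordPiece; anything else falls through to "unknown".
fun tokenizerFamily(model: String?): String = when (model) {
    "gpt2" -> "byte-level BPE"
    "llama" -> "SentencePiece BPE"
    "bert" -> "WordPiece"
    else -> "unknown"
}

fun main() {
    println(tokenizerFamily("gpt2")) // byte-level BPE
    println(tokenizerFamily(null))   // unknown
}
```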

val tokenizerTokens: List<String>?

Full vocab as stored in tokenizer.ggml.tokens (index = token id).

val tokenizerTokenTypes: List<Int>?

Per-token type codes from tokenizer.ggml.token_type. GGUF convention: 1 = normal, 2 = unknown, 3 = control/special, 4 = user-defined, 5 = unused, 6 = byte.
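The code-to-name mapping above can be captured in a small helper, which is handy when logging or filtering special tokens (the function name is illustrative, not part of this API):

```kotlin
// Human-readable names for the GGUF token-type codes listed above.
fun tokenTypeName(code: Int): String = when (code) {
    1 -> "normal"
    2 -> "unknown"
    3 -> "control"
    4 -> "user-defined"
    5 -> "unused"
    6 -> "byte"
    else -> "unrecognized($code)"
}

fun main() {
    println(tokenTypeName(3))  // control
    println(tokenTypeName(99)) // unrecognized(99)
}
```

For example, filtering `types.indices.filter { types[it] == 3 }` over a token-type list yields the ids of all control/special tokens.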

val url: String?

Source URL or repository

val version: String?

Model version

val vocabSize: Int?

Vocabulary size, derived from the tokenizer tokens list when present.