Tokenizer

interface Tokenizer

Common API surface for all tokenizer implementations.

Tokenizer selection is per-architecture, not per file format — see TokenizerFactory. A Qwen model needs byte-level BPE whether its weights come from .gguf or .safetensors; a LLaMA model needs SentencePiece regardless of format.
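To make the per-architecture selection concrete, here is a hypothetical sketch (the names `TokenizerKind` and `tokenizerKindFor` are illustrative, and the real TokenizerFactory API may differ): the architecture string from the model metadata, not the weight file's extension, decides which tokenizer family is built.

```kotlin
// Hypothetical sketch; the actual TokenizerFactory API may differ.
enum class TokenizerKind { BYTE_LEVEL_BPE, SENTENCEPIECE }

fun tokenizerKindFor(architecture: String): TokenizerKind =
    when (architecture.lowercase()) {
        // Byte-level BPE families: same tokenizer whether weights are .gguf or .safetensors
        "qwen", "qwen2", "gpt2" -> TokenizerKind.BYTE_LEVEL_BPE
        // SentencePiece families
        "llama", "mistral" -> TokenizerKind.SENTENCEPIECE
        else -> error("unsupported architecture: $architecture")
    }
```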

Inheritors

Properties

abstract val bosTokenId: Int?
abstract val eosTokenId: Int?
abstract val vocabSize: Int

Functions

abstract fun decode(ids: IntArray): String
abstract fun encode(text: String): IntArray
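To illustrate the contract, here is a minimal toy implementer — a hypothetical whitespace tokenizer, not part of the library — showing how `encode`/`decode` round-trip through the vocabulary and how the properties are satisfied:

```kotlin
// The interface as documented above.
interface Tokenizer {
    val bosTokenId: Int?
    val eosTokenId: Int?
    val vocabSize: Int
    fun decode(ids: IntArray): String
    fun encode(text: String): IntArray
}

// Toy implementation for illustration only: each distinct word in the corpus
// gets an id. Real implementations (byte-level BPE, SentencePiece) are far
// more involved.
class WhitespaceTokenizer(corpus: String) : Tokenizer {
    private val idToWord: List<String> = corpus.split(" ").distinct()
    private val wordToId: Map<String, Int> =
        idToWord.withIndex().associate { (i, w) -> w to i }

    override val vocabSize: Int get() = idToWord.size
    override val bosTokenId: Int? get() = null  // toy vocab has no special tokens
    override val eosTokenId: Int? get() = null

    override fun encode(text: String): IntArray =
        text.split(" ").map { wordToId.getValue(it) }.toIntArray()

    override fun decode(ids: IntArray): String =
        ids.joinToString(" ") { idToWord[it] }
}

fun main() {
    val tok = WhitespaceTokenizer("the cat sat on the mat")
    val ids = tok.encode("the cat sat")
    check(tok.decode(ids) == "the cat sat")  // encode/decode round-trips
}
```

Note that `bosTokenId` and `eosTokenId` are nullable: an implementation whose vocabulary defines no such special tokens returns `null` rather than a sentinel id.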