encode

open override fun encode(text: String): IntArray

Encode text to token IDs.

  1. Split the text into chunks using the pre-tokenization regex pattern.

  2. For each chunk, convert it to bytes and apply BPE merges.

  3. Offset the merge ranks by numSpecialTokens to get the final token IDs.
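The three steps above can be sketched as a minimal byte-level BPE encoder. This is an illustration under stated assumptions, not this class's actual implementation. `ByteBpeSketch`, the `ranks` map (byte sequence to merge rank, where a lower rank merges first and every single byte has an entry), and the simplified regex are hypothetical stand-ins, since the real pattern and vocabulary are not documented on this page. The sketch also assumes "offset" means the final ID is `rank + numSpecialTokens`, and it uses the greedy lowest-rank-pair merge loop found in tiktoken-style BPE.

```kotlin
// Hypothetical sketch of byte-level BPE encoding; not the library's real class.
// Assumptions: `ranks` maps byte sequences to merge ranks (lower = merged first),
// every single byte has a rank, and final ID = rank + numSpecialTokens.
class ByteBpeSketch(
    private val ranks: Map<List<Byte>, Int>,
    private val numSpecialTokens: Int,
    // Simplified stand-in for the real pre-tokenization pattern (assumption).
    pattern: String = """ ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+""",
) {
    private val regex = Regex(pattern)

    fun encode(text: String): IntArray {
        val ids = ArrayList<Int>()
        // Step 1: split the text into chunks with the pre-tokenization regex.
        for (match in regex.findAll(text)) {
            // Step 2: convert the chunk to UTF-8 bytes and apply BPE merges.
            val bytes = match.value.toByteArray(Charsets.UTF_8)
            for (piece in bpeMerge(bytes)) {
                val rank = ranks[piece] ?: error("no rank for byte sequence $piece")
                // Step 3: offset the rank by the number of special tokens.
                ids.add(rank + numSpecialTokens)
            }
        }
        return ids.toIntArray()
    }

    // Repeatedly merge the adjacent pair whose concatenation has the lowest rank,
    // stopping when no adjacent pair appears in `ranks`.
    private fun bpeMerge(bytes: ByteArray): List<List<Byte>> {
        val parts: MutableList<List<Byte>> = bytes.map { listOf(it) }.toMutableList()
        while (parts.size > 1) {
            var bestIdx = -1
            var bestRank = Int.MAX_VALUE
            for (i in 0 until parts.size - 1) {
                val r = ranks[parts[i] + parts[i + 1]] ?: continue
                if (r < bestRank) {
                    bestRank = r
                    bestIdx = i
                }
            }
            if (bestIdx < 0) break
            parts[bestIdx] = parts[bestIdx] + parts[bestIdx + 1]
            parts.removeAt(bestIdx + 1)
        }
        return parts
    }
}
```

For example, with a toy vocabulary where each byte `b` has rank `b`, `"ab"` has rank 256, `" ab"` has rank 257, and `numSpecialTokens = 3`, `encode("ab ab")` yields `[259, 260]`. The quadratic merge loop keeps the sketch short; a production encoder would likely cache chunk results or use a priority queue.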