SKaiNET Transformers

SKaiNET Transformers is a Kotlin Multiplatform inference engine for large language models. It loads GGUF and SafeTensors models, builds compute graphs from DSL network definitions, applies optimization passes, and executes inference on CPU (with SIMD acceleration) or GPU backends.
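As a rough sketch of that pipeline, the snippet below shows the shape of a load-optimize-execute flow. Every name in it (`GGUFLoader`, `GraphOptimizer`, `CpuBackend`, `generate`) is an illustrative assumption, not the actual SKaiNET API:

```kotlin
// Hypothetical sketch of the inference pipeline; all class and
// function names here are assumptions for illustration only.
fun main() {
    val model = GGUFLoader.load("models/llama-3-8b.Q4_K_M.gguf") // parse GGUF metadata and tensors
    val graph = model.buildComputeGraph()                        // DSL network definition -> compute graph
    val optimized = GraphOptimizer.run(graph)                    // fusion, transpose elimination, dedup
    val backend = CpuBackend(simd = true)                        // or a GPU backend
    val output = backend.generate(optimized, prompt = "Hello")   // run inference
    println(output)
}
```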

Key Features

  • Unified pipeline — load any supported model through a single CLI; the model family is auto-detected from GGUF metadata

  • Tool calling — agent loop with tool execution for any model that supports chat templates

  • Compute graph optimization — transpose elimination, weight deduplication, RMSNorm/SwiGLU/QKV fusion

  • Kotlin Multiplatform — runs on JVM, macOS Native, Linux Native, JS, and WASM

  • Quantization support — Q4_K_M, Q8_0, and other GGUF quantization formats with SIMD dequantization
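For a feel of the unified CLI, the invocations below are a hedged sketch; the binary name and flags are assumptions, not the tool's documented interface:

```shell
# Hypothetical CLI usage; binary name and flags are illustrative only.
# Run a quantized model, family auto-detected from GGUF metadata:
skainet run models/qwen2-7b.Q8_0.gguf --prompt "Summarize GGUF in one sentence."

# Agent loop with tool calling for a chat-template model:
skainet run models/llama-3-8b.Q4_K_M.gguf --chat --tools weather,calculator
```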

Supported Model Families

| Family  | Models                     | Tool Calling | DSL Network                |
|---------|----------------------------|--------------|----------------------------|
| LLaMA   | LLaMA 2/3, Mistral         | Yes          | `llamaNetwork()`           |
| Qwen    | Qwen2, Qwen3, Qwen3.5      | Yes          | `qwenNetwork()`            |
| Gemma   | Gemma 2, Gemma 3n, Gemma 4 | Yes          | Hand-coded                 |
| Apertus | Apertus 8B                 | No           | `apertusNetwork()`         |
| BERT    | MiniLM, BERT variants      | No           | `bertNetwork()`            |
| Voxtral | Voxtral TTS                | No           | `voxtralBackboneNetwork()` |
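The DSL network functions listed above are the entry points that define each family's compute graph. A minimal sketch of calling one, assuming hypothetical parameter names (only the function names themselves come from the table):

```kotlin
// Sketch: building a LLaMA-family network via the DSL entry point.
// The parameter names and values below are assumptions for illustration;
// in practice they would come from the loaded model's GGUF metadata.
val network = llamaNetwork(
    layers = 32,        // transformer block count
    heads = 32,         // attention heads
    hiddenSize = 4096,  // embedding dimension
    vocabSize = 128256, // tokenizer vocabulary size
)
```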

Documentation Structure

This documentation follows the Divio documentation system:

Tutorials

Step-by-step lessons to get you started.

How-to Guides

Practical recipes for specific tasks.

Reference

Technical descriptions of APIs and components.

Explanation

Background and design decisions.