# SKaiNET Transformers
SKaiNET Transformers is a Kotlin Multiplatform inference engine for large language models. It loads GGUF and SafeTensors models, builds compute graphs from DSL network definitions, applies optimization passes, and executes inference on CPU (with SIMD acceleration) or GPU backends.
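One of the ops such compute graphs contain (and one of the fusion targets mentioned below) is RMSNorm, the normalization used by LLaMA-style models. As a concrete illustration, here is a plain, unfused reference version of the standard formula — a generic sketch, not SKaiNET's actual implementation:

```kotlin
import kotlin.math.sqrt

// Reference (unfused) RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
// Generic sketch of the standard formula, not the engine's implementation.
fun rmsNorm(x: FloatArray, weight: FloatArray, eps: Float = 1e-6f): FloatArray {
    var sumSq = 0f
    for (v in x) sumSq += v * v
    val inv = 1f / sqrt(sumSq / x.size + eps)
    return FloatArray(x.size) { i -> x[i] * inv * weight[i] }
}
```

A fusion pass would typically merge this reduction and scale into the adjacent matmul's input traversal instead of materializing the intermediate array.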
## Key Features
- Unified pipeline — load any supported model with a single CLI, auto-detected from GGUF metadata
- Tool calling — agent loop with tool execution for any model that supports chat templates
- Compute graph optimization — transpose elimination, weight deduplication, RMSNorm/SwiGLU/QKV fusion
- Kotlin Multiplatform — runs on JVM, macOS Native, Linux Native, JS, and WASM
- Quantization support — Q4_K_M, Q8_0, and other GGUF quantization formats with SIMD dequantization
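To make the quantization point concrete: in GGUF, a Q8_0 block is 34 bytes — one little-endian float16 scale followed by 32 signed 8-bit quants, with each value recovered as `scale * quant`. The scalar sketch below illustrates that layout; it is not SKaiNET's code, and the engine's SIMD path will differ:

```kotlin
import kotlin.math.pow

// Scalar sketch of GGUF Q8_0 dequantization (illustrative, not the engine's
// SIMD implementation). Block layout: float16 scale + 32 int8 quants.
const val QK8_0 = 32

// Decode an IEEE 754 half-precision value from its 16 raw bits.
fun f16ToF32(bits: Int): Float {
    val sign = if (bits and 0x8000 != 0) -1f else 1f
    val exp = (bits ushr 10) and 0x1F
    val mant = bits and 0x3FF
    return sign * when (exp) {
        0 -> mant / 1024f * 2f.pow(-14)                              // subnormal
        31 -> if (mant == 0) Float.POSITIVE_INFINITY else Float.NaN  // inf / NaN
        else -> (1f + mant / 1024f) * 2f.pow(exp - 15)
    }
}

fun dequantizeQ8_0(block: ByteArray): FloatArray {
    require(block.size == 2 + QK8_0) { "Q8_0 block is ${2 + QK8_0} bytes" }
    // float16 scale, little-endian, in the first two bytes
    val h = (block[0].toInt() and 0xFF) or ((block[1].toInt() and 0xFF) shl 8)
    val scale = f16ToF32(h)
    return FloatArray(QK8_0) { i -> scale * block[2 + i] }
}
```

Q4_K_M uses a more involved super-block layout with per-sub-block scales, but the principle — stored integer quants rescaled on load — is the same.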
## Supported Model Families
| Family | Models | Tool Calling | DSL Network |
|---|---|---|---|
| LLaMA | LLaMA 2/3, Mistral | Yes | |
| Qwen | Qwen2, Qwen3, Qwen3.5 | Yes | |
| Gemma | Gemma 2, Gemma 3n, Gemma 4 | Yes | Hand-coded |
| Apertus | Apertus 8B | No | |
| BERT | MiniLM, BERT variants | No | |
| Voxtral | Voxtral TTS | No | |
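For the families marked "Yes" above, tool calling follows the usual agent-loop shape: generate, execute any requested tool, feed the result back, repeat until the model emits a final answer. The sketch below shows that generic shape only — the types and the stubbed `model` function are illustrative, not SKaiNET's API:

```kotlin
// Generic tool-calling agent loop (illustrative shape, not SKaiNET's API).
// `model` stands in for the inference engine: it returns either a ToolCall
// request or a final answer.
data class ToolCall(val name: String, val arg: String)

fun agentLoop(
    prompt: String,
    tools: Map<String, (String) -> String>,
    model: (List<String>) -> Any,
): String {
    val transcript = mutableListOf(prompt)
    while (true) {
        when (val out = model(transcript)) {
            is ToolCall -> {
                val result = tools.getValue(out.name)(out.arg)
                transcript += "tool:${out.name} -> $result"  // feed result back
            }
            else -> return out.toString()                    // final answer
        }
    }
}
```

In a real deployment the tool-call request and result are serialized through the model's chat template, which is why template support is the prerequisite for this feature.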
## Documentation Structure
This documentation follows the Divio documentation system:
- Tutorials — step-by-step lessons to get you started.
- How-to Guides — practical recipes for specific tasks.
- Reference — technical descriptions of APIs and components.
- Explanation — background and design decisions.