## Vision
SKaiNET aims to democratize "Edge AI / On-device AI" by bridging the gap between high-level application development and low-level hardware optimization. We believe AI should be portable, type-safe, and developer-friendly, enabling seamless intelligence in everything from mobile apps to IoT devices without sacrificing performance.
For architecture details see ARCHITECTURE.md.
## Quickstart
Add the core dependencies (Gradle Kotlin DSL):
```kotlin
dependencies {
    implementation("sk.ainet.core:SKaiNET-lang-core:0.19.0")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.19.0")
}
```
### Hello Neural Net
```kotlin
val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}
```
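The DSL above wires 784 → 128 → 10. As a sanity check, the dense-layer parameter arithmetic (weights plus biases) can be worked out in plain Kotlin, independent of SKaiNET (`denseParams` is an illustrative helper, not part of the library):

```kotlin
// Parameters of one dense layer: inputs * outputs weights + outputs biases.
fun denseParams(inputs: Int, outputs: Int): Int = inputs * outputs + outputs

fun main() {
    val hidden = denseParams(28 * 28, 128)  // 784 * 128 + 128 = 100_480
    val output = denseParams(128, 10)       // 128 * 10 + 10  = 1_290
    println(hidden + output)                // 101_770 trainable parameters
}
```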
### Core Tensor Ops
```kotlin
val a = tensor(shape(2, 2)) { float(1f, 2f, 3f, 4f) }
val b = tensor(shape(2, 2)) { float(5f, 6f, 7f, 8f) }
val c = a matMul b
val d = c.relu()
```
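For reference, the product and activation above can be verified by hand in plain Kotlin (row-major 2×2 arrays, no SKaiNET types):

```kotlin
// Hand-rolled 2x2 matrix multiply over row-major FloatArrays.
fun matMul2x2(a: FloatArray, b: FloatArray): FloatArray = floatArrayOf(
    a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
    a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3],
)

// Element-wise ReLU: max(0, x).
fun reluAll(x: FloatArray): FloatArray = FloatArray(x.size) { maxOf(0f, x[it]) }

fun main() {
    val c = matMul2x2(floatArrayOf(1f, 2f, 3f, 4f), floatArrayOf(5f, 6f, 7f, 8f))
    println(reluAll(c).toList())  // [19.0, 22.0, 43.0, 50.0]
}
```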
### GGUF Model Loading
```kotlin
// Recommended: streaming reader — memory-efficient, supports quantized types
val source = JvmRandomAccessSource.open("model.gguf")
StreamingGGUFReader.open(source).use { reader ->
    println("Tensors: ${reader.tensorCount}")
    // Load a specific tensor on demand (no whole-file loading)
    val bytes = reader.loadTensor("token_embd.weight")
    // Or get a TensorStorage descriptor with encoding/placement metadata
    val storage = reader.loadTensorStorage("token_embd.weight")
}
```
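For orientation, the GGUF container such a reader consumes begins with a small fixed-layout header. A minimal plain-Kotlin parse sketch, based on the publicly documented GGUF v2+ layout and independent of `StreamingGGUFReader` (`GgufHeader` and `parseGgufHeader` are illustrative names, not SKaiNET API):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// GGUF v2+ fixed header (little-endian): 4-byte magic "GGUF",
// u32 version, u64 tensor count, u64 metadata key-value count.
data class GgufHeader(val version: Int, val tensorCount: Long, val metadataKvCount: Long)

fun parseGgufHeader(bytes: ByteArray): GgufHeader {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val magic = ByteArray(4).also { buf.get(it) }
    require(magic.decodeToString() == "GGUF") { "not a GGUF file" }
    return GgufHeader(buf.int, buf.long, buf.long)
}
```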
More examples: SKaiNET-examples | SKaiNET-notebook
## Ecosystem
SKaiNET is a modular ecosystem. While this repository contains the core engine, specialized high-level libraries are maintained in standalone repositories:
| Project | Description |
|---|---|
| SKaiNET-LLM | Llama, Gemma, and BERT inference runtimes |
| SKaiNET-transformers | Pre-built transformer architectures and layers |
| SKaiNET-examples | Sample projects and integration demos |
## Explore
| Goal | Start here |
|---|---|
| Examples and sample projects | SKaiNET-examples |
| Interactive notebooks | SKaiNET-notebook |
| LLM inference (Llama, Gemma) | SKaiNET-LLM |
## Features
### Kotlin Multiplatform
- Targets: JVM, macOS (Native), JS, WASM (Browser + WasmWasi)
- Single codebase shared across all platforms via Kotlin Multiplatform
### Optimized Execution
- `ComputeGraphExecutor`: optimized engine with fusion passes and trace-to-DAG bridging.
- SDPA & Gather: high-performance Scaled Dot-Product Attention and indexing operations.
- TurboQuant: runtime KV-cache compression (~8x at 4-bit) for long-context LLM inference. Presets: `safe-lowbit`, `balanced`, `experimental-max`. See `TurboQuantUsage` for the integration guide.
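To make the ~8x figure concrete: FP32 stores 32 bits per value, so a 4-bit code is an 8x reduction before per-block scale overhead. A generic symmetric 4-bit quantizer can be sketched as follows (illustrative only; this is not TurboQuant's actual codec):

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Symmetric 4-bit quantization of one block: codes in [-7, 7] plus one scale.
fun quantize4(values: FloatArray): Pair<Float, IntArray> {
    val scale = (values.maxOf { abs(it) } / 7f).takeIf { it > 0f } ?: 1f
    val codes = IntArray(values.size) { (values[it] / scale).roundToInt().coerceIn(-7, 7) }
    return scale to codes
}

// Reconstruction: each value is off by at most scale / 2.
fun dequantize4(scale: Float, codes: IntArray): FloatArray =
    FloatArray(codes.size) { codes[it] * scale }
```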
### Agentic AI Infrastructure
- `ComputeGraph`: unified framework for defining agentic workflows and tool-calling loops.
- Java facade: `JavaAgentLoop` (in `skainet-lang-java`)
### Neural Network DSL
- Sequential: `nn { input(); dense(); relu(); dense() }`
- DAG / Graph: arbitrary wiring with `dag { }` for ResNet, YOLO-style architectures
- Layers: Dense, Conv1d/2d/3d, MaxPool, AvgPool, BatchNorm, Dropout, LeakyReLU, ELU
- KAN (Kolmogorov–Arnold Networks) layer (experimental)
- Autograd engine with reverse-mode gradients; SGD and Adam/AdamW optimizers
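To make the reverse-mode idea concrete, here is a minimal scalar sketch in plain Kotlin (illustrative only; SKaiNET's autograd operates on tensors and its API differs):

```kotlin
// Minimal scalar reverse-mode autograd: each Value records its parents
// together with the local derivative along that edge.
class Value(val data: Double, private val parents: List<Pair<Value, Double>> = emptyList()) {
    var grad = 0.0
    operator fun plus(o: Value) = Value(data + o.data, listOf(this to 1.0, o to 1.0))
    operator fun times(o: Value) = Value(data * o.data, listOf(this to o.data, o to data))

    // Push the upstream gradient down every path (chain rule per edge).
    fun backward(upstream: Double = 1.0) {
        grad += upstream
        for ((parent, localGrad) in parents) parent.backward(upstream * localGrad)
    }
}

fun main() {
    val x = Value(3.0); val w = Value(2.0); val b = Value(1.0)
    val y = w * x + b  // y = 2*3 + 1 = 7
    y.backward()
    println("y=${y.data} dy/dw=${w.grad} dy/dx=${x.grad}")  // y=7.0 dy/dw=3.0 dy/dx=2.0
}
```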
### Data and I/O
- Built-in loaders: MNIST, Fashion-MNIST, CIFAR-10
- Formats: GGUF, ONNX, SafeTensors, JSON, Image (JPEG, PNG)
- Type-safe transform DSL: `resize`, `crop`, `normalize`, `toTensor`
### Java 21+ Support
- `SKaiNET` entry point, `TensorJavaOps`, builder-pattern model definition
- Maven BOM (`sk.ainet:skainet-bom`) for one-line version management
### Edge AI: Arduino / C99 Export
- Export trained models to standalone, optimized C99 with static memory allocation
- Ready-to-use Arduino library output
### Compiler: MLIR / StableHLO
- Lower the Kotlin DSL to the MLIR StableHLO dialect
- Optimization passes: constant folding, operation fusion, dead code elimination
- Valid IREE-compilable output with a streaming API and a public `HloGenerator`
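As an illustration of the first pass, a toy constant folder over a tiny expression IR might look like this (sketch only; the real pass runs on MLIR, and all names here are hypothetical):

```kotlin
// Toy IR: constants, named parameters, and two arithmetic nodes.
sealed interface Expr
data class Const(val value: Double) : Expr
data class Param(val name: String) : Expr
data class Add(val lhs: Expr, val rhs: Expr) : Expr
data class Mul(val lhs: Expr, val rhs: Expr) : Expr

// Bottom-up rewrite: evaluate any node whose operands folded to constants.
fun fold(e: Expr): Expr = when (e) {
    is Const, is Param -> e
    is Add -> {
        val (l, r) = fold(e.lhs) to fold(e.rhs)
        if (l is Const && r is Const) Const(l.value + r.value) else Add(l, r)
    }
    is Mul -> {
        val (l, r) = fold(e.lhs) to fold(e.rhs)
        if (l is Const && r is Const) Const(l.value * r.value) else Mul(l, r)
    }
}
```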
## What's New in 0.19.0
- Qwen / GPT-2 Byte-Level BPE Tokenizer — full GPT-2-style pipeline (byte-to-unicode, pretokenization regex, merge-rank BPE, atomic special-token splitting). Builds from GGUF metadata or HuggingFace `tokenizer.json`; verified against Qwen2.5-0.5B reference token IDs.
- LLaMA / SentencePiece Tokenizer — llama.cpp SPM pipeline with whitespace escape, score-priority BPE (the SPM rule, opposite of GPT-2's merge rank), and `<0xNN>` byte fallback. Builds from GGUF (`tokenizer.ggml.model == "llama"`) and HuggingFace Unigram `tokenizer.json`.
- `TokenizerFactory` Per-Architecture Dispatch — tokenizer selection is now per-architecture, not per file format: Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece, regardless of whether weights come from GGUF or SafeTensors.
- Byte-Level BPE Fix for Qwen/GPT-2 — previously these models encoded text into garbage tokens because `GgufModelMetadata` ignored `tokenizer.ggml.merges` entirely, blocking chat mode and tool calling. (#463)
- LLaMA GGUF Tokenization Fix — `TokenizerFactory` previously threw `UnsupportedTokenizerException` for LLaMA-family GGUFs; the new SentencePiece path closes that gap. (#464)
- GGUF UInt Field Fix — UINT32 fields (e.g. `tokenizer.ggml.bos_token_id`) are Kotlin `UInt` value classes, not subclasses of `Number`, and were silently dropped by `as? Number` casts. Fixed via a `toIntFlexible` helper that handles every signed and unsigned numeric type GGUF can produce.
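The `UInt` pitfall is easy to reproduce in plain Kotlin, and a converter in the spirit of the fix can be sketched as below (the name `toIntFlexible` follows the release note, but this signature and body are illustrative, not SKaiNET's actual code):

```kotlin
// Kotlin's unsigned types are value classes, not Number subclasses,
// so `as? Number` yields null for them and the field is silently lost.
fun toIntFlexible(v: Any): Int = when (v) {
    is Number -> v.toInt()   // Int, Long, Short, Byte, Float, Double
    is UByte -> v.toInt()
    is UShort -> v.toInt()
    is UInt -> v.toInt()
    is ULong -> v.toInt()
    else -> error("not a numeric GGUF value: $v")
}

fun main() {
    val bosTokenId: Any = 151643u       // a UINT32 metadata value
    println(bosTokenId as? Number)      // null — the old cast dropped it
    println(toIntFlexible(bosTokenId))  // 151643
}
```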
See CHANGELOG.md for the full release history.
## Roadmap
- Q1 2026: Comprehensive documentation ✅
- Q2 2026: TurboQuant KV-cache compression ✅ (shipped in 0.18.0); Qwen/LLaMA tokenizers ✅ (shipped in 0.19.0)
- Q3 2026: Agentic AI enhancements ✅ (tool calling shipped in 0.13.0; ongoing)
- Q4 2026: Federated learning support for multi-device training
## Contributing & Community
We love contributions! Whether it's a new operator, documentation, or a bug fix:
- Read our CONTRIBUTING.md.
- Check the Good First Issues.
- Open a discussion or issue on GitHub.
Browse the full codebase documentation on DeepWiki.
## Contributors (0.14.0)
- Dhia Chemingui (@dhiaspaner) — Android KMP plugin migration (#385, #386)
## License
MIT — see LICENCE.