## Vision
SKaiNET aims to democratize "Edge AI / On-device AI" by bridging the gap between high-level application development and low-level hardware optimization. We believe AI should be portable, type-safe, and developer-friendly, enabling seamless intelligence in everything from mobile apps to IoT devices without sacrificing performance.
For architecture details see ARCHITECTURE.md.
## Quickstart
Add the core dependencies (Gradle Kotlin DSL):
```kotlin
dependencies {
    implementation("sk.ainet.core:SKaiNET-lang-core:0.19.0")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.19.0")
}
```
### Hello Neural Net
```kotlin
val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}
```
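The DSL above wires 784 → 128 → 10. As a sanity check, the dense-layer parameter arithmetic (weights plus biases) can be worked out in plain Kotlin, independent of SKaiNET (`denseParams` is an illustrative helper, not part of the library):

```kotlin
// Parameters of one dense layer: inputs * outputs weights + outputs biases.
fun denseParams(inputs: Int, outputs: Int): Int = inputs * outputs + outputs

fun main() {
    val hidden = denseParams(28 * 28, 128)  // 784 * 128 + 128 = 100_480
    val output = denseParams(128, 10)       // 128 * 10 + 10  = 1_290
    println(hidden + output)                // 101_770 trainable parameters
}
```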
### Core Tensor Ops
```kotlin
val a = tensor(shape(2, 2)) { float(1f, 2f, 3f, 4f) }
val b = tensor(shape(2, 2)) { float(5f, 6f, 7f, 8f) }
val c = a matMul b
val d = c.relu()
```
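For reference, the product and activation above can be verified by hand in plain Kotlin (row-major 2×2 arrays, no SKaiNET types):

```kotlin
// Hand-rolled 2x2 matrix multiply over row-major FloatArrays.
fun matMul2x2(a: FloatArray, b: FloatArray): FloatArray = floatArrayOf(
    a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
    a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3],
)

// Element-wise ReLU: max(0, x).
fun reluAll(x: FloatArray): FloatArray = FloatArray(x.size) { maxOf(0f, x[it]) }

fun main() {
    val c = matMul2x2(floatArrayOf(1f, 2f, 3f, 4f), floatArrayOf(5f, 6f, 7f, 8f))
    println(reluAll(c).toList())  // [19.0, 22.0, 43.0, 50.0]
}
```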
### GGUF Model Loading
```kotlin
// Recommended: streaming reader — memory-efficient, supports quantized types
val source = JvmRandomAccessSource.open("model.gguf")
StreamingGGUFReader.open(source).use { reader ->
    println("Tensors: ${reader.tensorCount}")
    // Load a specific tensor on demand (no whole-file loading)
    val bytes = reader.loadTensor("token_embd.weight")
    // Or get a TensorStorage descriptor with encoding/placement metadata
    val storage = reader.loadTensorStorage("token_embd.weight")
}
```
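For orientation, the GGUF container such a reader consumes begins with a small fixed-layout header. A minimal plain-Kotlin parse sketch, based on the publicly documented GGUF v2+ layout and independent of `StreamingGGUFReader` (`GgufHeader` and `parseGgufHeader` are illustrative names, not SKaiNET API):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// GGUF v2+ fixed header (little-endian): 4-byte magic "GGUF",
// u32 version, u64 tensor count, u64 metadata key-value count.
data class GgufHeader(val version: Int, val tensorCount: Long, val metadataKvCount: Long)

fun parseGgufHeader(bytes: ByteArray): GgufHeader {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val magic = ByteArray(4).also { buf.get(it) }
    require(magic.decodeToString() == "GGUF") { "not a GGUF file" }
    return GgufHeader(buf.int, buf.long, buf.long)
}
```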
More examples: SKaiNET-examples | SKaiNET-notebook
## Ecosystem
SKaiNET is a modular ecosystem. While this repository contains the core engine, specialized high-level libraries are maintained in standalone repositories:
| Project | Description |
|---|---|
| SKaiNET-LLM | Llama, Gemma, and BERT inference runtimes |
| SKaiNET-transformers | Pre-built transformer architectures and layers |
| SKaiNET-examples | Sample projects and integration demos |
## Explore
| Goal | Start here |
|---|---|
| Examples and sample projects | SKaiNET-examples |
| Interactive notebooks | SKaiNET-notebook |
| LLM inference (Llama, Gemma) | SKaiNET-LLM |
## Features
### Kotlin Multiplatform
- Targets: JVM, macOS (Native), JS, WASM (Browser + WasmWasi)
- Single codebase shared across all platforms via Kotlin Multiplatform
### Optimized Execution
- `ComputeGraphExecutor`: optimized engine with fusion passes and trace-to-DAG bridging.
- SDPA & Gather: high-performance Scaled Dot-Product Attention and indexing operations.
- TurboQuant: runtime KV-cache compression (~8x at 4-bit) for long-context LLM inference. Presets: `safe-lowbit`, `balanced`, `experimental-max`. See `TurboQuantUsage` for the integration guide.
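To make the ~8x figure concrete: FP32 stores 32 bits per value, so a 4-bit code is an 8x reduction before per-block scale overhead. A generic symmetric 4-bit quantizer can be sketched as follows (illustrative only; this is not TurboQuant's actual codec):

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Symmetric 4-bit quantization of one block: codes in [-7, 7] plus one scale.
fun quantize4(values: FloatArray): Pair<Float, IntArray> {
    val scale = (values.maxOf { abs(it) } / 7f).takeIf { it > 0f } ?: 1f
    val codes = IntArray(values.size) { (values[it] / scale).roundToInt().coerceIn(-7, 7) }
    return scale to codes
}

// Reconstruction: each value is off by at most scale / 2.
fun dequantize4(scale: Float, codes: IntArray): FloatArray =
    FloatArray(codes.size) { codes[it] * scale }
```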
### Agentic AI Infrastructure
- `ComputeGraph`: unified framework for defining agentic workflows and tool-calling loops.
- Java facade: `JavaAgentLoop` (in `skainet-lang-java`)
### Neural Network DSL
- Sequential: `nn { input(); dense(); relu(); dense() }`
- DAG / Graph: arbitrary wiring with `dag { }` for ResNet, YOLO-style architectures
- Layers: Dense, Conv1d/2d/3d, MaxPool, AvgPool, BatchNorm, Dropout, LeakyReLU, ELU
- KAN (Kolmogorov–Arnold Networks) layer (experimental)
- Autograd engine with reverse-mode gradients; SGD and Adam/AdamW optimizers
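To make the reverse-mode idea concrete, here is a minimal scalar sketch in plain Kotlin (illustrative only; SKaiNET's autograd operates on tensors and its API differs):

```kotlin
// Minimal scalar reverse-mode autograd: each Value records its parents
// together with the local derivative along that edge.
class Value(val data: Double, private val parents: List<Pair<Value, Double>> = emptyList()) {
    var grad = 0.0
    operator fun plus(o: Value) = Value(data + o.data, listOf(this to 1.0, o to 1.0))
    operator fun times(o: Value) = Value(data * o.data, listOf(this to o.data, o to data))

    // Push the upstream gradient down every path (chain rule per edge).
    fun backward(upstream: Double = 1.0) {
        grad += upstream
        for ((parent, localGrad) in parents) parent.backward(upstream * localGrad)
    }
}

fun main() {
    val x = Value(3.0); val w = Value(2.0); val b = Value(1.0)
    val y = w * x + b  // y = 2*3 + 1 = 7
    y.backward()
    println("y=${y.data} dy/dw=${w.grad} dy/dx=${x.grad}")  // y=7.0 dy/dw=3.0 dy/dx=2.0
}
```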
### Data and I/O
- Built-in loaders: MNIST, Fashion-MNIST, CIFAR-10
- Formats: GGUF, ONNX, SafeTensors, JSON, Image (JPEG, PNG)
- Type-safe transform DSL: `resize`, `crop`, `normalize`, `toTensor`
### Java 21+ Support
- `SKaiNET` entry point, `TensorJavaOps`, builder-pattern model definition
- Maven BOM (`sk.ainet:skainet-bom`) for one-line version management
### Edge AI: Arduino / C99 Export
- Export trained models to standalone, optimized C99 with static memory allocation
- Ready-to-use Arduino library output
### Compiler: MLIR / StableHLO
- Lower the Kotlin DSL to the MLIR StableHLO dialect
- Optimization passes: constant folding, operation fusion, dead code elimination
- Valid IREE-compilable output with a streaming API and a public `HloGenerator`
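As an illustration of the first pass, a toy constant folder over a tiny expression IR might look like this (sketch only; the real pass runs on MLIR, and all names here are hypothetical):

```kotlin
// Toy IR: constants, named parameters, and two arithmetic nodes.
sealed interface Expr
data class Const(val value: Double) : Expr
data class Param(val name: String) : Expr
data class Add(val lhs: Expr, val rhs: Expr) : Expr
data class Mul(val lhs: Expr, val rhs: Expr) : Expr

// Bottom-up rewrite: evaluate any node whose operands folded to constants.
fun fold(e: Expr): Expr = when (e) {
    is Const, is Param -> e
    is Add -> {
        val (l, r) = fold(e.lhs) to fold(e.rhs)
        if (l is Const && r is Const) Const(l.value + r.value) else Add(l, r)
    }
    is Mul -> {
        val (l, r) = fold(e.lhs) to fold(e.rhs)
        if (l is Const && r is Const) Const(l.value * r.value) else Mul(l, r)
    }
}
```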
## What's New in 0.19.0
- Qwen / GPT-2 Byte-Level BPE Tokenizer — full GPT-2-style pipeline (byte-to-unicode, pretokenization regex, merge-rank BPE, atomic special-token splitting). Builds from GGUF metadata or HuggingFace `tokenizer.json`; verified against Qwen2.5-0.5B reference token IDs.
- LLaMA / SentencePiece Tokenizer — llama.cpp SPM pipeline with whitespace escape, score-priority BPE (the SPM rule, opposite of GPT-2's merge rank), and `<0xNN>` byte fallback. Builds from GGUF (`tokenizer.ggml.model == "llama"`) and HuggingFace Unigram `tokenizer.json`.
- `TokenizerFactory` Per-Architecture Dispatch — tokenizer selection is now per-architecture, not per file format: Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece, regardless of whether weights come from GGUF or SafeTensors.
- Byte-Level BPE Fix for Qwen/GPT-2 — previously these models encoded text into garbage tokens because `GgufModelMetadata` ignored `tokenizer.ggml.merges` entirely, blocking chat mode and tool calling. (#463)
- LLaMA GGUF Tokenization Fix — `TokenizerFactory` previously threw `UnsupportedTokenizerException` for LLaMA-family GGUFs; the new SentencePiece path closes that gap. (#464)
- GGUF UInt Field Fix — UINT32 fields (e.g. `tokenizer.ggml.bos_token_id`) are Kotlin `UInt` value classes, not subclasses of `Number`, and were silently dropped by `as? Number` casts. Fixed via a `toIntFlexible` helper that handles every signed and unsigned numeric type GGUF can produce.
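The `UInt` pitfall is easy to reproduce in plain Kotlin, and a converter in the spirit of the fix can be sketched as below (the name `toIntFlexible` follows the release note, but this signature and body are illustrative, not SKaiNET's actual code):

```kotlin
// Kotlin's unsigned types are value classes, not Number subclasses,
// so `as? Number` yields null for them and the field is silently lost.
fun toIntFlexible(v: Any): Int = when (v) {
    is Number -> v.toInt()   // Int, Long, Short, Byte, Float, Double
    is UByte -> v.toInt()
    is UShort -> v.toInt()
    is UInt -> v.toInt()
    is ULong -> v.toInt()
    else -> error("not a numeric GGUF value: $v")
}

fun main() {
    val bosTokenId: Any = 151643u       // a UINT32 metadata value
    println(bosTokenId as? Number)      // null — the old cast dropped it
    println(toIntFlexible(bosTokenId))  // 151643
}
```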
See CHANGELOG.md for the full release history.
## Roadmap
- Q1 2026: Comprehensive documentation ✅
- Q2 2026: TurboQuant KV-cache compression ✅ (shipped in 0.18.0); Qwen/LLaMA tokenizers ✅ (shipped in 0.19.0)
- Q3 2026: Agentic AI enhancements ✅ (tool calling shipped in 0.13.0; ongoing)
- Q4 2026: Federated learning support for multi-device training
## Contributing & Community
We love contributions! Whether it's a new operator, documentation, or a bug fix:
- Read our CONTRIBUTING.md.
- Check the Good First Issues.
- Open a discussion or issue on GitHub.
Browse the full codebase documentation on DeepWiki.
## Contributors (0.14.0)
- Dhia Chemingui (@dhiaspaner) — Android KMP plugin migration (#385, #386)
## License
MIT — see LICENCE.