

SKaiNET is an open-source deep learning framework written in Kotlin, designed with developers in mind to enable the creation of modern AI-powered applications with ease.

🌟 Vision

SKaiNET aims to democratize "Edge AI / On-device AI" by bridging the gap between high-level application development and low-level hardware optimization. We believe AI should be portable, type-safe, and developer-friendly, enabling seamless intelligence in everything from mobile apps to IoT devices without sacrificing performance.

Important: About the name

“SKaiNET” is a working project name chosen early in the project’s life as part of a personal learning and experimentation effort, before any trademark considerations were known.

The name is not intended to reference, infringe, or imply association with any existing trademarks, companies, or products. It is not a commercial brand and is not claimed or assignable to any company or organization that contributors may be affiliated with.

If a naming conflict arises, the project name may be changed in the future.

🏗️ Architecture

SKaiNET uses a hybrid backend strategy that separates development iteration from production deployment.

[Architecture diagram of the SKaiNET compiler]

Key features at a glance

SKaiNET is Data

  • Built-in Data Loaders: MNIST, Fashion-MNIST, CIFAR-10

  • I/O Formats: GGUF, ONNX, SafeTensors, JSON, Image (JPEG, PNG, etc.)

  • Models: Llama (via KLlama), Gemma, BERT (via KBert)

  • Transformation DSL: Compose complex preprocessing pipelines including image resizing, normalization, and tensor conversion, using a type-safe Kotlin DSL.

// Data transformation pipeline
val transform = transforms<PlatformBitmapImage, Tensor<FP32, Float>> {
    resize(224, 224)
    centerCrop(200, 200)
    toTensor(ctx)
    normalize(
        ctx,
        mean = floatArrayOf(0.485f, 0.456f, 0.406f),
        std = floatArrayOf(0.229f, 0.224f, 0.225f)
    )
}

val processedTensor = transform.apply(rawImage)

// Data loaders
val ds = MNIST.load(train = true)
val (x, y) = ds.nextBatch(64)

// Type-safe tensor creation via the tensor DSL
val ctx = DefaultNeuralNetworkExecutionContext()
val mask = data<FP32, Float>(ctx) {
    tensor {
        shape(3, 3) {
            from(
                1f, 0f, 0f,
                1f, 1f, 0f,
                1f, 1f, 1f,
            )
        }
    }
}

val t = tensor<FP32, Float>(ctx, FP32::class) {
    tensor {
        shape(2, 3) {
            from(
                0f, 1f, 2f,
                10f, 11f, 12f
            )
        }
    }
}
println("shape=${t.shape} first=${t.data[0, 0]}")

SKaiNET is Language

  • Kotlin DSLs for Data, Neural Nets, Graphs, and Pipelines

Neural network DSL (Sequential)

val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}

Graph DSL (Functional/DAG)

For complex architectures with arbitrary wiring (like YOLO or ResNet), use the Graph DSL:

val program = dag {
    val x = input<FP32>("input", spec)
    val c1 = conv2d(x, w1, b1, padding = 1 to 1)
    val c2 = conv2d(c1, w2, b2, padding = 1 to 1)
    val sum = add(x, c2)
    output(relu(sum))
}

See docs/graph-dsl.md for more details.

SKaiNET is Tools

  • Kotlin Notebook support: Explorer and notebook-friendly APIs

// Works smoothly in Kotlin Notebooks
display(model.summary())
println(ds.describe())

SKaiNET is Compiler

  • MLIR/StableHLO Backend: Lowering from high-level Kotlin DSL to MLIR StableHLO dialect.

  • Optimization Passes: Extensible transformation API for optimizing the compiled IR.

    • ConstantFoldingPass: Folds arithmetic operations with constant operands.

    • OperationFusionPass: Fuses multiple ops (e.g., Add + ReLU) into efficient kernels.

    • DeadCodeEliminationPass: Removes unused computations.

// Applying Compiler Optimizations
val optimizer = StableHloOptimizer.createDefault()
val optimizedModule = optimizer.optimize(mlirModule)
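As an illustration of what a pass like ConstantFoldingPass does conceptually, here is a minimal bottom-up fold over a toy expression IR. The `Node` types and `fold` function are hypothetical, purely for illustration, and unrelated to SKaiNET's actual MLIR representation:

```kotlin
// Hypothetical, minimal IR for illustration only (not SKaiNET's real IR).
sealed interface Node
data class Const(val value: Float) : Node
data class Add(val lhs: Node, val rhs: Node) : Node
data class Input(val name: String) : Node

// Fold Add nodes whose operands are both constants, recursing bottom-up
// so that nested constant subtrees collapse first.
fun fold(node: Node): Node = when (node) {
    is Add -> {
        val l = fold(node.lhs)
        val r = fold(node.rhs)
        if (l is Const && r is Const) Const(l.value + r.value) else Add(l, r)
    }
    else -> node
}

fun main() {
    // (2 + 3) + x folds to 5 + x; the non-constant input is left untouched.
    val expr = Add(Add(Const(2f), Const(3f)), Input("x"))
    println(fold(expr)) // Add(lhs=Const(value=5.0), rhs=Input(name=x))
}
```

A real pass works on StableHLO operations rather than an expression tree, but the shape of the transformation (rewrite a node when all operands are known constants) is the same.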

  • Arduino C Code Generation: Export models to standalone, optimized C99 code with static memory allocation.

// Export model to an Arduino library
val facade = CCodegenFacade()
facade.exportToArduinoLibrary(
    model = model,
    forwardPass = { ctx -> model.forward(input, ctx) },
    outputPath = "build/arduino",
    libraryName = "MyModel"
)

See docs/arduino-c-codegen.md for more details.

SKaiNET is for Developers

  • Clean APIs, growing docs, Maven Central artifacts

  • Get productive in minutes with minimal deps

  • First-class Java 21+ support with builder APIs, blocking/async facades, and complete documentation

Kotlin:

dependencies {
    implementation("sk.ainet.core:SKaiNET-lang-core:0.14.0")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.14.0")
}

Java (Maven with BOM):

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>sk.ainet</groupId>
      <artifactId>skainet-bom</artifactId>
      <version>0.14.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>sk.ainet</groupId>
    <artifactId>skainet-lang-core-jvm</artifactId>
  </dependency>
  <dependency>
    <groupId>sk.ainet</groupId>
    <artifactId>skainet-backend-cpu-jvm</artifactId>
  </dependency>
</dependencies>

// Hello Tensor in Java
var ctx = SKaiNET.context();
var a = SKaiNET.tensor(ctx, new int[]{2, 3}, DType.fp32(),
        new float[]{1, 2, 3, 4, 5, 6});
var b = SKaiNET.tensor(ctx, new int[]{3, 2}, DType.fp32(),
        new float[]{7, 8, 9, 10, 11, 12});
var c = TensorJavaOps.matmul(a, b); // shape [2, 2]

SKaiNET is for LLMs

Generate text with just a few lines of code using any Llama-based GGUF model:

Kotlin:

val ctx = DirectCpuExecutionContext()
val ingestion = LlamaIngestion(ctx)

// Load model and tokenizer
val weights = ingestion.load { SystemFileSystem.source(Path("model.gguf")).buffered() }
val tokenizer = GGUFTokenizer.fromSource(SystemFileSystem.source(Path("model.gguf")).buffered())

// Generate!
val runtime = LlamaRuntime(ctx, weights)
runtime.generate(tokenizer.encode("Once upon a time"), steps = 64) { token ->
    print(tokenizer.decode(token))
}

Java:

// Load and generate in ~5 lines
try (var session = KLlamaJava.loadGGUF(Path.of("model.gguf"))) {
    var config = GenerationConfig.builder().maxTokens(256).temperature(0.7f).build();
    session.generate("Once upon a time", config, token -> System.out.print(token));
}

Use it

  • From Kotlin code in apps, libraries, CLIs

  • From Java 21+ with idiomatic builder APIs and blocking/async facades

  • In Kotlin Notebooks for quick exploration

  • With sample projects to learn patterns

Quick start

Gradle (Kotlin DSL)

dependencyResolutionManagement {
    repositories {
        mavenCentral()
    }
}

dependencies {
    // Minimal setup with the simple CPU backend
    implementation("sk.ainet.core:SKaiNET-lang-core:0.14.0")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.14.0")

    // Simple model zoo
    implementation("sk.ainet.core:SKaiNET-lang-models:0.14.0")

    // Optional I/O (e.g., GGUF loader, SafeTensors, JSON)
    implementation("sk.ainet.core:SKaiNET-io-core:0.14.0")
    implementation("sk.ainet.io:skainet-io-safetensors:0.14.0")
    implementation("sk.ainet.core:SKaiNET-io-gguf:0.14.0")

    // Apps & runtimes
    implementation("sk.ainet.apps:skainet-llm:0.14.0")  // Llama runtime
    implementation("sk.ainet.apps:skainet-bert:0.14.0") // BERT runtime
}

Maven (with BOM)

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>sk.ainet</groupId>
      <artifactId>skainet-bom</artifactId>
      <version>0.14.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <!-- Core tensor operations -->
  <dependency>
    <groupId>sk.ainet</groupId>
    <artifactId>skainet-lang-core-jvm</artifactId>
  </dependency>
  <!-- CPU backend -->
  <dependency>
    <groupId>sk.ainet</groupId>
    <artifactId>skainet-backend-cpu-jvm</artifactId>
  </dependency>
  <!-- Optional: MNIST/data loading -->
  <dependency>
    <groupId>sk.ainet</groupId>
    <artifactId>skainet-data-simple-jvm</artifactId>
  </dependency>
  <!-- Optional: LLM inference -->
  <dependency>
    <groupId>sk.ainet</groupId>
    <artifactId>skainet-kllama-jvm</artifactId>
  </dependency>
</dependencies>

Note for Java users: Add --enable-preview --add-modules jdk.incubator.vector to your JVM args for SIMD acceleration. See docs/java-getting-started.md for full setup instructions.

Examples and notebooks

  • See examples

  • examples/java/ — standalone Maven project with tensor ops, MNIST training, LLM inference

  • Kotlin Notebook: https://github.com/SKaiNET-developers/SKaiNET-notebook

0.14.0 highlights

  • First-Class Java 21+ Support: Complete Java API surface with SKaiNET entry point, TensorJavaOps, builder-pattern model definition, KLlamaJava/KBertJava facades, and JavaAgentLoop for tool-calling agents.

  • Maven BOM: sk.ainet:skainet-bom for one-line version management across all modules.

  • JVM Performance (Jlama Techniques): MemorySegment tensors, SIMD GEMM kernels, paged KV cache, batch prefill, and fused QKV projections for significantly faster LLM inference on JVM.

  • WasmWasi Target: New wasmWasi platform target across all KMP modules.

  • StableHLO MLIR Improvements: Fixed MLIR export for valid IREE-compilable output, new streaming API, and public HloGenerator.

  • Refactored Model Loading: Shared dequantization, registry, and decoder runtime extracted into reusable components.

  • Java Documentation: docs/java-getting-started.md | docs/java-llm-inference.md | docs/java-model-training.md

// Java: build and train a model
var model = new SequentialModelBuilder(ctx)
        .input(784).dense(128).relu().dense(10).build();
var loop = TrainingLoop.builder()
        .model(model).loss(Losses.crossEntropy())
        .optimizer(Optimizers.adam(0.001)).context(ctx).build();
float loss = loop.step(x, y);
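To make the "paged KV cache" item above concrete, here is a minimal sketch of the idea: keys and values are stored in fixed-size pages that are allocated lazily as the sequence grows, so memory scales with tokens actually generated rather than the maximum context length. The class and method names are hypothetical, not SKaiNET's real API:

```kotlin
// Illustrative paged KV cache: vectors live in fixed-size pages allocated on demand.
class PagedKvCache(private val headDim: Int, private val pageSize: Int = 16) {
    private val pages = mutableListOf<FloatArray>() // each page holds pageSize vectors
    var length = 0
        private set

    // Append one key (or value) vector, allocating a new page when the last is full.
    fun append(vector: FloatArray) {
        require(vector.size == headDim)
        if (length % pageSize == 0) pages.add(FloatArray(pageSize * headDim))
        vector.copyInto(pages.last(), (length % pageSize) * headDim)
        length++
    }

    // Look up the vector for a token position via page index + in-page offset.
    fun get(pos: Int): FloatArray {
        require(pos < length)
        val page = pages[pos / pageSize]
        val off = (pos % pageSize) * headDim
        return page.copyOfRange(off, off + headDim)
    }
}

fun main() {
    val cache = PagedKvCache(headDim = 4, pageSize = 2)
    repeat(5) { i -> cache.append(FloatArray(4) { i.toFloat() }) }
    println(cache.length)          // 5
    println(cache.get(3).toList()) // [3.0, 3.0, 3.0, 3.0]
}
```

A production cache adds per-layer/per-head storage and page reuse across sequences; the indexing scheme stays the same.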

0.13.0 highlights

  • Agentic AI & Tool Calling: New skainet-kllama-agent module with support for function calling and tool use.

  • Gemma 3n Support (KGemma): Support for Google's newest Gemma 3n models, including SafeTensors and HuggingFace tokenizer support.

  • Enhanced SafeTensors: Unified loading support for SafeTensors across multiple runtimes.

// Example: KLlama tool calling
val agent = KLlamaAgent(llama, tools = listOf(WeatherTool()))
val response = agent.chat("What's the weather like in London?")

0.12.0 highlights

  • BERT Support (KBert): Pure Kotlin implementation for BERT-based models and Sentence-Transformers.

  • SafeTensors weight loading: Fast and secure loading of modern model weights.

  • WordPiece Tokenizer: Native implementation for BERT-style tokenization.

// Example: Generating embeddings with KBert
val runtime = BertRuntime(ctx, weights, FP32::class)
val emb = runtime.encode(inputIds, attentionMask, tokenTypeIds)
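The WordPiece tokenizer mentioned above uses greedy longest-match-first segmentation: repeatedly take the longest vocabulary entry that prefixes the remainder of the word, marking continuations with "##". A self-contained sketch of the algorithm (toy vocabulary; the real tokenizer loads its vocab from the model files):

```kotlin
// Greedy longest-match WordPiece segmentation of a single word.
fun wordPiece(word: String, vocab: Set<String>, unk: String = "[UNK]"): List<String> {
    val tokens = mutableListOf<String>()
    var start = 0
    while (start < word.length) {
        var end = word.length
        var match: String? = null
        while (start < end) {
            // Non-initial pieces carry the "##" continuation prefix.
            val piece = (if (start > 0) "##" else "") + word.substring(start, end)
            if (piece in vocab) { match = piece; break }
            end--
        }
        if (match == null) return listOf(unk) // no piece matches: whole word is unknown
        tokens.add(match)
        start = end
    }
    return tokens
}

fun main() {
    val vocab = setOf("un", "##aff", "##able")
    println(wordPiece("unaffable", vocab)) // [un, ##aff, ##able]
}
```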

0.11.0 highlights

  • TinyFoA (AAAI 2025): Implemented missing operators to support TinyFoA training pipeline for memory-efficient on-device learning.

  • Multi-platform KLlama: Added macOS target support for the KLlama runtime.

  • Custom Backend Documentation: Added detailed guide and examples for injecting custom backends into KLlama.

0.10.1 highlights

  • Benchmarking & Profiling: New BenchmarkDsl and ExecutionObserver for detailed performance analysis.

  • RMSNormalization: Added support for RMSNorm layer, commonly used in modern LLMs.

  • KLlama Improvements: Better weight loading and experimental GPU acceleration.
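For reference, RMSNorm (unlike LayerNorm) skips mean-centering and simply rescales by the root-mean-square of the activations. A minimal sketch of the math, independent of SKaiNET's layer API:

```kotlin
import kotlin.math.sqrt

// RMS normalization as used in Llama-style models:
// y_i = x_i / sqrt(mean(x^2) + eps) * gamma_i
fun rmsNorm(x: FloatArray, gamma: FloatArray, eps: Float = 1e-5f): FloatArray {
    val meanSq = x.map { it * it }.sum() / x.size
    val scale = 1f / sqrt(meanSq + eps)
    return FloatArray(x.size) { i -> x[i] * scale * gamma[i] }
}

fun main() {
    val y = rmsNorm(floatArrayOf(1f, 2f, 3f, 4f), FloatArray(4) { 1f })
    println(y.toList()) // rescaled so the root-mean-square is ~1
}
```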

0.9.2 highlights

  • SKaiNET for Generative AI: Simplified API for text generation with Llama GGUF models.

  • Improved GGUF Loading: Fixed critical bugs with column-major storage and added support for more quantization formats.

  • Better Tokenization: Automatic detection of tokenizer strategies and improved UTF-8 decoding.

  • Runtime Fixes: Fixed missing attention output projection in Llama models.

0.9.1 highlights

  • SafeTensors: Native support for the SafeTensors format for secure and fast model loading.

  • Generalized Weight Loading: Improved I/O pipeline with WeightMapper and progress tracking.

  • JVM Vector API: Optimized tensor kernels for JVM using SIMD instructions.

  • Llama & GGUF: Enhanced tokenizer and ingestion logic for Llama-based models.

// Example: Loading SafeTensors weights
val loader = SafeTensorsParametersLoader(ctx)
loader.load("model.safetensors", model)

See CHANGELOG.md for the full list.

0.8.3 highlights (with tiny snippets)

  • KLlama (Llama 2 port): Initial version supporting GGUF models with mmap for zero-copy loading.

  • Quantization & BitNet: Support for Q8_0, Q4_K, and BitNet/Ternary (TQ1_0, TQ2_0) formats.

  • Streaming & I/O: Added streaming support for GGUF/ONNX and improved GGUF metadata loading.

  • Advanced Operations: Added LeakyReLU, ELU, AvgPool2d, Conv1d, and Conv3d.

  • Optimizers & Metrics: New Adam, AdamW optimizers and Accuracy metrics.

  • Datasets & Transforms: Support for CIFAR-10, Fashion-MNIST, and a new Data Transform API.

// Example: streaming inference with KLlama (GGUF)
val llama = KLlama.load("path/to/model.gguf")
llama.generate("Once upon a time") { token ->
    print(token) // streaming output
}

  • WASM/JS: Initial support for web-based deployments.

  • GGUF-only: Simplified I/O by focusing on GGUF (removed legacy formats).

See CHANGELOG.md for the full list.
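To illustrate the Q8_0 format mentioned above: GGUF quantizes weights in blocks of 32 floats, storing one scale per block plus 32 signed bytes. The sketch below shows the scheme itself (function names are hypothetical, not SKaiNET's loader API):

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Q8_0: blocks of 32 values, scale d = amax / 127, quantized q = round(x / d).
val BLOCK = 32

fun quantizeQ8_0(x: FloatArray): Pair<FloatArray, ByteArray> {
    require(x.size % BLOCK == 0)
    val scales = FloatArray(x.size / BLOCK)
    val q = ByteArray(x.size)
    for (b in scales.indices) {
        val slice = x.copyOfRange(b * BLOCK, (b + 1) * BLOCK)
        val amax = slice.maxOf { abs(it) }
        val d = if (amax == 0f) 1f else amax / 127f
        scales[b] = d
        for (i in slice.indices) q[b * BLOCK + i] = (slice[i] / d).roundToInt().toByte()
    }
    return scales to q
}

fun dequantizeQ8_0(scales: FloatArray, q: ByteArray): FloatArray =
    FloatArray(q.size) { i -> q[i] * scales[i / BLOCK] }

fun main() {
    val x = FloatArray(32) { it / 31f - 0.5f } // values in [-0.5, 0.5]
    val (scales, q) = quantizeQ8_0(x)
    val xHat = dequantizeQ8_0(scales, q)
    // Worst-case rounding error per value is d / 2.
    println(x.indices.maxOf { abs(x[it] - xHat[it]) })
}
```

Formats like Q4_K and the BitNet/Ternary variants follow the same block-plus-scale pattern with fewer bits per value.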

0.7.1 highlights (with tiny snippets)

  • Autograd Engine: Initial support for automatic differentiation and reverse-mode gradients using DefaultGradientTape.

  • Optimization & Training: New SgdOptimizer and training DSL to build and run training loops.

  • Loss Functions: Added MSELoss and CrossEntropyLoss with configurable reduction strategies.

// Example training step with autograd
val loss = MSELoss()
val optimizer = sgd(lr = 0.01)

val (tape, l) = record { loss.forward(model.forward(x, ctx), y, ctx) }
tape.computeGradients(targets = listOf(l), sources = model.parameters())
optimizer.step()

  • Improved Graph DSL with better wiring and recording support.

  • Stability improvements for StableHLO and CUDA backends.

See CHANGELOG.md for the full list.

0.6.3 highlights (with tiny snippets)

  • StableHLO and CUDA support via IREE

// Compile model to StableHLO and run on CUDA
val ir = Compile.toStableHlo(model)
println(ir.pretty())

  • Arduino C99 code generation

// Export model to an Arduino library
val facade = CCodegenFacade()
facade.exportToArduinoLibrary(
    model = model,
    forwardPass = { ctx -> model.forward(input, ctx) },
    outputPath = "build/arduino",
    libraryName = "MyModel"
)

  • KSP-based TracingOps generation for recording pipelines.

  • Improved HLO implementation and CUDA backend strategy.

See CHANGELOG.md for the full list.

0.5.0 highlights (with tiny snippets)

  • Kolmogorov–Arnold Networks (KAN/AKN) preview in the NN DSL

val model = nn {
    input(64)
    dense(out = 64)
    // KAN layer (preview) with a residual connection when dimensions match
    kanLayer(outputDim = 64, gridSize = 16, useResidual = true)
    dense(out = 10)
}

  • Training/Eval phases made easy

val base = DefaultNeuralNetworkExecutionContext() // default phase = EVAL
val yTrain = train(base) { ctx -> model.forward(x, ctx) }
val yEval = eval(base) { ctx -> model.forward(x, ctx) }

  • Dropout and BatchNorm layers

val y = x
    .let { dropout(p = 0.1).forward(it, ctx) }
    .let { batchNorm(numFeatures = 64).forward(it, ctx) }

  • Conv2D + MaxPool in the NN DSL

val model = nn {
    conv2d(outChannels = 16, kernel = 3)
    maxPool2d(kernel = 2)
    dense(out = 10)
}

  • Data API with MNIST loader and JSON dataset support

val ds = MNIST.load(train = true) // platform-aware loader
val (batchX, batchY) = ds.nextBatch(64)

  • GGUF model loading (initial)

val gguf = GGUF.read("/path/to/model.gguf")
println("Tensors: ${gguf.tensors.size}")

  • SIMD/Vector API acceleration on the JVM; MatMul, tril, and pooling ops; forward hooks and simple tape recording; unified tensor creation contexts; nested data blocks returning tensors.

Experimental: Kolmogorov–Arnold Networks (KAN)

SKaiNET includes an initial KAN layer implementation that you can wire into the NN DSL. A KAN layer expands each input feature by a learnable grid of basis coefficients and then mixes them with a linear projection, with optional bias and residual connection.

  • Current status: experimental/preview. API and behavior may change.

  • Forward path uses broadcasted basis expansion and a matmul mixing step.

  • gridSize, useBias, useResidual, and a custom baseActivation are supported. The degree parameter is reserved for future spline/basis functions and is not yet used.
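The expand-then-mix forward path described above can be sketched numerically. This is an illustration of the idea only: the Gaussian-bump basis and all names here are hypothetical, not the actual KanLayer implementation:

```kotlin
import kotlin.math.exp

// Each scalar input is expanded over a grid of basis functions (one Gaussian
// bump per grid point here), then the expanded features are linearly mixed.
fun kanForward(
    x: FloatArray,        // [inputDim]
    w: Array<FloatArray>, // [outputDim][inputDim * gridSize] mixing weights
    gridSize: Int,
): FloatArray {
    val grid = FloatArray(gridSize) { -1f + 2f * it / (gridSize - 1) } // grid on [-1, 1]
    val expanded = FloatArray(x.size * gridSize)
    for (j in x.indices) for (g in 0 until gridSize) {
        val d = x[j] - grid[g]
        expanded[j * gridSize + g] = exp(-d * d * gridSize) // basis activation
    }
    // Linear projection mixes the expanded features into the outputs.
    return FloatArray(w.size) { o ->
        expanded.indices.fold(0f) { acc, k -> acc + w[o][k] * expanded[k] }
    }
}

fun main() {
    val x = floatArrayOf(0.2f, -0.7f)
    val w = Array(3) { FloatArray(2 * 8) { 0.1f } } // outputDim = 3, gridSize = 8
    println(kanForward(x, w, gridSize = 8).toList())
}
```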

Quick usage example:

val model = nn {
    input(64)
    dense(out = 64)
    // Add a KAN layer that keeps the same dimensionality and uses a residual connection
    kanLayer(outputDim = 64, gridSize = 16, useResidual = true)
    dense(out = 10)
}

Notes and limitations:

  • Works with the default CPU backend; performance tuning and specialized kernels may arrive later.

  • Residuals are applied only when outputDim == inputDim.

  • You can customize initializers for the mixing weights, basis, and bias via the DSL block.

See source for details:

  • SKaiNET-lang/SKaiNET-kan/src/commonMain/kotlin/sk/ainet/lang/kan/KanDsl.kt

  • SKaiNET-lang/SKaiNET-kan/src/commonMain/kotlin/sk/ainet/lang/kan/KanLayer.kt

🚀 Sample Usage: Autograd

Minimize cosine distance between tensors with just a few lines:

skainet(ctx) {
    val a = tensor(1f, 0f, 0f).withRequiresGrad()
    val b = tensor(0f, 1f, 0f)

    // Record and compute gradients
    val (tape, distance) = record { a.cosineDistance(b) }
    tape.computeGradients(targets = listOf(distance), sources = listOf(a))

    // Optimize
    val optimizer = sgd(lr = 0.5)
    optimizer.addParameter(a)
    optimizer.step()

    println("Distance decreased to: ${a.cosineDistance(b).data.get()}")
}

🗺️ Roadmap

  • Q1 2026: Comprehensive documentation. ✅ (Java docs, architecture guides)

  • Q2 2026: Reference-based validation of the correctness of computations.

  • Q3 2026: Agentic AI enhancements. ✅ (Tool calling shipped in 0.13.0; ongoing improvements)

  • Q4 2026: Federated learning support for multi-device training.

🙏 Contributors (0.14.0)

Thank you to the following contributors for their work on this release:

  • Dhia Chemingui (@dhiaspaner) — Android KMP plugin migration (#385, #386)

🤝 Join the Community

🛠️ Contributing

We love contributions! Whether it's a new operator, documentation, or a bug fix:

  1. Read our CONTRIBUTING.md.

  2. Check the Good First Issues.

License

MIT — see LICENCE.
