Getting Started

This tutorial walks you through running text generation with a GGUF model using the unified skainet CLI.

Prerequisites

  • JDK 21+ with the incubator Vector API module (jdk.incubator.vector) enabled

  • A GGUF model file (e.g., tinyllama-1.1b-chat-v1.0.Q8_0.gguf)

Step 1: Build the Project

./gradlew :llm-apps:skainet-cli:classes

Step 2: Run Text Generation

./gradlew :llm-apps:skainet-cli:run \
  --args="-m tinyllama-1.1b-chat-v1.0.Q8_0.gguf 'The capital of France is'"

Expected output:

Architecture: llama, Family: LLaMA / Mistral
Backend: CPU (SIMD)
Loading GGUF model (LLaMA / Mistral, streaming)...
Generating 64 tokens with temperature=0.8...
---
The capital of France is Paris. It is also the largest city in France...
---
tok/s: 3.4
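The temperature=0.8 shown in the log controls how the next token is picked: logits are divided by the temperature before the softmax, so lower values concentrate probability on the top token and higher values flatten the distribution. The following is an illustrative sketch of temperature sampling, not the CLI's actual implementation:

```java
import java.util.Random;

// Minimal temperature sampling over raw logits (illustrative only).
public class TemperatureSampler {
    // Scale logits by 1/temperature, softmax them, and sample an index.
    static int sample(float[] logits, float temperature, Random rng) {
        double[] probs = new double[logits.length];
        double max = Double.NEGATIVE_INFINITY;
        for (float l : logits) max = Math.max(max, l / temperature);
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            // Subtract the max for numerical stability.
            probs[i] = Math.exp(logits[i] / temperature - max);
            sum += probs[i];
        }
        double r = rng.nextDouble() * sum;
        for (int i = 0; i < probs.length; i++) {
            r -= probs[i];
            if (r <= 0) return i;
        }
        return probs.length - 1;
    }

    public static void main(String[] args) {
        float[] logits = {1.0f, 5.0f, 0.5f};
        // At a low temperature the largest logit dominates.
        System.out.println(sample(logits, 0.1f, new Random(42))); // prints 1
    }
}
```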

The CLI auto-detects the model architecture from GGUF metadata, so there is no need to specify which runner to use.
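To make the auto-detection less magical: a GGUF file is little-endian and starts with a fixed header (4-byte magic "GGUF", a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count), followed by metadata entries such as general.architecture ("llama", "qwen2", ...), which is the key a runner can switch on. A minimal sketch of reading that header (not the CLI's own loader):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Reads the fixed-size GGUF header that precedes the metadata section.
public class GgufHeader {
    final int version;
    final long tensorCount;
    final long metadataKvCount;

    GgufHeader(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        byte[] magic = new byte[4];
        buf.get(magic);
        if (magic[0] != 'G' || magic[1] != 'G' || magic[2] != 'U' || magic[3] != 'F')
            throw new IllegalArgumentException("not a GGUF file");
        version = buf.getInt();
        tensorCount = buf.getLong();
        metadataKvCount = buf.getLong();
    }

    public static void main(String[] args) {
        // Synthetic header: version 3, 201 tensors, 25 metadata entries.
        ByteBuffer buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(new byte[]{'G', 'G', 'U', 'F'}).putInt(3).putLong(201).putLong(25).flip();
        GgufHeader h = new GgufHeader(buf);
        System.out.println("GGUF version " + h.version); // prints: GGUF version 3
    }
}
```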

Step 3: Interactive Chat

./gradlew :llm-apps:skainet-cli:run \
  --args="-m Qwen3-1.7B-Q8_0.gguf --chat"

This starts a multi-turn conversation with the model using the auto-detected chat template.
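For context on what the chat template does: Qwen-family models use a ChatML-style format, where each turn is wrapped in <|im_start|>role ... <|im_end|> markers and the prompt ends with an open assistant turn for the model to complete. The CLI reads the real template from the GGUF metadata; the sketch below only illustrates the ChatML shape:

```java
import java.util.List;

// Illustrative ChatML-style rendering of a conversation (sketch only;
// the CLI applies the template stored in the model's GGUF metadata).
public class ChatTemplate {
    record Message(String role, String content) {}

    static String render(List<Message> history) {
        StringBuilder sb = new StringBuilder();
        for (Message m : history)
            sb.append("<|im_start|>").append(m.role).append('\n')
              .append(m.content).append("<|im_end|>\n");
        // Leave the assistant turn open so the model generates the reply.
        sb.append("<|im_start|>assistant\n");
        return sb.toString();
    }
}
```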

Step 4: Tool Calling Demo

./gradlew :llm-apps:skainet-cli:run \
  --args="-m Qwen3-1.7B-Q8_0.gguf --demo"

The demo provides two built-in tools, calculator and list_files. Type a question such as "What is 2 + 2?" and the model will answer by calling the calculator tool.
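The host side of tool calling follows a common pattern: the model emits a structured call (a tool name plus arguments), the runtime looks the tool up in a registry, executes it, and feeds the result back into the conversation. The sketch below illustrates that dispatch loop; the tool names mirror the demo, but the registry and the toy calculator are assumptions, not skainet's implementation:

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical tool registry and dispatcher (sketch of the pattern only).
public class ToolDispatcher {
    final Map<String, Function<String, String>> tools = Map.of(
        "calculator", expr -> {
            // Toy evaluator: handles "a + b" only, to keep the sketch short.
            String[] parts = expr.split("\\+");
            return String.valueOf(Double.parseDouble(parts[0].trim())
                                + Double.parseDouble(parts[1].trim()));
        }
    );

    String dispatch(String name, String args) {
        Function<String, String> tool = tools.get(name);
        if (tool == null) return "error: unknown tool " + name;
        return tool.apply(args);
    }

    public static void main(String[] args) {
        // A parsed model call like {"name": "calculator", "arguments": "2 + 2"}.
        System.out.println(new ToolDispatcher().dispatch("calculator", "2 + 2")); // prints 4.0
    }
}
```

The result string would then be appended to the chat history as a tool message so the model can phrase the final answer.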

What’s Next