# Getting Started

This tutorial walks you through running text generation with a GGUF model using the unified skainet CLI.
## Prerequisites

- JDK 21+ with preview features enabled (for the Vector API)
- A GGUF model file (e.g., `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`)
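The Vector API still lives in an incubator module, so the JVM needs extra flags at both compile time and run time. The project's Gradle build presumably passes these already; for reference, a minimal sketch of that configuration in Gradle Kotlin DSL (task names are standard Gradle, not skainet-specific) looks like:

```kotlin
// build.gradle.kts -- illustrative only; the skainet build likely configures this itself.
// The Vector API is in the incubator module jdk.incubator.vector and, together
// with preview features, must be enabled explicitly on the JVM.
tasks.withType<JavaExec> {
    jvmArgs("--add-modules", "jdk.incubator.vector", "--enable-preview")
}
tasks.withType<JavaCompile> {
    options.compilerArgs.addAll(listOf("--add-modules", "jdk.incubator.vector", "--enable-preview"))
}
```

If you see `ClassNotFoundException` or "preview features are disabled" errors, missing flags like these are the usual cause.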
## Step 2: Run Text Generation
```shell
./gradlew :llm-apps:skainet-cli:run \
    --args="-m tinyllama-1.1b-chat-v1.0.Q8_0.gguf 'The capital of France is'"
```
Expected output:

```
Architecture: llama, Family: LLaMA / Mistral
Backend: CPU (SIMD)
Loading GGUF model (LLaMA / Mistral, streaming)...
Generating 64 tokens with temperature=0.8...
---
The capital of France is Paris. It is also the largest city in France...
---
tok/s: 3.4
```
The CLI auto-detects the model architecture from GGUF metadata — no need to specify which runner to use.
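Architecture detection works because GGUF stores model metadata as typed key/value pairs in the file header, conventionally starting with `general.architecture`. The following is a minimal sketch of that header layout (not skainet's actual loader; class and method names are invented for illustration), based on the public GGUF format:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: probe a GGUF header for the architecture string.
// GGUF layout: magic "GGUF", u32 version, u64 tensor count, u64 kv count,
// then key/value pairs (length-prefixed string key, u32 value type, value).
class GgufProbe {
    static String readArchitecture(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != 0x46554747)                  // "GGUF" read little-endian
            throw new IllegalArgumentException("not a GGUF file");
        int version = buf.getInt();                      // format version
        long tensorCount = buf.getLong();                // number of tensors
        long kvCount = buf.getLong();                    // number of metadata pairs
        // "general.architecture" is conventionally the first metadata key.
        String key = readString(buf);
        int valueType = buf.getInt();                    // 8 = string in the GGUF spec
        if (!key.equals("general.architecture") || valueType != 8) return null;
        return readString(buf);                          // e.g. "llama", "qwen2"
    }

    static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[(int) buf.getLong()];    // u64 length prefix
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The architecture string is what lets a runner map the file onto the right model family without any user input.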
## Step 3: Interactive Chat
```shell
./gradlew :llm-apps:skainet-cli:run \
    --args="-m Qwen3-1.7B-Q8_0.gguf --chat"
```
This starts a multi-turn conversation with the model using the auto-detected chat template.
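A chat template is just a deterministic rule for flattening the conversation into one prompt string. Qwen models use a ChatML-style template; the sketch below shows the idea in simplified form (this is not skainet's implementation, and the class name is invented):

```java
import java.util.*;

// Illustrative sketch of a ChatML-style chat template, as used by Qwen models:
// each turn is wrapped in <|im_start|>role ... <|im_end|> markers, and the
// prompt ends with an opened assistant turn for the model to complete.
class ChatTemplate {
    record Turn(String role, String content) {}

    static String render(List<Turn> turns) {
        StringBuilder sb = new StringBuilder();
        for (Turn t : turns) {
            sb.append("<|im_start|>").append(t.role).append('\n')
              .append(t.content).append("<|im_end|>\n");
        }
        sb.append("<|im_start|>assistant\n");  // generation prompt
        return sb.toString();
    }
}
```

Because the template is stored in the GGUF metadata, the CLI can pick the right wrapping for each model family automatically.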
## Step 4: Tool Calling Demo
```shell
./gradlew :llm-apps:skainet-cli:run \
    --args="-m Qwen3-1.7B-Q8_0.gguf --demo"
```
The demo provides `calculator` and `list_files` tools.
Type a question like "What is 2 + 2?" and the model will call the `calculator` tool.
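Under the hood, tool calling means the model emits a tool name plus arguments, and the host program dispatches that call and feeds the result back. A minimal sketch of such a dispatch table (invented names, not skainet's API) might look like:

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical sketch of host-side tool dispatch: tools are registered by
// name, and a model-emitted call is routed to the matching function.
class ToolRegistry {
    private final Map<String, Function<Map<String, String>, String>> tools = new HashMap<>();

    void register(String name, Function<Map<String, String>, String> fn) {
        tools.put(name, fn);
    }

    String call(String name, Map<String, String> args) {
        Function<Map<String, String>, String> fn = tools.get(name);
        if (fn == null) return "error: unknown tool " + name;
        return fn.apply(args);  // result is sent back to the model as a tool message
    }
}
```

A `calculator` tool, for instance, would be registered as a function that parses its numeric arguments and returns the result as a string.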
## What's Next

- Tool calling in depth — integrate tool calling into your own application
- CLI reference — all available flags and options
- Architecture overview — understand the pipeline