Running Smoke Tests
The smoke test suite verifies model loading, text generation, and optionally tool calling across all configured models.
Quick Start
./tests/smoke/smoke-test.sh
This uses tests/smoke/smoke-models.json to determine which models to test.
Configuration
Edit tests/smoke/smoke-models.json:
{
  "defaults": {
    "prompt": "The capital of France is",
    "steps": 32,
    "temperature": 0.0
  },
  "models": [
    {
      "name": "TinyLlama-1.1B-Q8",
      "runner": "kllama",
      "model": "tinyllama-1.1b-chat-v1.0.Q8_0.gguf",
      "format": "gguf"
    },
    {
      "name": "Qwen3-1.7B-Q8",
      "runner": "kllama",
      "model": "Qwen3-1.7B-Q8_0.gguf",
      "format": "gguf",
      "instruct": true,
      "prompt": "What is the capital of France?",
      "toolCalling": {
        "prompt": "What is 2 + 2?",
        "steps": 256
      }
    }
  ]
}
Qwen models use the same tensor layout as LLaMA, so the kllama runner handles them directly. The smoke catalog does not use a separate Qwen runner.
Model Fields
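| Field | Required | Description |
|---|---|---|
| name | Yes | Display name in the summary table |
| runner | Yes | Runner to use (see Currently Working Runners) |
| model | Yes | Path to model file; relative paths resolve against the root directory listed under Environment Variables |
| format | No | Model file format (gguf in the example config) |
| prompt | No | Override the default prompt |
| steps | No | Override the default step count |
| toolCalling | No | Object with settings for the tool-calling test (prompt and steps in the example config) |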
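The override behaviour of prompt and steps can be pictured as a simple fallback: a per-model value wins when present, otherwise the value from defaults applies. The following is a minimal shell sketch of that rule only. It assumes the values have already been extracted from the JSON; the helper name pick and the variable names are hypothetical, and how smoke-test.sh actually reads the config is not shown here.

```shell
#!/usr/bin/env bash
# Values copied from the "defaults" object in the example config.
default_prompt="The capital of France is"
default_steps=32

# Hypothetical helper: pick <model_value> <default_value>
# Prints the per-model value if it is non-empty, otherwise the default.
pick() {
  printf '%s\n' "${1:-$2}"
}

# The TinyLlama entry sets neither field, so it inherits the default prompt.
tiny_prompt=$(pick "" "$default_prompt")

# The Qwen3 entry overrides the prompt but not the top-level step count.
qwen_prompt=$(pick "What is the capital of France?" "$default_prompt")
qwen_steps=$(pick "" "$default_steps")
```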
Tool Calling Tests
Models with a toolCalling field get an additional test phase.
The smoke test runs kllama --demo in single-shot mode and checks for [Tool Call] in the output.
Results are classified as:
- OK: the model produced a tool call
- WARN: the model ran but did not produce a tool call (the model may be too small, or the prompt may not trigger one)
- FAIL: the model crashed or failed to load
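The three outcomes above can be sketched as a small shell function. This is an illustration of the classification rule, not the code in smoke-test.sh: the function name classify_tool_result and its two arguments (the runner's exit status and its captured output) are assumptions. Only the [Tool Call] marker and the OK/WARN/FAIL labels come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one tool-calling run.
#   $1 = exit status of the runner process
#   $2 = captured runner output
classify_tool_result() {
  local status="$1" output="$2"
  if [ "$status" -ne 0 ]; then
    echo "FAIL"                       # crashed or failed to load
    return
  fi
  case "$output" in
    *"[Tool Call]"*) echo "OK" ;;     # literal marker found in the output
    *)               echo "WARN" ;;   # ran, but no tool call was emitted
  esac
}
```

Checking the exit status first means a run that crashes after printing the marker is still reported as FAIL, which is one possible ordering; the real script may decide differently.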
Output
The test produces two summary tables:
    Summary -- Generation
    Status  Model              Runner  Size  tok/s  Wall
    OK      TinyLlama-1.1B-Q8  kllama  1.1G  3.4    11.8s
    OK      Qwen3-1.7B-Q8      kllama  2.0G  2.0    20.8s
    Pass: 2  Fail: 0  Total: 2

    Summary -- Tool Calling
    Status  Model          Tool        Wall
    OK      Qwen3-1.7B-Q8  calculator  45.2s
    Pass: 1  Fail: 0  Total: 1
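When the suite runs in CI, it can be handy to turn the summary counts into a pass/fail decision. The sketch below sums the Fail counts from captured output. It assumes the summary lines keep the "Pass: N Fail: N Total: N" shape shown in the sample; the helper name count_failures and the log file name are hypothetical, and whether smoke-test.sh already exits non-zero on failure is not stated here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read smoke-test output on stdin and print the
# sum of all "Fail: N" counts (0 if no summary line is present).
count_failures() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "Fail:") total += $(i + 1)
  }
  END { print total + 0 }'
}

# Possible usage (not executed here):
#   ./tests/smoke/smoke-test.sh | tee smoke.log
#   [ "$(count_failures < smoke.log)" -eq 0 ] || exit 1
```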
Adding a New Model
- Download or locate the GGUF file.
- Add an entry to smoke-models.json.
- Set runner to match the model architecture.
- Optionally add toolCalling for models that support it.
- Run ./tests/smoke/smoke-test.sh.
Adding a New Runner
- Add cases to runner_task(), runner_compile_task(), and runner_args() in tests/smoke/smoke-test.sh.
- Reference the new runner name from any model entry in smoke-models.json.
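As a rough picture of the first step, the functions named above can be thought of as dispatch-by-name helpers. The sketch below shows one plausible shape for two of them, using a made-up runner called kexample. The function bodies and every value they print are placeholders for illustration; the actual contents of tests/smoke/smoke-test.sh may look quite different.

```shell
#!/usr/bin/env bash
# Illustrative only: "kexample" and the strings echoed below are placeholders,
# not real runner names or build tasks.

# Map a runner name to the task that runs it.
runner_task() {
  case "$1" in
    kexample) echo "example-run-task" ;;
    *) echo "unknown runner: $1" >&2; return 1 ;;
  esac
}

# Map a runner name to the task that compiles it.
runner_compile_task() {
  case "$1" in
    kexample) echo "example-compile-task" ;;
    *) echo "unknown runner: $1" >&2; return 1 ;;
  esac
}
```

Returning a non-zero status for unknown names makes a typo in the runner field of smoke-models.json fail early instead of silently running nothing.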
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| | Root directory for resolving relative model paths in the JSON config. Absolute paths ( | Repository root |
| | Default prompt (legacy mode, no JSON config). | |
| | Default step count (legacy mode). | |
| | Default temperature (legacy mode). | |
Currently Working Runners
The smoke catalog is curated to reflect what runs end-to-end today:
- skainet: unified CLI; auto-detects architecture from GGUF metadata
- kllama: text generation and tool calling for TinyLlama, Qwen 2/3 (same layout), and Llama 3.x
- kgemma: text generation only for Gemma 4 (tool-call format emission has a known gap; see gemma4_toolcall_status)
- kbert: embeddings for BERT
The previously available kqwen, kapertus, and kvoxtral runners were removed from the harness. Qwen is covered by kllama; Apertus and Voxtral runtimes remain as libraries but no longer ship a CLI.