Running Smoke Tests

The smoke test suite verifies model loading, text generation, and optionally tool calling across all configured models.

Quick Start

./tests/smoke/smoke-test.sh

This uses tests/smoke/smoke-models.json to determine which models to test.

Configuration

Edit tests/smoke/smoke-models.json:

{
  "defaults": {
    "prompt": "The capital of France is",
    "steps": 32,
    "temperature": 0.0
  },
  "models": [
    {
      "name": "TinyLlama-1.1B-Q8",
      "runner": "kllama",
      "model": "tinyllama-1.1b-chat-v1.0.Q8_0.gguf",
      "format": "gguf"
    },
    {
      "name": "Qwen3-1.7B-Q8",
      "runner": "kllama",
      "model": "Qwen3-1.7B-Q8_0.gguf",
      "format": "gguf",
      "instruct": true,
      "prompt": "What is the capital of France?",
      "toolCalling": {
        "prompt": "What is 2 + 2?",
        "steps": 256
      }
    }
  ]
}

Qwen models use the same tensor layout as LLaMA, so the kllama runner handles them directly. The separate kqwen runner has been removed from the harness and is not used in the smoke catalog.

Model Fields

Field	Required	Description
`name`	Yes	Display name in the summary table
`runner`	Yes	Runner to use: `skainet`, `kllama`, `kgemma`, `kbert`
`model`	Yes	Path to model file (`~` is expanded, relative paths use `MODELS_ROOT`)
`format`	No	`gguf` or `safetensors` (informational)
`prompt`	No	Override the default prompt
`steps`	No	Override the default step count
`toolCalling`	No	Object with `prompt` and `steps` to enable tool calling test
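
The path rule for the `model` field can be sketched in shell as follows. This is a minimal illustration, not code taken from smoke-test.sh: the function name `resolve_model_path` is hypothetical, and falling back to the current directory when `MODELS_ROOT` is unset is an assumption made only to keep the sketch self-contained (the documented default is the repository root).

```shell
# Hypothetical helper: mirrors the documented rule for the `model` field.
# "~" is expanded, absolute paths pass through unchanged, and any other
# path is joined onto MODELS_ROOT.
resolve_model_path() {
  local path="$1"
  # Assumption: use the current directory when MODELS_ROOT is unset.
  local root="${MODELS_ROOT:-$PWD}"
  case "$path" in
    "~")   printf '%s\n' "$HOME" ;;
    "~/"*) printf '%s\n' "$HOME/${path#??}" ;;   # drop the leading "~/"
    /*)    printf '%s\n' "$path" ;;              # already absolute
    *)     printf '%s\n' "${root%/}/$path" ;;    # relative to MODELS_ROOT
  esac
}
```

With `MODELS_ROOT=/srv/models`, a `model` value of `tinyllama-1.1b-chat-v1.0.Q8_0.gguf` would resolve under `/srv/models`, while a value starting with `/` or `~/` would ignore `MODELS_ROOT`.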

Tool Calling Tests

Models with a toolCalling field get an additional test phase. The smoke test runs kllama --demo in single-shot mode and checks for [Tool Call] in the output.

Results are classified as:

OK — model produced a tool call
WARN — model ran but did not produce a tool call (model too small or prompt not triggering)
FAIL — model crashed or failed to load
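
The three outcomes above can be expressed as a small shell function. This is a sketch under stated assumptions rather than the logic in smoke-test.sh: the name `classify_tool_call` is hypothetical, and it assumes a crash or load failure shows up as a non-zero exit status from the runner.

```shell
# Hypothetical helper: classify one tool-calling run from the runner's
# exit status ($1) and its captured output ($2).
classify_tool_call() {
  local exit_status="$1"
  local output="$2"
  if [ "$exit_status" -ne 0 ]; then
    echo "FAIL"   # assumption: a crash or load failure exits non-zero
  elif printf '%s\n' "$output" | grep -qF '[Tool Call]'; then
    echo "OK"     # the marker appeared, so a tool call was produced
  else
    echo "WARN"   # the run completed but no tool call was emitted
  fi
}
```

`grep -F` treats `[Tool Call]` as a fixed string, so the square brackets are not interpreted as a character class.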

Output

The test produces two summary tables:

Summary -- Generation
  Status Model                          Runner       Size      tok/s     Wall
  OK     TinyLlama-1.1B-Q8              kllama       1.1G       3.4    11.8s
  OK     Qwen3-1.7B-Q8                  kllama       2.0G       2.0    20.8s
  Pass: 2  Fail: 0  Total: 2

Summary -- Tool Calling
  Status Model                          Tool                Wall
  OK     Qwen3-1.7B-Q8                  calculator        45.2s
  Pass: 1  Fail: 0  Total: 1

Adding a New Model

1. Download or locate the GGUF file.
2. Add an entry to smoke-models.json.
3. Set runner to match the model architecture.
4. Optionally add toolCalling for models that support it.
5. Run ./tests/smoke/smoke-test.sh.

Adding a New Runner

Add cases to runner_task(), runner_compile_task(), and runner_args() in tests/smoke/smoke-test.sh.
Reference the new runner name from any model entry in smoke-models.json.
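
As a rough picture of what "adding cases" could look like, the sketch below shows the three functions with one extra branch each. Everything inside the branches is a placeholder: the runner name `kexample`, the task strings, and the argument layout are hypothetical and are not taken from smoke-test.sh, whose real function bodies and calling conventions may differ.

```shell
# Illustrative only. "kexample" and every value echoed below are
# hypothetical placeholders, not names used by the real harness.
runner_task() {
  case "$1" in
    kexample) echo "example-run-task" ;;        # placeholder run task
    *) echo "unknown runner: $1" >&2; return 1 ;;
  esac
}

runner_compile_task() {
  case "$1" in
    kexample) echo "example-compile-task" ;;    # placeholder compile task
    *) echo "unknown runner: $1" >&2; return 1 ;;
  esac
}

runner_args() {
  # Assumed parameters: $1 = runner name, $2 = resolved model path.
  case "$1" in
    kexample) printf '%s\n' "$2" ;;             # placeholder: model path only
    *) echo "unknown runner: $1" >&2; return 1 ;;
  esac
}
```

Keeping a failing default branch in each function means a misspelled runner name in smoke-models.json is reported immediately instead of silently running nothing.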

Environment Variables

Variable	Purpose	Default
`MODELS_ROOT`	Root directory for resolving relative model paths in the JSON config. Absolute paths (`/` or `~/`) are unaffected.	Repository root
`SMOKE_PROMPT`	Default prompt (legacy mode, no JSON config).	`The capital of France is`
`SMOKE_STEPS`	Default step count (legacy mode).	`32`
`SMOKE_TEMP`	Default temperature (legacy mode).	`0.0`
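
The legacy defaults in the table follow a common shell pattern: keep a value the caller exported, otherwise fall back to the documented default. The sketch below shows that pattern only; the function name `apply_smoke_defaults` is hypothetical, and the real script may apply its defaults differently.

```shell
# Hypothetical helper: assign the documented default to each variable
# that is unset or empty, leaving caller-provided values untouched.
apply_smoke_defaults() {
  : "${SMOKE_PROMPT:=The capital of France is}"
  : "${SMOKE_STEPS:=32}"
  : "${SMOKE_TEMP:=0.0}"
}

apply_smoke_defaults
```

Any of these variables, and `MODELS_ROOT`, can be set for a single run by prefixing the command, for example `MODELS_ROOT=/path/to/models ./tests/smoke/smoke-test.sh`.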

Currently Working Runners

The smoke catalog is curated to reflect what runs end-to-end today:

skainet — unified CLI; auto-detects architecture from GGUF metadata
kllama — TinyLlama, Qwen 2/3 (same layout), Llama 3.x — text generation + tool calling
kgemma — Gemma 4 — text generation only (tool-call format emission has a known gap; see gemma4_toolcall_status)
kbert — BERT — embeddings

The previously available kqwen, kapertus, and kvoxtral runners were removed from the harness. Qwen is covered by kllama; Apertus and Voxtral runtimes remain as libraries but no longer ship a CLI.