The SKaiNET DType Model
Audience: SKaiNET maintainers and contributors. This page maps the vocabulary used in the dtype-policy RFC (issue #615) onto the existing SKaiNET implementations. Library consumers don’t need to read this; they call
The RFC distinguishes four dtype concepts; the engine mostly already implements them, but under different names. This page is the glossary that keeps the two consistent.
The four dtype concepts
| RFC term | What it means | SKaiNET implementation today | Notes |
|---|---|---|---|
| source dtype | The dtype stored in the on-disk model file ( | Set by the loader. | The loader-time mapping is the source-of-truth for what the file actually contains. |
| logical dtype | The dtype the tensor advertises to graph code (op contracts, shape inference, dispatch). | | The logical dtype is never inferred from physical storage shape (no "1D byte array patched into 2D" antipattern; every quantized |
| required dtype | The dtype an op, layer, or backend declares it needs. | Today: implicit in the kernel SPI accessors ( | The |
| lowered dtype | The dtype actually passed to the executable kernel. | Whatever | If a |
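To make the four terms concrete, the sketch below tracks them for a single tensor. It is an illustration only: `TraceDType` and `DTypeTrace` are hypothetical names invented for this page, not SKaiNET classes, and the real engine does not necessarily carry all four values in one object.

```kotlin
// Hypothetical illustration types; none of these names are SKaiNET API.
enum class TraceDType { FP32, FP16, BF16, INT8 }

data class DTypeTrace(
    val source: TraceDType,   // what the on-disk file stores
    val logical: TraceDType,  // what graph code sees
    val required: TraceDType, // what the op, layer, or backend asks for
    val lowered: TraceDType,  // what the executable kernel receives
) {
    // A conversion happened at load time if source and logical differ.
    val convertedOnLoad: Boolean get() = source != logical

    // A conversion happened at dispatch time if logical and lowered differ.
    val convertedOnDispatch: Boolean get() = logical != lowered

    // The declared requirement is met only when the kernel input matches it.
    val requirementMet: Boolean get() = lowered == required
}

// A BF16 checkpoint that the loader expands to FP32 for an FP32-only kernel.
val dequantised = DTypeTrace(
    source = TraceDType.BF16,
    logical = TraceDType.FP32,
    required = TraceDType.FP32,
    lowered = TraceDType.FP32,
)
```

For `dequantised`, the only conversion is at load time: the source and logical dtypes differ, while the logical, required, and lowered dtypes agree.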
Loader source → logical mapping today
Both loaders are explicit about what each on-disk dtype becomes inside the engine. The tables below are the W0a audit promised by issue #615: they make the silent dequant cases visible so the loader-policy work (W0b / W0c) knows what to generalise.
StreamingGgufParametersLoader (skainet-io-gguf)
| GGUF source type | Logical dtype today | Storage class | Native or dequant? |
|---|---|---|---|
| | | | native |
| | | | native |
| | | | dequant on load; no |
| | | | dequant on load; no |
| | | | native |
| | | | native |
The two dequant rows (F16, BF16) are the gap. SafeTensors already
has a Bf16LoadPolicy.KEEP_NATIVE opt-in (see below) that returns
the BF16 bytes verbatim instead of expanding to FP32. The
equivalent for GGUF is W0c (StreamingGgufParametersLoader.loadWithPolicy).
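The difference between "dequant on load" and "keep native" can be sketched for BF16 as follows. This is a minimal illustration under stated assumptions: the payload is little-endian, and `SketchLoadPolicy`, `LoadedPayload`, `NativeBf16`, `ExpandedFp32`, and `load` are hypothetical names rather than the loaders' actual types.

```kotlin
// Hypothetical sketch; the enum, classes, and functions are not SKaiNET API.
enum class SketchLoadPolicy { DEQUANT_TO_FP32, KEEP_NATIVE }

// BF16 is the upper 16 bits of an IEEE-754 binary32 value, so widening
// is a 16-bit left shift of the raw bit pattern.
fun bf16BitsToFloat(bits: Short): Float =
    Float.fromBits((bits.toInt() and 0xFFFF) shl 16)

// Reads little-endian BF16 values from raw bytes and widens each to FP32.
fun dequantBf16(raw: ByteArray): FloatArray {
    require(raw.size % 2 == 0) { "BF16 payload must have an even byte count" }
    return FloatArray(raw.size / 2) { i ->
        val lo = raw[2 * i].toInt() and 0xFF
        val hi = raw[2 * i + 1].toInt() and 0xFF
        bf16BitsToFloat(((hi shl 8) or lo).toShort())
    }
}

sealed interface LoadedPayload
class NativeBf16(val bytes: ByteArray) : LoadedPayload      // 2 bytes per element
class ExpandedFp32(val values: FloatArray) : LoadedPayload  // 4 bytes per element

// The policy decides whether the bytes pass through or are expanded.
fun load(raw: ByteArray, policy: SketchLoadPolicy): LoadedPayload = when (policy) {
    SketchLoadPolicy.KEEP_NATIVE -> NativeBf16(raw)
    SketchLoadPolicy.DEQUANT_TO_FP32 -> ExpandedFp32(dequantBf16(raw))
}
```

Keeping the native bytes halves the memory footprint compared with the FP32 expansion, at the cost of needing a kernel that can consume BF16 directly.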
SafeTensorsParametersLoader (skainet-io-safetensors)
| SafeTensors source type | Logical dtype today | Storage class | Native or dequant? |
|---|---|---|---|
| | | | native (F64 down-cast with warning) |
| | | | dequant on load; no |
| | | | policy-controlled: |
| | matching | wrapped / reinterpreted appropriately | native |
The BF16 row is the prior art for the RFC’s policy model. Bf16LoadPolicy.toDTypePolicy() (W2, shipped) maps the BF16-specific enum onto the generalised DTypePolicy:
Bf16LoadPolicy.DEQUANT_TO_FP32.toDTypePolicy() // → DTypePolicy.Require(FP32)
Bf16LoadPolicy.KEEP_NATIVE.toDTypePolicy() // → DTypePolicy.Require(BF16)
W0b extends this same idea to F16 and the integer dtypes so the
whole SafeTensors loader can be driven by a single DTypePolicy
argument.
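A single policy argument driving every source dtype might look like the sketch below. Only the `Require(...)` shape and the two enum constants come from the mapping shown above; `PolicyDType`, `SketchDTypePolicy`, `SketchBf16LoadPolicy`, `LoadPlan`, and `plan` are hypothetical stand-ins, and the real `DTypePolicy` may have additional variants.

```kotlin
// Hypothetical stand-ins for the real SKaiNET types; their shapes are assumed.
enum class PolicyDType { FP32, F16, BF16 }

sealed interface SketchDTypePolicy {
    // The caller insists on one specific logical dtype.
    data class Require(val dtype: PolicyDType) : SketchDTypePolicy
}

enum class SketchBf16LoadPolicy { DEQUANT_TO_FP32, KEEP_NATIVE }

// Mirrors the BF16-specific mapping onto the generalised policy.
fun SketchBf16LoadPolicy.toDTypePolicy(): SketchDTypePolicy = when (this) {
    SketchBf16LoadPolicy.DEQUANT_TO_FP32 -> SketchDTypePolicy.Require(PolicyDType.FP32)
    SketchBf16LoadPolicy.KEEP_NATIVE -> SketchDTypePolicy.Require(PolicyDType.BF16)
}

data class LoadPlan(val logical: PolicyDType, val convertsOnLoad: Boolean)

// The same policy value decides the logical dtype for any source dtype.
fun plan(source: PolicyDType, policy: SketchDTypePolicy): LoadPlan = when (policy) {
    is SketchDTypePolicy.Require ->
        LoadPlan(logical = policy.dtype, convertsOnLoad = policy.dtype != source)
}
```

With this shape, `plan(PolicyDType.F16, SketchDTypePolicy.Require(PolicyDType.FP32))` reports a load-time conversion, while a BF16 source under `KEEP_NATIVE` does not.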
The DType registry vs the kernel capability query
DType.findByName("Float32") returns the singleton FP32 object —
the sealed-interface registry is the source-of-truth for dtype
metadata (size in bits, name, promotion rules). It currently covers
floats and (un)signed integers from Ternary through FP64.
The quantized block formats (Q4_K, Q8_0, Q6_K, Q4_0, …)
are not DType arms — they live as TensorData subtypes in
skainet-lang-core/tensor/data/. That’s intentional: a DType is
a numeric type with promotion semantics, whereas Q4_K is a packed
block format with no scalar interpretation outside its block
context.
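The split between scalar dtypes and block formats can be pictured with two separate sealed hierarchies. The sketch below is a miniature under assumptions: `MiniDType`, `MiniTensorData`, `DenseData`, and `Q8_0BlockData` are hypothetical names, and the real registry covers many more types and also holds promotion rules.

```kotlin
// Hypothetical miniature of a sealed dtype registry; not the real SKaiNET classes.
sealed interface MiniDType {
    val name: String
    val sizeInBits: Int

    object FP32 : MiniDType { override val name = "Float32"; override val sizeInBits = 32 }
    object BF16 : MiniDType { override val name = "BFloat16"; override val sizeInBits = 16 }
    object Int8 : MiniDType { override val name = "Int8"; override val sizeInBits = 8 }

    companion object {
        // Lazy so the list is built only after the singleton objects exist.
        private val all: List<MiniDType> by lazy { listOf(FP32, BF16, Int8) }

        fun findByName(name: String): MiniDType? = all.firstOrNull { it.name == name }
    }
}

// Block formats sit in a separate hierarchy: they describe storage, not a scalar type.
sealed interface MiniTensorData
class DenseData(val dtype: MiniDType, val values: FloatArray) : MiniTensorData
class Q8_0BlockData(val packed: ByteArray) : MiniTensorData
```

In this miniature, `MiniDType.findByName("Float32")` returns the `FP32` singleton, whereas `MiniDType.findByName("Q4_K")` returns `null` because block formats are not registry entries.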
For the kernel capability query (KernelProvider.supports(opName,
dtypeKeys), W3), this means the second argument is List<String>
rather than List<DType> — the strings "Q4_K" and "Q8_0" slot
in alongside "Float32" and "BFloat16". The string convention
matches what GGUF / SafeTensors loaders and the StableHLO converter
already use for format identification.
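A string-keyed capability query could be sketched like this. Only the `supports(opName, dtypeKeys)` shape is taken from the text above; `SketchKernelProvider`, `TableProvider`, and the table contents are hypothetical and say nothing about which kernels SKaiNET actually ships.

```kotlin
// Hypothetical provider sketch; only supports(opName, dtypeKeys) follows the text above.
interface SketchKernelProvider {
    val name: String
    fun supports(opName: String, dtypeKeys: List<String>): Boolean
}

// Answers the query from a fixed table of (op name, dtype keys) entries.
class TableProvider(
    override val name: String,
    private val table: Set<Pair<String, List<String>>>,
) : SketchKernelProvider {
    override fun supports(opName: String, dtypeKeys: List<String>): Boolean =
        (opName to dtypeKeys) in table
}

val cpuProvider = TableProvider(
    name = "cpu-sketch",
    table = setOf(
        "matmul" to listOf("Float32", "Float32"),
        "matmul" to listOf("Float32", "Q8_0"), // block-format key next to a scalar dtype key
    ),
)
```

Because the keys are plain strings, `"Q8_0"` needs no entry in the dtype registry to take part in the query.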
Fail-fast: KernelStrictness
The RFC’s "fail before execution" rule has a small, ready affordance today (W4, shipped):
java -Dskainet.strict.kernels=true …
When set, DefaultCpuOpsJvm.matmul raises
NoSuchKernelException (with the failing dtype pair and the list
of currently-registered providers) just before its silent scalar
fallback would have run. Default off — adaptive behaviour is
preserved.
The constraint-resolution pass (W7) raises the same exception
shape at graph-prep time, before forward execution can even
start. The KernelStrictness affordance is the runtime equivalent
for cases where graph prep hasn’t been run (e.g. ad-hoc tensor-op
code that calls ctx.ops.matmul directly).
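The runtime guard described above can be approximated with the following sketch. It assumes the property is parsed as a case-insensitive `"true"`; `SketchNoSuchKernelException`, `strictKernelsEnabled`, and `dispatchMatmul` are hypothetical names, and the real `DefaultCpuOpsJvm.matmul` operates on tensors rather than dtype strings.

```kotlin
// Hypothetical sketch of a strict-mode guard; class and function names are assumptions.
class SketchNoSuchKernelException(
    val dtypes: Pair<String, String>,
    val providers: List<String>,
) : RuntimeException("No kernel for matmul$dtypes; registered providers: $providers")

// Assumed parsing: strict mode is on only when the property equals "true" (any case).
fun strictKernelsEnabled(): Boolean =
    System.getProperty("skainet.strict.kernels")?.toBoolean() ?: false

// Returns a label for the path taken, or throws in strict mode when no kernel matches.
fun dispatchMatmul(
    lhs: String,
    rhs: String,
    fastPaths: Set<Pair<String, String>>,
    providers: List<String>,
    strict: Boolean = strictKernelsEnabled(),
): String {
    if ((lhs to rhs) in fastPaths) return "provider-kernel"
    // Fail just before the silent fallback would have run.
    if (strict) throw SketchNoSuchKernelException(lhs to rhs, providers)
    return "scalar-fallback" // adaptive default: slower, but execution continues
}
```

The check sits immediately before the fallback, so dtype pairs that already have a registered kernel behave the same whether or not the flag is set.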
Anti-patterns this model prevents
The RFC calls out three concrete anti-patterns the engine must avoid; SKaiNET already prevents all three.
| Anti-pattern | What prevents it in SKaiNET today |
|---|---|
| Marker-class dtype detection ( | The sealed |
| Packed bytes treated as logical shape (1D byte array patched into 2D after load) | Every quantized |
| GGUF Q8 confused with native int8 | They’re different |
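The last two rows can be illustrated with a sketch that keeps the logical shape next to the packed bytes and gives Q8_0 its own type. It assumes the GGML Q8_0 layout (blocks of 32 elements, each stored as a 2-byte FP16 scale plus 32 int8 quants, 34 bytes in total) and, for brevity, validates the total element count rather than each row. `SketchData`, `Int8Data`, and `Q8_0Data` are hypothetical names, not SKaiNET's `TensorData` subtypes.

```kotlin
// Hypothetical sketch; class names are illustrative, not SKaiNET's TensorData subtypes.
// Assumed GGML Q8_0 layout: 32 elements per block, 2-byte FP16 scale + 32 int8 quants.
const val Q8_0_BLOCK_ELEMS = 32
const val Q8_0_BLOCK_BYTES = 2 + 32

sealed interface SketchData { val shape: List<Int> }

// Native int8: one byte is one element, with no per-block scale.
class Int8Data(override val shape: List<Int>, val values: ByteArray) : SketchData {
    init { require(values.size == shape.fold(1) { a, b -> a * b }) }
}

// Q8_0: the logical shape is stored with the packed bytes and checked against them,
// so it never has to be patched in after loading.
class Q8_0Data(override val shape: List<Int>, val packed: ByteArray) : SketchData {
    init {
        val elems = shape.fold(1) { a, b -> a * b }
        require(elems % Q8_0_BLOCK_ELEMS == 0) { "element count must be a multiple of 32" }
        require(packed.size == elems / Q8_0_BLOCK_ELEMS * Q8_0_BLOCK_BYTES) {
            "packed size ${packed.size} does not match shape $shape"
        }
    }
}
```

A `[2, 64]` tensor therefore needs 128 bytes as native int8 but 136 bytes as Q8_0, and constructing `Q8_0Data` with a 128-byte buffer is rejected at construction time.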
Related
- rfc.md (repo root): the design document this page implements.
- Issue #615: implementation tracker.
- Build tensors with the data DSL: the user-facing entry points that produce the Tensor<T, V> values whose dtype this page describes.
- Engine benchmark program: runtime numbers that the kernel SPI produces.
- Reading the matmul benchmark: how the kernel SPI’s dispatch actually shows up in measurements.