Package-level declarations

Types

Global hook for the active MemoryTracker.

data class AggregateMemoryReport(val tensorCount: Int, val totalLogicalBytes: Long, val totalPhysicalBytes: Long, val fileBackedBytes: Long, val ownedCount: Int, val borrowedCount: Int, val aliasedCount: Int, val fileBackedCount: Int, val copyCount: Long, val copyBytes: Long, val entries: List<TrackedEntry>)

Provides byte-level read access to a BufferHandle, regardless of its ownership mode.

sealed interface BufferHandle

Ownership / residency mode of a tensor's backing memory.

Factory and conversion utilities for creating BufferHandle instances from common Kotlin types and for slicing existing handles.

interface BufferResolver

Resolves a BufferHandle into a BufferAccessor that can read the underlying bytes. Platform-specific implementations handle file-backed and device-resident buffers; heap-backed handles are resolved generically.

class ByteArrayAccessor(data: ByteArray, offset: Int = 0, val sizeInBytes: Long = (data.size - offset).toLong()) : BufferAccessor
class CompressedKvAttention(cache: KvCacheStore, dequantStrategy: CompressedKvAttention.DequantStrategy = DequantStrategy.FULL_TILE)

Bridge between KvCacheStore and the SDPA execution path.

Default resolver that handles heap-backed handles directly and delegates file-backed handles to a fileBackedResolver.
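
The delegation described above can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: `SketchHandle`, `SketchAccessor`, `SketchResolver`, `HeapAccessor`, and `DelegatingResolver` are hypothetical stand-ins for BufferHandle, BufferAccessor, and BufferResolver, whose real members are not listed on this page.

```kotlin
// Hypothetical stand-ins; the real BufferHandle / BufferAccessor members are not shown here.
sealed interface SketchHandle {
    class Heap(val data: ByteArray, val offset: Int = 0) : SketchHandle
    class FileBacked(val path: String, val length: Long) : SketchHandle
}

interface SketchAccessor {
    val sizeInBytes: Long
    fun readByte(index: Long): Byte
}

fun interface SketchResolver {
    fun resolve(handle: SketchHandle): SketchAccessor
}

// Reads directly from a heap array, starting at the given offset.
class HeapAccessor(private val data: ByteArray, private val offset: Int) : SketchAccessor {
    override val sizeInBytes: Long = (data.size - offset).toLong()
    override fun readByte(index: Long): Byte = data[offset + index.toInt()]
}

// Heap handles are resolved in place; file-backed handles go to the delegate.
class DelegatingResolver(private val fileBackedResolver: SketchResolver?) : SketchResolver {
    override fun resolve(handle: SketchHandle): SketchAccessor = when (handle) {
        is SketchHandle.Heap -> HeapAccessor(handle.data, handle.offset)
        is SketchHandle.FileBacked ->
            fileBackedResolver?.resolve(handle)
                ?: error("No file-backed resolver configured for ${handle.path}")
    }
}

fun main() {
    val resolver = DelegatingResolver(fileBackedResolver = null)
    val accessor = resolver.resolve(SketchHandle.Heap(byteArrayOf(10, 20, 30), offset = 1))
    println("size=${accessor.sizeInBytes}, first=${accessor.readByte(0)}")
}
```

Keeping the file-backed path behind a separate resolver lets platform code supply memory-mapping support without the generic heap path depending on it.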

Default KV cache implementation using dense FP32 storage.

annotation class KvCache(val preset: String = "none", val keyBits: Int = 4, val valueBits: Int = 4, val useQjl: Boolean = false, val maxSeqLen: Int = 0, val device: DeviceKind = DeviceKind.AUTO)

Configures TurboQuant KV-cache compression for an attention layer.

Resolves KvCache annotations to KvCacheStore instances.

Disables TurboQuant compression for a specific layer.

data class KvCacheConfig(val numLayers: Int, val numHeads: Int, val headDim: Int, val maxSeqLen: Int, val keyEncoding: TensorEncoding = TensorEncoding.Dense(4), val valueEncoding: TensorEncoding = TensorEncoding.Dense(4), val placement: Placement = Placement.CPU_HEAP.copy(residency = Residency.PERSISTENT))

Configuration for asymmetric K/V encoding policies.

data class KvCacheMemoryReport(val numLayers: Int, val numHeads: Int, val headDim: Int, val maxSeqLen: Int, val currentSeqLen: Int, val keyEncoding: TensorEncoding, val valueEncoding: TensorEncoding, val placement: Placement, val keyPhysicalBytes: Long, val valuePhysicalBytes: Long, val keyLogicalBytes: Long, val valueLogicalBytes: Long)

Memory report for a KV cache instance.
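
To make the logical versus physical byte fields concrete, here is a rough worked calculation. It is a sketch under stated assumptions: `denseKvBytes` and `compressionRatio` are hypothetical helpers, the logical size is taken as 32 bits per element, and any per-block metadata (scales, zero points) that a real quantized encoding would add is ignored. How the library itself fills in these fields is not shown on this page.

```kotlin
// Hypothetical helper: bytes for one side (K or V) of a cache at a given bit width.
// Ignores per-block metadata such as scales, so real physical sizes may be larger.
fun denseKvBytes(numLayers: Int, numHeads: Int, headDim: Int, seqLen: Int, bitsPerElement: Int): Long {
    val elements = numLayers.toLong() * numHeads * headDim * seqLen
    return (elements * bitsPerElement + 7) / 8 // round up to whole bytes
}

// Hypothetical helper: how many logical bytes each physical byte represents.
fun compressionRatio(logicalBytes: Long, physicalBytes: Long): Double =
    if (physicalBytes == 0L) 0.0 else logicalBytes.toDouble() / physicalBytes

fun main() {
    // Illustrative model dimensions, not taken from the documentation.
    val logical = denseKvBytes(numLayers = 32, numHeads = 8, headDim = 128, seqLen = 4096, bitsPerElement = 32)
    val physical = denseKvBytes(numLayers = 32, numHeads = 8, headDim = 128, seqLen = 4096, bitsPerElement = 4)
    println("key logical bytes:  $logical")
    println("key physical bytes: $physical")
    println("ratio: ${compressionRatio(logical, physical)}")
}
```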

interface KvCacheStore

Dedicated KV-cache storage abstraction for inference.

Logical numeric type — what the tensor values mean semantically.

class MemoryPlanner(availableDevices: Set<DeviceKind> = setOf(DeviceKind.CPU))

Resolves Placement intent into concrete buffer allocation decisions.
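
One plausible reading of this resolution step, sketched with simplified stand-in types. Everything here is an assumption made for illustration: the `Sketch*` types, the `GPU` device, the `REQUIRED` requirement level, and the rule that a preferred placement falls back while a required one fails. The planner's actual method names and policy are not documented on this page.

```kotlin
// Hypothetical stand-ins for DeviceKind, Requirement, Placement, and ResolvedPlacement.
enum class SketchDevice { CPU, GPU }
enum class SketchRequirement { PREFERRED, REQUIRED }

data class SketchPlacement(
    val device: SketchDevice = SketchDevice.CPU,
    val requirement: SketchRequirement = SketchRequirement.PREFERRED,
    val fallback: SketchDevice = SketchDevice.CPU,
)

data class SketchResolved(val actual: SketchPlacement, val usedFallback: Boolean)

// Assumed policy: honour the request if the device exists, fail if it was
// required, otherwise substitute the fallback device and record that we did.
fun resolvePlacement(requested: SketchPlacement, available: Set<SketchDevice>): SketchResolved = when {
    requested.device in available -> SketchResolved(requested, usedFallback = false)
    requested.requirement == SketchRequirement.REQUIRED ->
        throw IllegalStateException("Required device ${requested.device} is not available")
    else -> SketchResolved(requested.copy(device = requested.fallback), usedFallback = true)
}

fun main() {
    val wanted = SketchPlacement(device = SketchDevice.GPU)
    val resolved = resolvePlacement(wanted, available = setOf(SketchDevice.CPU))
    println("actual=${resolved.actual.device}, usedFallback=${resolved.usedFallback}")
}
```

Returning the `usedFallback` flag alongside the final placement lets callers log or surface a downgrade instead of silently running on a different device.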

Tracks memory allocation events and reports aggregate statistics across all live TensorStorage instances.
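
The aggregation can be pictured as a fold over per-tensor reports. The sketch below uses hypothetical, trimmed-down stand-ins (`SketchReport`, `SketchEntry`, `SketchAggregate`) for StorageMemoryReport, TrackedEntry, and AggregateMemoryReport, and it assumes that file-backed bytes are counted from the physical size. The tracker's real API and counting rules are not shown on this page.

```kotlin
// Hypothetical, reduced stand-ins for the report types listed on this page.
data class SketchReport(val logicalBytes: Long, val physicalBytes: Long, val isFileBacked: Boolean)
data class SketchEntry(val name: String, val report: SketchReport)
data class SketchAggregate(
    val tensorCount: Int,
    val totalLogicalBytes: Long,
    val totalPhysicalBytes: Long,
    val fileBackedBytes: Long,
    val fileBackedCount: Int,
)

// Sums the per-tensor reports; file-backed totals use physical bytes (an assumption).
fun aggregate(entries: List<SketchEntry>): SketchAggregate {
    val fileBacked = entries.filter { it.report.isFileBacked }
    return SketchAggregate(
        tensorCount = entries.size,
        totalLogicalBytes = entries.sumOf { it.report.logicalBytes },
        totalPhysicalBytes = entries.sumOf { it.report.physicalBytes },
        fileBackedBytes = fileBacked.sumOf { it.report.physicalBytes },
        fileBackedCount = fileBacked.size,
    )
}

fun main() {
    val entries = listOf(
        SketchEntry("weights", SketchReport(logicalBytes = 400, physicalBytes = 100, isFileBacked = true)),
        SketchEntry("activations", SketchReport(logicalBytes = 64, physicalBytes = 64, isFileBacked = false)),
    )
    println(aggregate(entries))
}
```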

Shared contract for all packed/quantized block tensor storage formats.

annotation class Place(val device: DeviceKind = DeviceKind.AUTO, val memory: MemoryDomain = MemoryDomain.HOST_HEAP, val requirement: Requirement = Requirement.PREFERRED)

Declares placement intent for a tensor parameter or property.

data class Placement(val device: DeviceKind = DeviceKind.CPU, val domain: MemoryDomain = MemoryDomain.HOST_HEAP, val residency: Residency = Residency.PERSISTENT, val requirement: Requirement = Requirement.PREFERRED, val fallback: DeviceKind = DeviceKind.CPU)

High-level placement descriptor: where a tensor lives and how the runtime should manage it.

data class ResolvedPlacement(val actual: Placement, val usedFallback: Boolean)
data class StorageMemoryReport(val shape: Shape, val logicalType: LogicalDType, val encoding: TensorEncoding, val ownership: Ownership, val placement: Placement, val logicalBytes: Long, val physicalBytes: Long, val isFileBacked: Boolean, val isAlias: Boolean, val isMutable: Boolean)

Diagnostic snapshot of a single tensor's memory characteristics.

data class StorageSpec(val logicalType: LogicalDType, val encoding: TensorEncoding = TensorEncoding.Dense(logicalType.sizeInBytes), val ownership: Ownership = Ownership.OWNED, val placement: Placement = Placement.CPU_HEAP)

A storage specification that captures the logical type together with the physical encoding and the placement intent. This lets factories route on encoding and placement as well as dtype, rather than on dtype alone.

sealed interface TensorEncoding

Physical storage encoding — how tensor data is laid out in memory.

data class TensorStorage(val shape: Shape, val logicalType: LogicalDType, val encoding: TensorEncoding, val buffer: BufferHandle, val placement: Placement = Placement.CPU_HEAP, val byteOffset: Long = 0, val strides: LongArray? = null, val isContiguous: Boolean = true)

Runtime descriptor for a tensor's backing memory.
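
As an illustration of how `shape`, `strides`, and `byteOffset` typically combine in a descriptor like this, the sketch below computes row-major strides and the byte position of one element. It assumes strides are measured in elements and that a null `strides` means contiguous row-major layout. Both are assumptions, and `contiguousStrides` and `elementByteOffset` are hypothetical helpers rather than library functions.

```kotlin
// Row-major (C-order) strides in elements: the last axis varies fastest.
fun contiguousStrides(shape: IntArray): LongArray {
    val strides = LongArray(shape.size)
    var running = 1L
    for (axis in shape.indices.reversed()) {
        strides[axis] = running
        running *= shape[axis]
    }
    return strides
}

// Byte position of a multi-dimensional index inside the backing buffer.
fun elementByteOffset(index: IntArray, strides: LongArray, bytesPerElement: Int, byteOffset: Long = 0): Long {
    require(index.size == strides.size) { "index rank must match strides rank" }
    var elementOffset = 0L
    for (axis in index.indices) {
        elementOffset += index[axis] * strides[axis]
    }
    return byteOffset + elementOffset * bytesPerElement
}

fun main() {
    val strides = contiguousStrides(intArrayOf(2, 3, 4))
    println("strides=${strides.toList()}")
    println("offset=${elementByteOffset(intArrayOf(1, 2, 3), strides, bytesPerElement = 4, byteOffset = 16)}")
}
```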

Factory methods for constructing TensorStorage from existing SKaiNET types and from raw data. These bridge the old TensorData world to the new storage model.

data class TrackedEntry(val name: String, val report: StorageMemoryReport)

KV cache store with TurboQuant compression.

annotation class Weights(val memory: MemoryDomain = MemoryDomain.MMAP_FILE)

Marks a tensor as an immutable weight that should be file-backed (memory-mapped) when possible.