Package-level declarations

Types


Standard GGUF tensor naming for LLaMA-family models.

data class LlamaLayerWeights<T : DType>(val attnNorm: Tensor<T, Float>, val wq: Tensor<T, Float>, val wk: Tensor<T, Float>, val wv: Tensor<T, Float>, val wo: Tensor<T, Float>, val ffnNorm: Tensor<T, Float>, val ffnGate: Tensor<T, Float>, val ffnDown: Tensor<T, Float>, val ffnUp: Tensor<T, Float>)
data class LlamaModelMetadata(val architecture: String, val embeddingLength: Int, val contextLength: Int, val blockCount: Int, val headCount: Int, val kvHeadCount: Int, val feedForwardLength: Int, val ropeDimensionCount: Int?, val vocabSize: Int)
data class LlamaRuntimeWeights<T : DType>(val metadata: LlamaModelMetadata, val tokenEmbedding: Tensor<T, Float>, val ropeFreqReal: Tensor<T, Float>?, val ropeFreqImag: Tensor<T, Float>?, val layers: List<LlamaLayerWeights<T>>, val outputNorm: Tensor<T, Float>, val outputWeight: Tensor<T, Float>, val quantTypes: Map<String, GGMLQuantizationType> = emptyMap())
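As a sketch of how the loaded structure above might be traversed — `weights` is assumed to come from one of the `load*` functions in this package, and the printed format is illustrative, not part of the API:

```kotlin
// Illustrative only: walk a loaded LlamaRuntimeWeights and report its contents.
fun describe(weights: LlamaRuntimeWeights<FP32>) {
    val md = weights.metadata
    println("${md.architecture}: ${md.blockCount} blocks, dim=${md.embeddingLength}, heads=${md.headCount}/${md.kvHeadCount}")
    weights.layers.forEachIndexed { i, layer ->
        // Each layer carries attention weights (wq/wk/wv/wo), FFN weights
        // (ffnGate/ffnDown/ffnUp), and the two norms (attnNorm/ffnNorm).
        println("layer $i: wq=${layer.wq}, ffnGate=${layer.ffnGate}")
    }
    // quantTypes maps tensor names to their original GGUF quantization, when retained.
    weights.quantTypes.forEach { (name, qt) -> println("$name -> $qt") }
}
```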

Adapter that loads LLaMA weights from GGUF files and emits them in the canonical GGUF tensor naming scheme. Validation covers metadata presence and basic shape consistency for the tensors we materialize.


Converts loader-emitted tensors to a typed structure ready for runtime/module wiring. Enforces basic shape sanity against the metadata to fail early before graph construction.

data class LlamaWeights<T : DType, V>(val metadata: LlamaModelMetadata, val tensors: Map<String, Tensor<T, V>>, val quantTypes: Map<String, GGMLQuantizationType> = emptyMap())

Memory-mapped GGUF loader that provides zero-copy tensor access.


Factory for creating quantized tensor data from raw GGUF bytes.


JVM extensions for QuantizedTensorFactory that produce MemorySegment-backed quantized tensor data for SIMD-friendly access patterns.

Functions

suspend fun loadLlamaRuntimeWeights(ctx: ExecutionContext, sourceProvider: () -> Source, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES, allowQuantized: Boolean = false): LlamaRuntimeWeights<FP32>

Backward-compatible overload defaulting to FP32.

suspend fun <T : DType> loadLlamaRuntimeWeights(ctx: ExecutionContext, sourceProvider: () -> Source, dtype: KClass<T>, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES, allowQuantized: Boolean = false): LlamaRuntimeWeights<T>

Convenience loader: reads weights from a GGUF source and maps them into the runtime structure.
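A hedged usage sketch of the typed overload above. The `fileSource` helper and the way `ExecutionContext` is obtained are assumptions for illustration, not part of this API; the parameter names match the declared signature:

```kotlin
suspend fun loadExample(ctx: ExecutionContext) {
    // sourceProvider is a factory so the loader can (re)open the GGUF stream itself.
    val weights: LlamaRuntimeWeights<FP32> = loadLlamaRuntimeWeights(
        ctx = ctx,
        sourceProvider = { fileSource("model.gguf") }, // hypothetical helper returning a Source
        dtype = FP32::class,
        quantPolicy = QuantPolicy.RAW_BYTES, // keep quantized payloads as raw bytes
        allowQuantized = true,               // accept quantized tensors instead of failing
    )
    println(weights.metadata.vocabSize)
}
```

With the defaults (`QuantPolicy.RAW_BYTES`, `allowQuantized = false`), loading a quantized model would presumably fail fast; `loadLlamaRuntimeWeightsDequantized` below is the documented path for forcing FP32.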

suspend fun loadLlamaRuntimeWeightsDequantized(ctx: ExecutionContext, sourceProvider: () -> Source): LlamaRuntimeWeights<FP32>

Backward-compatible overload defaulting to FP32.

suspend fun <T : DType> loadLlamaRuntimeWeightsDequantized(ctx: ExecutionContext, sourceProvider: () -> Source, dtype: KClass<T>): LlamaRuntimeWeights<T>

Convenience helper that forces dequantization to FP32 (where supported) and fails if any unsupported quant types remain.

Backward-compatible overload defaulting to FP32.

Load LLaMA runtime weights using streaming API with dequantization. Suitable for large models >2GB.

suspend fun loadLlamaRuntimeWeightsStreaming(ctx: ExecutionContext, randomAccessProvider: () -> RandomAccessSource, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES, allowQuantized: Boolean = false): LlamaRuntimeWeights<FP32>

Backward-compatible overload defaulting to FP32.

suspend fun <T : DType> loadLlamaRuntimeWeightsStreaming(ctx: ExecutionContext, randomAccessProvider: () -> RandomAccessSource, dtype: KClass<T>, quantPolicy: QuantPolicy = QuantPolicy.RAW_BYTES, allowQuantized: Boolean = false): LlamaRuntimeWeights<T>

Load LLaMA runtime weights using the streaming API. Parses metadata only (~1 MB of memory) and loads tensors on demand. Suitable for models of any size (100+ GB) that exceed Java array limits.
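A sketch of the streaming path for files too large to materialize up front. The `randomAccessFileSource` helper is a hypothetical stand-in for whatever produces a `RandomAccessSource` in the host application:

```kotlin
// Sketch: stream a large GGUF file; only metadata is parsed eagerly,
// tensor payloads are fetched on demand via the RandomAccessSource.
suspend fun loadLarge(ctx: ExecutionContext) {
    val weights = loadLlamaRuntimeWeightsStreaming(
        ctx = ctx,
        randomAccessProvider = { randomAccessFileSource("big-model.gguf") }, // hypothetical helper
        quantPolicy = QuantPolicy.RAW_BYTES,
        allowQuantized = true,
    )
    println(weights.metadata.contextLength)
}
```

The provider is a factory rather than a source instance, which lets the loader control when the underlying file handle is opened and, presumably, reopen it for independent tensor reads.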

fun Tensor<Int8, Byte>.toQ4_0MemSeg(logicalShape: Shape, arena: Arena): Q4MemorySegmentTensorData

Extension: convert raw Int8 tensor to Q4_0 MemorySegment-backed data.


Extension function to convert raw tensor to Q4_KTensorData.

fun Tensor<Int8, Byte>.toQ8_0MemSeg(logicalShape: Shape, arena: Arena): Q8MemorySegmentTensorData

Extension: convert raw Int8 tensor to Q8_0 MemorySegment-backed data.
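A sketch of the Q8_0 conversion above, assuming `Arena` is `java.lang.foreign.Arena` from the Java FFM API (consistent with the "JVM extensions ... MemorySegment-backed" description earlier). `rawTensor` and `shape` are assumed inputs holding the raw GGUF block bytes and the tensor's logical shape:

```kotlin
import java.lang.foreign.Arena

// Sketch: turn a raw Int8 tensor of Q8_0 block bytes into
// MemorySegment-backed data for SIMD-friendly access.
fun toSimdFriendly(rawTensor: Tensor<Int8, Byte>, shape: Shape): Q8MemorySegmentTensorData {
    // A confined arena scopes the off-heap segment to the creating thread;
    // the caller is responsible for keeping it alive while the data is in use.
    val arena = Arena.ofConfined()
    return rawTensor.toQ8_0MemSeg(logicalShape = shape, arena = arena)
}
```

Note the lifetime caveat: closing the arena invalidates every segment allocated from it, so in practice the arena would be owned by whatever owns the converted tensor data.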


Extension function to convert raw tensor to Q8_0TensorData.