Package-level declarations
Types
Standard GGUF tensor naming for LLaMA-family models (e.g. token_embd.weight, blk.{n}.attn_q.weight, output_norm.weight).
Adapter that loads LLaMA weights from GGUF files and emits them in the canonical GGUF tensor naming scheme. Validation covers metadata presence and basic shape consistency for the tensors we materialize.
Converts loader-emitted tensors to a typed structure ready for runtime/module wiring. Enforces basic shape sanity against the metadata to fail early before graph construction.
Memory-mapped GGUF loader that provides zero-copy tensor access.
Factory for creating quantized tensor data from raw GGUF bytes.
JVM extensions for QuantizedTensorFactory that produce MemorySegment-backed quantized tensor data for SIMD-friendly access patterns.
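The "basic shape consistency" validation described for the adapter and converter above amounts to failing fast when a tensor's element count disagrees with the dims recorded in metadata. A minimal sketch (the checkShape helper and its message format are hypothetical, not this package's API):

```java
// Hypothetical illustration of an early shape check: verify that a tensor's
// element count matches the product of its metadata dims before wiring it
// into the runtime structure.
public class ShapeCheck {
    static void checkShape(String name, long[] dims, long elementCount) {
        long expected = 1;
        for (long d : dims) expected *= d;
        if (expected != elementCount) {
            throw new IllegalStateException(
                name + ": expected " + expected + " elements, got " + elementCount);
        }
    }

    public static void main(String[] args) {
        // token_embd.weight is the standard GGUF embedding tensor name.
        checkShape("token_embd.weight", new long[] {4096, 32000}, 4096L * 32000);
        System.out.println("ok"); // prints ok
    }
}
```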
Functions
Backward-compatible overload defaulting to FP32.
Convenience loader: reads weights from a GGUF source and maps them into the runtime structure.
Backward-compatible overload defaulting to FP32.
Convenience helper to force dequantization to FP32 (where supported) and fail if any unsupported quant types remain.
Backward-compatible overload defaulting to FP32.
Load LLaMA runtime weights using the streaming API with dequantization. Suitable for large models (>2 GB).
Backward-compatible overload defaulting to FP32.
Load LLaMA runtime weights using the streaming API. Parses metadata only (~1 MB of memory) and loads tensors on demand. Suitable for models of any size (100+ GB), including those that exceed Java array limits.
Extension function to convert a raw Int8 tensor to Q4_0 MemorySegment-backed data.
Extension function to convert a raw tensor to Q4_KTensorData.
Extension function to convert a raw Int8 tensor to Q8_0 MemorySegment-backed data.
Extension function to convert a raw tensor to Q8_0TensorData.
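The dequantization these loaders perform can be sketched for the simplest case, Q8_0, where each block in the standard GGUF layout holds one scale followed by 32 int8 quants and each dequantized value is scale * quant (a sketch of the format's math, not this package's implementation; scales are passed pre-decoded from fp16 for simplicity):

```java
// Sketch of Q8_0 dequantization: one scale per 32-element block,
// dequantized value = block scale * int8 quant.
public class Q8_0Dequant {
    static final int BLOCK = 32;

    static float[] dequantize(float[] scales, byte[] quants) {
        float[] out = new float[quants.length];
        for (int i = 0; i < quants.length; i++) {
            out[i] = scales[i / BLOCK] * quants[i];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] q = new byte[32];
        for (int i = 0; i < 32; i++) q[i] = (byte) i;
        float[] f = dequantize(new float[] {0.5f}, q);
        System.out.println(f[10]); // prints 5.0
    }
}
```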
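The streaming loader's on-demand behavior rests on a standard technique: instead of reading the whole file into a Java array (capped at ~2 GB), map only the byte range of the tensor being requested. A self-contained sketch of that idea using NIO file mapping (the mapRegion helper is hypothetical):

```java
// Hypothetical sketch of on-demand tensor access: map only the byte
// range of the requested tensor rather than loading the whole file.
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class OnDemandRead {
    static MappedByteBuffer mapRegion(Path file, long offset, long size) throws Exception {
        try (FileChannel ch = FileChannel.open(file)) { // defaults to READ
            return ch.map(FileChannel.MapMode.READ_ONLY, offset, size);
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("tensor", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        MappedByteBuffer buf = mapRegion(tmp, 4, 4); // map only bytes 4..7
        System.out.println(buf.get(0)); // prints 5
        Files.delete(tmp);
    }
}
```

Mapped regions are indexed by long offsets, which is how a loader can address models far past the 2 GB array limit.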
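For the Q8_0 conversion extensions, the quantization direction can be sketched as well, assuming the usual Q8_0 scheme: per 32-element block, scale = max|x| / 127 and quant = round(x / scale). This illustrates the format's math only, not this package's conversion code:

```java
// Sketch of Q8_0-style quantization for one 32-element block:
// scale = max|x| / 127, quant = round(x / scale), clamped by construction.
public class Q8_0Quant {
    static byte[] quantizeBlock(float[] x, float[] scaleOut) {
        float amax = 0f;
        for (float v : x) amax = Math.max(amax, Math.abs(v));
        float scale = amax / 127f;
        scaleOut[0] = scale;
        byte[] q = new byte[x.length];
        for (int i = 0; i < x.length; i++) {
            q[i] = scale == 0f ? 0 : (byte) Math.round(x[i] / scale);
        }
        return q;
    }

    public static void main(String[] args) {
        float[] x = new float[32];
        for (int i = 0; i < 32; i++) x[i] = i * 0.1f;
        float[] s = new float[1];
        byte[] q = quantizeBlock(x, s);
        System.out.println(q[31]); // prints 127 (the block maximum)
    }
}
```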