loadLlamaRuntimeWeightsDequantizedStreaming

suspend fun <T : DType> loadLlamaRuntimeWeightsDequantizedStreaming(ctx: ExecutionContext, randomAccessProvider: () -> RandomAccessSource, dtype: KClass<T>): LlamaRuntimeWeights<T>

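Loads LLaMA runtime weights through the streaming API, dequantizing the stored tensors to the element type given by dtype. Because the model file is read through sources obtained from randomAccessProvider rather than loaded in one piece, this function is suitable for large models (over 2 GB).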
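The sketch below illustrates, in general terms, what streaming dequantization involves; it is not the skainet implementation. ByteWindowSource, ArraySource, encodeBlock, dequantizeStreaming, BLOCK and BLOCK_BYTES are hypothetical names, and the block layout (a little-endian Float32 scale followed by four int8 values, each decoded as scale * q) is a simplified stand-in loosely modeled on GGUF's Q8_0, which actually uses an FP16 scale and 32 values per block. Offsets are Long because JVM arrays are indexed by Int and therefore cannot exceed about 2 GB; that this is the reason for the 2 GB threshold above is an assumption.

```kotlin
import java.io.Closeable
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical stand-in for a random-access byte source. Offsets are Long so
// positions beyond Int.MAX_VALUE (about 2 GB) remain addressable.
interface ByteWindowSource : Closeable {
    val size: Long
    fun readAt(offset: Long, length: Int): ByteArray
}

// In-memory implementation for demonstration; a real one would read from a file.
class ArraySource(private val bytes: ByteArray) : ByteWindowSource {
    override val size: Long get() = bytes.size.toLong()
    override fun readAt(offset: Long, length: Int): ByteArray {
        require(offset >= 0 && offset + length <= size) { "read out of range" }
        val start = offset.toInt()
        return bytes.copyOfRange(start, start + length)
    }
    override fun close() {}
}

// Number of quantized values per block in this simplified layout (assumption).
const val BLOCK = 4
// Bytes per block: a 4-byte Float32 scale followed by BLOCK int8 values.
const val BLOCK_BYTES = 4 + BLOCK

// Encodes one block; used here only to build example input.
fun encodeBlock(scale: Float, qs: ByteArray): ByteArray {
    require(qs.size == BLOCK) { "expected $BLOCK values" }
    val buf = ByteBuffer.allocate(BLOCK_BYTES).order(ByteOrder.LITTLE_ENDIAN)
    buf.putFloat(scale)
    buf.put(qs)
    return buf.array()
}

// Dequantizes numBlocks consecutive blocks starting at byte offset, reading one
// block at a time instead of loading the whole tensor into memory.
fun dequantizeStreaming(provider: () -> ByteWindowSource, offset: Long, numBlocks: Int): FloatArray {
    val result = FloatArray(numBlocks * BLOCK)
    provider().use { src ->
        for (b in 0 until numBlocks) {
            val raw = src.readAt(offset + b.toLong() * BLOCK_BYTES, BLOCK_BYTES)
            val scale = ByteBuffer.wrap(raw, 0, 4).order(ByteOrder.LITTLE_ENDIAN).float
            for (i in 0 until BLOCK) {
                // Byte is signed in Kotlin, so raw[4 + i] already has int8 semantics.
                result[b * BLOCK + i] = scale * raw[4 + i]
            }
        }
    }
    return result
}
```

Each block is fetched with its own positional read, so memory use grows with one block rather than with the whole file. Passing a provider function, as the () -> RandomAccessSource parameter does, lets the loader open a fresh source when it needs one instead of holding a single source open.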

suspend fun loadLlamaRuntimeWeightsDequantizedStreaming(ctx: ExecutionContext, randomAccessProvider: () -> RandomAccessSource): LlamaRuntimeWeights<FP32>

Backward-compatible overload that takes no dtype argument and loads the weights as FP32.
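A minimal sketch of the overload pattern, assuming the FP32 overload simply delegates to the generic function with FP32::class (this page states only that it defaults to FP32). DType, FP32, FP16, Weights, loadWeights and runSync are hypothetical stand-ins defined in the sketch, not skainet declarations, and the ctx and randomAccessProvider parameters are omitted. runSync exists only so the suspend functions can be called from ordinary code without a coroutine library.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine
import kotlin.reflect.KClass

// Hypothetical stand-ins; the library's real DType, FP32 and weights types differ.
interface DType
object FP32 : DType
object FP16 : DType

class Weights<T : DType>(val dtype: KClass<T>)

// Generic entry point: the caller chooses the target element type.
suspend fun <T : DType> loadWeights(dtype: KClass<T>): Weights<T> = Weights(dtype)

// Backward-compatible overload: no dtype parameter, delegates with FP32.
suspend fun loadWeights(): Weights<FP32> = loadWeights(FP32::class)

// Runs a suspend block that never actually suspends and returns its result.
fun <R> runSync(block: suspend () -> R): R {
    var outcome: Result<R>? = null
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    return outcome!!.getOrThrow()
}
```

A default argument would not work as a substitute here: FP32::class has type KClass<FP32>, which the compiler does not accept as the default for a parameter of type KClass<T>. A separate non-generic overload is therefore a common way to offer an FP32 default, and it also keeps the older parameter list available to existing callers.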