loadLlamaRuntimeWeightsDequantizedStreaming
suspend fun <T : DType> loadLlamaRuntimeWeightsDequantizedStreaming(ctx: ExecutionContext, randomAccessProvider: () -> RandomAccessSource, dtype: KClass<T>): LlamaRuntimeWeights<T>
Loads LLaMA runtime weights using the streaming API, with dequantization. Suitable for large models (larger than 2 GB).
suspend fun loadLlamaRuntimeWeightsDequantizedStreaming(ctx: ExecutionContext, randomAccessProvider: () -> RandomAccessSource): LlamaRuntimeWeights<FP32>
Backward-compatible overload defaulting to FP32.
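A minimal sketch of how the two overloads might be called. This page does not show how ExecutionContext, RandomAccessSource, or LlamaRuntimeWeights are constructed, so everything declared above `main` is a hypothetical stand-in that only mirrors the signatures listed here so the snippet compiles on its own. In real code, import the library's declarations instead of these stubs.

```kotlin
import kotlin.reflect.KClass

// --- Hypothetical stand-ins: NOT the library's real declarations ---
interface DType
object FP32 : DType
interface RandomAccessSource
class ExecutionContext
class LlamaRuntimeWeights<T : DType>(val dtype: KClass<T>)

// Stub with the same shape as the generic signature documented above.
suspend fun <T : DType> loadLlamaRuntimeWeightsDequantizedStreaming(
    ctx: ExecutionContext,
    randomAccessProvider: () -> RandomAccessSource,
    dtype: KClass<T>,
): LlamaRuntimeWeights<T> {
    randomAccessProvider() // a real loader would read tensor data from this source
    return LlamaRuntimeWeights(dtype)
}

// Stub for the two-argument overload; here it simply delegates with FP32.
suspend fun loadLlamaRuntimeWeightsDequantizedStreaming(
    ctx: ExecutionContext,
    randomAccessProvider: () -> RandomAccessSource,
): LlamaRuntimeWeights<FP32> =
    loadLlamaRuntimeWeightsDequantizedStreaming(ctx, randomAccessProvider, FP32::class)

// --- Usage ---
suspend fun main() {
    val ctx = ExecutionContext()
    // The provider is a factory, so each invocation can hand out a fresh source.
    val provider: () -> RandomAccessSource = { object : RandomAccessSource {} }

    // Explicit dtype: the result is typed as LlamaRuntimeWeights<FP32>.
    val explicit = loadLlamaRuntimeWeightsDequantizedStreaming(ctx, provider, FP32::class)

    // Two-argument overload: the dtype is fixed to FP32.
    val defaulted = loadLlamaRuntimeWeightsDequantizedStreaming(ctx, provider)

    println(explicit.dtype == defaulted.dtype)
}
```

Both functions are `suspend`, so they must be called from a coroutine or another suspending function; `suspend fun main()` is used here only to keep the sketch free of extra dependencies.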