loadValuesForAttention

fun loadValuesForAttention(layer: Int, startPos: Int = 0, endPos: Int = cache.currentSeqLen): FloatArray(source)

Load cached values for attention, dequantizing as needed.

Parameters

layer

Layer index

startPos

Start of the attention window (inclusive)

endPos

End of the attention window (exclusive)