loadKeysForAttention

fun loadKeysForAttention(layer: Int, startPos: Int = 0, endPos: Int = cache.currentSeqLen): FloatArray

Load cached keys for attention, dequantizing as needed.

When the cache uses a compressed encoding, decompression is performed tile by tile. The returned array is flat, with logical shape [numHeads, seqLen, headDim].
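
Because the result is a flat FloatArray, callers have to compute offsets into it themselves. The sketch below shows one way to do that. It assumes row-major ordering of the three dimensions, which this page does not confirm; keyOffset and keyVector are hypothetical helpers for illustration, not part of the library.

```kotlin
// Hypothetical helper: offset of element (head, pos, dim) in a flat array with
// logical shape [numHeads, seqLen, headDim]. ASSUMES row-major order.
fun keyOffset(head: Int, pos: Int, dim: Int, seqLen: Int, headDim: Int): Int =
    (head * seqLen + pos) * headDim + dim

// Hypothetical helper: copies the key vector for one head at one position
// out of the flat array, under the same row-major assumption.
fun keyVector(keys: FloatArray, head: Int, pos: Int, seqLen: Int, headDim: Int): FloatArray {
    val start = keyOffset(head, pos, 0, seqLen, headDim)
    return keys.copyOfRange(start, start + headDim)
}
```

If the actual layout differs, only keyOffset needs to change.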

Parameters

layer

Layer index

startPos

Start of the attention window (inclusive)

endPos

End of the attention window (exclusive)
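
Since startPos is inclusive and endPos is exclusive, the window covers endPos - startPos positions. The following sketch checks a window before use and derives the array size a caller might expect. It assumes that seqLen in the returned shape equals the window length and that out-of-range windows should be rejected; neither is stated on this page, and windowLength and expectedKeyCount are hypothetical names.

```kotlin
// Hypothetical helper: number of positions in the half-open window [startPos, endPos).
// The bounds checks are an assumption about sensible caller-side validation.
fun windowLength(startPos: Int, endPos: Int, currentSeqLen: Int): Int {
    require(startPos in 0..endPos) { "startPos must satisfy 0 <= startPos <= endPos" }
    require(endPos <= currentSeqLen) { "endPos must not exceed the cached sequence length" }
    return endPos - startPos
}

// Hypothetical helper: expected element count of the returned array,
// ASSUMING seqLen in the shape equals the window length.
fun expectedKeyCount(numHeads: Int, headDim: Int, startPos: Int, endPos: Int, currentSeqLen: Int): Int =
    numHeads * windowLength(startPos, endPos, currentSeqLen) * headDim
```

For example, a window of startPos = 4 and endPos = 10 covers positions 4 through 9, six in total.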