loadKeysForAttention
fun loadKeysForAttention(layer: Int, startPos: Int = 0, endPos: Int = cache.currentSeqLen): FloatArray
Load cached keys for attention, dequantizing as needed.
When the cache uses a compressed encoding, decompression is performed tile by tile. The returned array is laid out as [numHeads, seqLen, headDim].
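Because the result is a flat FloatArray, callers have to compute offsets into it themselves. The following is a minimal sketch of that indexing, assuming the three dimensions are stored in row-major order (headDim varying fastest); the source does not state the memory order. keyOffset and keyVector are hypothetical helpers, not part of this API.

```kotlin
// Hypothetical helper: flat offset of element (head, pos, dim) in an array
// laid out as [numHeads, seqLen, headDim].
// Assumption: row-major order, so headDim is the fastest-varying dimension.
fun keyOffset(head: Int, pos: Int, dim: Int, seqLen: Int, headDim: Int): Int =
    (head * seqLen + pos) * headDim + dim

// Hypothetical helper: copy the key vector for one head at one position
// out of the flat array.
fun keyVector(keys: FloatArray, head: Int, pos: Int, seqLen: Int, headDim: Int): FloatArray {
    val start = keyOffset(head, pos, 0, seqLen, headDim)
    return keys.copyOfRange(start, start + headDim)
}
```

Here pos is relative to the start of the returned array, not an absolute position in the cache.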
Parameters
layer
Layer index
startPos
Start of the attention window (inclusive)
endPos
End of the attention window (exclusive)
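Since startPos is inclusive and endPos is exclusive, the window covers endPos - startPos positions. The sketch below checks a window and estimates the size of the returned array. It assumes that seqLen in the returned layout equals the window length, which the source implies but does not state outright; windowLength and expectedKeyCount are hypothetical helpers.

```kotlin
// Hypothetical helper: number of positions in the half-open window
// [startPos, endPos). Rejects negative or inverted windows.
fun windowLength(startPos: Int, endPos: Int): Int {
    require(startPos in 0..endPos) {
        "expected 0 <= startPos <= endPos, got startPos=$startPos, endPos=$endPos"
    }
    return endPos - startPos
}

// Hypothetical helper: expected element count of the returned FloatArray.
// Assumption: seqLen in [numHeads, seqLen, headDim] is the window length.
fun expectedKeyCount(numHeads: Int, headDim: Int, startPos: Int, endPos: Int): Int =
    numHeads * windowLength(startPos, endPos) * headDim
```

For example, a window with startPos = 4 and endPos = 10 covers positions 4 through 9, six in total.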