Embeddings — Getting Started

This tutorial walks through producing dense vector embeddings for text — the kind you feed into a vector store for semantic search, RAG, or sentence similarity. The runtime is BERT-style; the public API is the provider-neutral EmbeddingModel SPI from llm-api.

Prerequisites

JDK 21+ (Java 25 preferred for the Vector API)
Nothing else — models download from the Hugging Face Hub on first use and are cached under ~/.cache/skainet/models/ (or point at a local sentence-transformers SafeTensors snapshot)

From the CLI

The fastest way to verify embeddings work end-to-end:

./gradlew :llm-apps:kbert-cli:run \
  --args="MongoDB/mdbr-leaf-mt 'The quick brown fox jumps over the lazy dog'"

Or pass a query and a document to print their similarity (a local snapshot directory works in place of the repo id):

./gradlew :llm-apps:kbert-cli:run \
  --args="MongoDB/mdbr-leaf-mt 'pangram' 'A pangram is a sentence that contains every letter of the alphabet.'"

From Kotlin / Java — `EmbeddingModel` SPI

The neutral SPI lives in llm-api:

public interface EmbeddingModel : AutoCloseable {
    public fun call(request: EmbeddingRequest): EmbeddingResponse
    public fun embed(text: String): FloatArray
    public fun embed(texts: List<String>): List<FloatArray>
    public val dimensions: Int
}
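Because the SPI is a plain interface, code written against it can be exercised without downloading a model. The following is a minimal sketch of that idea, not part of the library: `TextEmbedder` is a local stand-in that mirrors only the `embed` and `dimensions` members shown above, and `HashingEmbedder` is a hypothetical test double whose vectors carry no semantic meaning.

```kotlin
import kotlin.math.sqrt

// Local stand-in mirroring part of the SPI so this sketch compiles on its own.
// It is NOT the real sk.ainet type.
interface TextEmbedder {
    fun embed(text: String): FloatArray
    fun embed(texts: List<String>): List<FloatArray> = texts.map { embed(it) }
    val dimensions: Int
}

// Hypothetical test double: hashes each lowercase word into a bucket, then
// L2-normalizes the counts. Deterministic and fast, but not semantic.
class HashingEmbedder(override val dimensions: Int = 16) : TextEmbedder {
    override fun embed(text: String): FloatArray {
        val v = FloatArray(dimensions)
        val words = text.lowercase().split(Regex("\\W+")).filter { it.isNotEmpty() }
        for (word in words) {
            v[Math.floorMod(word.hashCode(), dimensions)] += 1f
        }
        val norm = sqrt(v.sumOf { (it * it).toDouble() }).toFloat()
        if (norm > 0f) for (i in v.indices) v[i] /= norm
        return v
    }
}

fun main() {
    val embedder: TextEmbedder = HashingEmbedder()
    val a = embedder.embed("The quick brown fox")
    val b = embedder.embed("the QUICK brown fox")
    // Same words after lowercasing, so the two vectors are identical.
    println("dim=${a.size}, identical=${a.contentEquals(b)}")
}
```

A double like this is only useful for checking plumbing (shapes, ordering, storage); similarity scores from it say nothing about how a real model behaves.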

The one-call factory in llm-providers/BertEmbeddingModel.kt builds the whole stack — DSL network (bertNetwork()), weight mapping, tokenizer, encoder runtime — behind the SPI:

import java.nio.file.Path
import sk.ainet.llm.api.EmbeddingModel
import sk.ainet.llm.providers.BertEmbeddingModel

// Straight from the Hugging Face Hub (downloads + caches on first use;
// picks up HF_TOKEN for gated repos):
val model: EmbeddingModel = BertEmbeddingModel.fromHuggingFace("MongoDB/mdbr-leaf-mt")

// Or from a local sentence-transformers snapshot directory
// (auto-detects config.json, vocab.txt / tokenizer.json, model.safetensors,
// and the optional 2_Dense/ projection head):
val local: EmbeddingModel = BertEmbeddingModel.fromSafeTensors(Path.of("/models/mdbr-leaf-mt"))

// Single text — convenience overload.
val vector: FloatArray = model.embed("The quick brown fox")
println("dim=${vector.size}")

// Batch — the response preserves request order.
val vectors: List<FloatArray> = model.embed(listOf(
    "Cats are mammals.",
    "The Eiffel Tower is in Paris.",
))
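Since the batch overload preserves request order, the i-th vector can be paired with the i-th input text by position. A minimal sketch of that pairing follows. `Indexed`, `buildIndex`, and the toy embedder are hypothetical names introduced here, and the `embedBatch` parameter stands in for the model's batch call so the example runs without a model.

```kotlin
// One stored entry: the source text next to its embedding.
class Indexed(val text: String, val vector: FloatArray)

// `embedBatch` stands in for the batch overload. Position i of its result
// is assumed to belong to position i of the input.
fun buildIndex(
    texts: List<String>,
    embedBatch: (List<String>) -> List<FloatArray>,
): List<Indexed> {
    val vectors = embedBatch(texts)
    check(vectors.size == texts.size) {
        "expected ${texts.size} vectors, got ${vectors.size}"
    }
    return texts.zip(vectors) { t, v -> Indexed(t, v) }
}

fun main() {
    // Hypothetical toy embedder: [text length, number of spaces].
    // Not a meaningful embedding; it only makes the sketch runnable.
    val toy: (List<String>) -> List<FloatArray> = { batch ->
        batch.map { floatArrayOf(it.length.toFloat(), it.count { c -> c == ' ' }.toFloat()) }
    }
    val index = buildIndex(
        listOf("Cats are mammals.", "The Eiffel Tower is in Paris."),
        toy,
    )
    for (entry in index) println("${entry.text} -> ${entry.vector.toList()}")
}
```

With a loaded model, the lambda `{ model.embed(it) }` could be passed as `embedBatch` in place of the toy function.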

The runtime already applies mean pooling over token embeddings and L2 normalization internally, so cosine similarity reduces to a dot product:

fun cosine(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "dimension mismatch: ${a.size} vs ${b.size}" }
    var dot = 0f
    for (i in a.indices) dot += a[i] * b[i]
    return dot   // inputs are already L2-normalized, so no division is needed
}
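To see why the division can be dropped, it helps to spell out the two post-processing steps. The sketch below is illustrative only and is not the runtime's code: the function names are made up for this example, the token vectors are invented, and the pooling here averages every token equally, whereas a real encoder would typically skip padding positions.

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

// Mean pooling: average the per-token vectors, dimension by dimension.
fun meanPool(tokens: List<FloatArray>): FloatArray {
    require(tokens.isNotEmpty()) { "need at least one token vector" }
    val out = FloatArray(tokens[0].size)
    for (t in tokens) for (i in out.indices) out[i] += t[i]
    for (i in out.indices) out[i] /= tokens.size
    return out
}

// L2 normalization: scale the vector to unit length.
fun l2Normalize(v: FloatArray): FloatArray {
    val norm = sqrt(v.sumOf { (it * it).toDouble() }).toFloat()
    return if (norm == 0f) v.copyOf() else FloatArray(v.size) { v[it] / norm }
}

fun dot(a: FloatArray, b: FloatArray): Float {
    var s = 0f
    for (i in a.indices) s += a[i] * b[i]
    return s
}

// Textbook cosine similarity, including the division by both norms.
fun fullCosine(a: FloatArray, b: FloatArray): Float =
    dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))

fun main() {
    // Made-up 2-dimensional "token embeddings" for two short texts.
    val x = meanPool(listOf(floatArrayOf(1f, 3f), floatArrayOf(5f, 5f)))  // mean = [3, 4]
    val y = meanPool(listOf(floatArrayOf(4f, 3f)))                        // mean = [4, 3]
    val nx = l2Normalize(x)  // [0.6, 0.8]
    val ny = l2Normalize(y)  // [0.8, 0.6]
    // Both values are approximately 0.96.
    println("dot=${dot(nx, ny)} cosine=${fullCosine(x, y)}")
    check(abs(dot(nx, ny) - fullCosine(x, y)) < 1e-5f)
}
```

Once both vectors have unit length, the denominator of the cosine formula is 1, so the plain dot product gives the same score and the same ranking.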

From Java

The BertEmbeddingModel factory methods are all @JvmStatic, so Java code can call them directly. KBertJava offers a smaller session-style surface for pure-Java consumers:

import java.nio.file.Path;

import sk.ainet.llm.api.EmbeddingModel;
import sk.ainet.llm.providers.BertEmbeddingModel;
import sk.ainet.models.bert.java.KBertJava;
import sk.ainet.models.bert.java.KBertSession;

// Through the neutral SPI:
EmbeddingModel model = BertEmbeddingModel.fromHuggingFace("MongoDB/mdbr-leaf-mt");
float[] vector = model.embed("The quick brown fox");

// Or the session facade over a local snapshot (try-with-resources closes it):
try (KBertSession session = KBertJava.loadSafeTensors(Path.of("/models/mdbr-leaf-mt"))) {
    float[] v = session.encode("The quick brown fox");
    float sim = session.similarity("query text", "document text");
}

Verifying it Runs

The smoke harness includes a kbert entry — see Running Smoke Tests:

./tests/smoke/smoke-test.sh

For the kbert entry, the script computes embeddings for the prompt and the document and prints their cosine similarity.

What’s Next

Getting Started with LEAF — a model-focused walkthrough with a complete semantic-search example.
BERT Completely Defined in the DSL — how the encoder behind this API is defined, executed, and exported.
How embeddings work — pooling, normalization, dimensionality — the deeper "why".
Getting Started for Java Developers — the analogous Java surface for chat / tool calling.