LlamaBridge is the main multiplatform entry point for GGUF-based language models in Llamatik.
It exposes a compact API for:
- text generation
- retrieval-friendly embeddings
- streaming tokens
- JSON and JSON Schema constrained output
- generation parameter tuning
- KV cache reuse and session persistence
Under the hood, the actual implementation is platform-specific, but the public API stays the same.
## Platform support
Current support in the library code:
- Android: supported
- iOS: supported
- JVM / Desktop: supported
- WASM: text generation is supported, but embeddings and KV session persistence are currently not available, and synchronous `generateContinue()` is not supported in worker-only mode
## Model path helper
```kotlin
fun getModelPath(modelFileName: String): String
```

Returns a platform-usable path for a model file.
Typical usage:
```kotlin
val modelPath = LlamaBridge.getModelPath("qwen2.5-0.5b-instruct-q4_k_m.gguf")
```

Use this when your app stores or ships models differently across platforms and you want one common entry point before initialization.
## Embeddings API
```kotlin
fun initEmbedModel(modelPath: String): Boolean
fun embed(input: String): FloatArray
```

### initEmbedModel(modelPath)
Loads a model for embeddings and returns `true` on success.
```kotlin
val ok = LlamaBridge.initEmbedModel(modelPath)
check(ok) { "Failed to initialize embedding model" }
```

### embed(input)
Computes a vector representation of the input text. This is useful for:
- semantic search
- document retrieval
- clustering
- reranking pipelines
- PDF RAG and other retrieval workflows
```kotlin
val vector = LlamaBridge.embed("How do coroutines work?")
println("Embedding size = ${vector.size}")
```

Important notes:
- Use the same embedding model for all vectors inside one index.
- Initialize the embedding model before calling `embed`.
- On WASM, embeddings are currently not implemented.
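Vectors returned by `embed` are typically compared with cosine similarity when ranking documents. A minimal sketch (the similarity function below is illustrative, not part of LlamaBridge):

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal dimension.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}
```

Because all vectors in one index must come from the same embedding model, re-embed your corpus whenever you switch models.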
## Generation model initialization
```kotlin
fun initGenerateModel(modelPath: String): Boolean
```

Loads a generation model and prepares it for inference.
```kotlin
val ok = LlamaBridge.initGenerateModel(modelPath)
check(ok) { "Failed to initialize generation model" }
```

Call this before any generation method.
## One-shot generation
```kotlin
fun generate(prompt: String): String
```

Generates a full response from a single prompt. This is the simplest API and a good default when each request is independent.
```kotlin
val answer = LlamaBridge.generate("Explain Kotlin Multiplatform in one paragraph.")
println(answer)
```

`generate(...)` starts a fresh generation flow. If you are building a chat interface and want continuity across turns, prefer the session APIs described below.
## Context-aware generation
```kotlin
fun generateWithContext(systemPrompt: String, contextBlock: String, userPrompt: String): String
```

Use this when you want clearer structure between instructions, external context, and user input.
```kotlin
val result = LlamaBridge.generateWithContext(
    systemPrompt = "You are a concise technical assistant.",
    contextBlock = "Project: Llamatik is a Kotlin Multiplatform library for on-device AI.",
    userPrompt = "Write a short summary for the README."
)
```

This is useful for:
- RAG results
- chat prompts with a fixed system role
- structured prompting where you want deterministic sections
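In a RAG pipeline, the `contextBlock` is usually assembled from retrieved chunks before the call. A hypothetical helper (not part of the library) might look like:

```kotlin
// Joins retrieved snippets into a single context block, with simple
// source markers so the model can tell the documents apart.
fun buildContextBlock(snippets: List<Pair<String, String>>): String =
    snippets.joinToString(separator = "\n\n") { (source, text) ->
        "[$source]\n$text"
    }

val contextBlock = buildContextBlock(
    listOf(
        "README.md" to "Llamatik runs GGUF models on-device.",
        "docs/api.md" to "LlamaBridge is the multiplatform entry point."
    )
)
```

The resulting string can be passed directly as the `contextBlock` argument.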
## JSON generation
```kotlin
fun generateJson(prompt: String, jsonSchema: String? = null): String

fun generateJsonWithContext(
    systemPrompt: String,
    contextBlock: String,
    userPrompt: String,
    jsonSchema: String? = null
): String
```

These methods instruct the model to return JSON output. If you provide a JSON Schema, generation is additionally constrained to that schema.
```kotlin
val schema = """
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "title": { "type": "string" },
    "priority": { "type": "integer" }
  },
  "required": ["title", "priority"]
}
""".trimIndent()

val json = LlamaBridge.generateJson(
    prompt = "Return one task object.",
    jsonSchema = schema
)
```

This is especially useful when parsing model output into Kotlin data classes.
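The schema-constrained string can then be mapped to a Kotlin type. In a real app you would use a JSON library such as kotlinx.serialization; the naive extraction below is only a dependency-free sketch for the flat object produced by the schema above:

```kotlin
data class Task(val title: String, val priority: Int)

// Naive field extraction for the flat schema above.
// In production, prefer a real JSON parser such as kotlinx.serialization.
fun parseTask(json: String): Task {
    val title = Regex("\"title\"\\s*:\\s*\"([^\"]*)\"").find(json)
        ?.groupValues?.get(1) ?: error("missing title")
    val priority = Regex("\"priority\"\\s*:\\s*(-?\\d+)").find(json)
        ?.groupValues?.get(1)?.toInt() ?: error("missing priority")
    return Task(title, priority)
}
```

Because the schema marks both fields as required and forbids extra properties, a successful generation should always parse.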
## Streaming APIs
```kotlin
fun generateStream(prompt: String, callback: GenStream)

fun generateStreamWithContext(systemPrompt: String, contextBlock: String, userPrompt: String, callback: GenStream)

fun generateJsonStream(prompt: String, jsonSchema: String? = null, callback: GenStream)

fun generateJsonStreamWithContext(
    systemPrompt: String,
    contextBlock: String,
    userPrompt: String,
    jsonSchema: String? = null,
    callback: GenStream
)
```

These are the streaming equivalents of the one-shot APIs above. They are the best choice for chat interfaces because the UI can render tokens incrementally.
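A common callback implementation simply accumulates deltas into a transcript. The sketch below declares a local stand-in for the library's `GenStream` interface, matching the methods used in the examples on this page, and drives it with a fake token source so the contract is visible without a loaded model:

```kotlin
// Local stand-in for the library's GenStream callback interface.
interface GenStream {
    fun onDelta(text: String)
    fun onComplete()
    fun onError(message: String)
}

// Accumulates streamed deltas into the full response.
class TranscriptCollector : GenStream {
    val transcript = StringBuilder()
    var done = false
    override fun onDelta(text: String) { transcript.append(text) }
    override fun onComplete() { done = true }
    override fun onError(message: String) = error("stream failed: $message")
}

// Fake driver standing in for generateStream, for illustration only.
fun fakeStream(tokens: List<String>, callback: GenStream) {
    tokens.forEach(callback::onDelta)
    callback.onComplete()
}
```

In a real app, you would pass the same collector object to `LlamaBridge.generateStream` and render `transcript` as it grows.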
```kotlin
LlamaBridge.generateStream(
    prompt = "Write a haiku about local AI.",
    callback = object : GenStream {
        override fun onDelta(text: String) = print(text)
        override fun onComplete() = println("\nDone")
        override fun onError(message: String) = println("Error: $message")
    }
)
```

## Lambda-friendly streaming helper
```kotlin
fun generateWithContextStream(
    system: String,
    context: String,
    user: String,
    onDelta: (String) -> Unit,
    onDone: () -> Unit,
    onError: (String) -> Unit
)
```

This is a convenience wrapper around the callback-based context streaming API when you prefer lambdas.
```kotlin
LlamaBridge.generateWithContextStream(
    system = "You are helpful.",
    context = "Product: Llamatik",
    user = "Write a short tagline.",
    onDelta = { print(it) },
    onDone = { println("\nDone") },
    onError = { println("Error: $it") }
)
```

## KV cache and sessions
```kotlin
fun sessionReset(): Boolean
fun sessionSave(path: String): Boolean
fun sessionLoad(path: String): Boolean
fun generateContinue(prompt: String): String
```

These methods are designed for multi-turn interactions.
### generateContinue(prompt)
Continues generation using the current KV cache instead of starting from scratch.
```kotlin
LlamaBridge.initGenerateModel(modelPath)
val first = LlamaBridge.generate("Explain Kotlin coroutines")
val second = LlamaBridge.generateContinue("Now show a short example")
```

### sessionSave(path) and sessionLoad(path)
Persist the current KV/session state to disk and restore it later.
```kotlin
LlamaBridge.sessionSave(sessionPath)

LlamaBridge.initGenerateModel(modelPath)
LlamaBridge.sessionLoad(sessionPath)
val resumed = LlamaBridge.generateContinue("Continue the explanation")
```

### sessionReset()
Clears the current session while keeping the model loaded. This is useful when you want to start a new conversation without paying the full model load cost again.
Important notes:
- Session files are tied to the same model and runtime assumptions.
- On WASM, session persistence is currently not implemented.
- If no active session exists yet, `generateContinue()` behaves like a fresh generation.
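A common multi-turn pattern is: route the first turn through `generate`, later turns through `generateContinue`, and call `sessionReset` when the user starts a new conversation. The sketch below expresses that flow against a minimal hypothetical interface so it can be exercised without a loaded model; a thin adapter over LlamaBridge would satisfy the same shape:

```kotlin
// Minimal surface for multi-turn chat; a thin adapter over
// LlamaBridge would implement this in a real app.
interface ChatBackend {
    fun generate(prompt: String): String
    fun generateContinue(prompt: String): String
    fun sessionReset(): Boolean
}

// Routes the first turn to generate() and follow-ups to
// generateContinue(), so later turns reuse the KV cache.
class ChatSession(private val backend: ChatBackend) {
    private var started = false

    fun send(message: String): String =
        if (!started) { started = true; backend.generate(message) }
        else backend.generateContinue(message)

    fun newConversation() {
        backend.sessionReset()
        started = false
    }
}
```

This keeps the "fresh generation vs. continuation" decision in one place instead of scattering it across UI code.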
## Runtime controls
```kotlin
fun nativeCancelGenerate()

fun updateGenerateParams(
    temperature: Float,
    maxTokens: Int,
    topP: Float,
    topK: Int,
    repeatPenalty: Float,
)

fun shutdown()
```

### nativeCancelGenerate()
Requests cancellation of the current generation. Useful when the user taps Stop or leaves the screen.
### updateGenerateParams(...)
Updates the runtime generation parameters used by the native backend.
```kotlin
LlamaBridge.updateGenerateParams(
    temperature = 0.7f,
    maxTokens = 256,
    topP = 0.95f,
    topK = 40,
    repeatPenalty = 1.1f,
)
```

### shutdown()
Releases native resources when you are done with the bridge. In long-lived apps this is usually called when the feature or process is shutting down, not after every request.