KV cache and sessions • Llamatik Documentation

KV cache reuse is one of the most important performance features for chat-like experiences. Instead of rebuilding the full prompt state from scratch for every turn, the model can continue from what it has already processed.

Core methods

LlamaBridge.sessionReset()
LlamaBridge.sessionSave(path)
LlamaBridge.sessionLoad(path)
LlamaBridge.generateContinue(prompt)

Fresh turn vs continued turn

Fresh generation

val answer = LlamaBridge.generate("Explain Kotlin coroutines.")

Use this when the request is independent and you do not care about previous turns.

Continued generation

val answer2 = LlamaBridge.generateContinue("Now show a short example.")

Use this when you want the next prompt to continue from the current session state.

Typical chat flow

LlamaBridge.initGenerateModel(modelPath)

val first = LlamaBridge.generate("Explain Kotlin coroutines in simple words.")
val second = LlamaBridge.generateContinue("Now give one practical example.")
val third = LlamaBridge.generateContinue("Summarize both answers in 3 bullets.")
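The flow above can be wrapped so that calling code does not have to remember which turn is the first one. The following is a minimal sketch, not part of Llamatik: `TextEngine`, `ChatSession`, and `RecordingEngine` are hypothetical names introduced here for illustration. In a real app the `TextEngine` implementation would forward to the `LlamaBridge` calls shown above.

```kotlin
// Hypothetical abstraction over the generation calls shown above.
// A real implementation would delegate each method to LlamaBridge.
interface TextEngine {
    fun generate(prompt: String): String
    fun generateContinue(prompt: String): String
    fun sessionReset()
}

// Routes the first turn to generate() and every later turn to generateContinue().
class ChatSession(private val engine: TextEngine) {
    private var hasActiveTurn = false

    fun send(prompt: String): String {
        val reply = if (hasActiveTurn) {
            engine.generateContinue(prompt)
        } else {
            engine.generate(prompt)
        }
        hasActiveTurn = true
        return reply
    }

    // Starts a new conversation while keeping the model loaded.
    fun startNewConversation() {
        engine.sessionReset()
        hasActiveTurn = false
    }
}

// In-memory stand-in so the sketch runs without a model (assumption, test-only).
class RecordingEngine : TextEngine {
    val calls = mutableListOf<String>()

    override fun generate(prompt: String): String {
        calls += "generate"
        return "fresh:$prompt"
    }

    override fun generateContinue(prompt: String): String {
        calls += "continue"
        return "cont:$prompt"
    }

    override fun sessionReset() {
        calls += "reset"
    }
}
```

With this wrapper, UI code calls `send(...)` for every user message and `startNewConversation()` when the user opens a new chat.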

Saving a session

val sessionPath = "/tmp/demo.session"
val saved = LlamaBridge.sessionSave(sessionPath)
check(saved) { "Could not save session to $sessionPath" }

This lets you persist the current conversation state across app restarts or for later reuse.

Loading a session

LlamaBridge.initGenerateModel(modelPath)

val loaded = LlamaBridge.sessionLoad(sessionPath)
check(loaded) { "Could not load session from $sessionPath" }

val resumed = LlamaBridge.generateContinue("Continue from where we stopped.")
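In an app, a failed load (for example, no session file saved yet) usually should not crash startup. The sketch below shows one possible fallback, assuming that `sessionLoad` reports failure through its Boolean result as the `check(loaded)` call above suggests. `SessionBackend`, `StartMode`, `restoreOrReset`, and `FakeBackend` are hypothetical names, not Llamatik API.

```kotlin
// Hypothetical abstraction over the session calls used above; a real
// implementation would forward to LlamaBridge.sessionLoad/sessionSave/sessionReset.
interface SessionBackend {
    fun sessionLoad(path: String): Boolean
    fun sessionSave(path: String): Boolean
    fun sessionReset()
}

enum class StartMode { RESUMED, FRESH }

// Tries to restore a saved session; if loading fails, resets to a clean
// session and reports that the conversation starts fresh.
fun restoreOrReset(backend: SessionBackend, path: String): StartMode =
    if (backend.sessionLoad(path)) {
        StartMode.RESUMED
    } else {
        backend.sessionReset()
        StartMode.FRESH
    }

// In-memory stand-in so the sketch runs without a model (assumption, test-only).
class FakeBackend : SessionBackend {
    private val savedPaths = mutableSetOf<String>()
    var resets = 0

    override fun sessionLoad(path: String): Boolean = path in savedPaths

    override fun sessionSave(path: String): Boolean {
        savedPaths += path
        return true
    }

    override fun sessionReset() {
        resets++
    }
}
```

The returned `StartMode` lets the caller decide whether to show the previous conversation in the UI or begin with an empty one.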

Resetting the session

LlamaBridge.sessionReset()

This is useful when the user starts a new conversation but you want to keep the model loaded.

Important limitations

Session persistence is currently unavailable on WASM.
Session files should be used with the same model setup that created them.
If no active session exists, generateContinue(...) falls back to fresh generation behavior.
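One way to respect the "same model setup" limitation is to derive the session file name from the setup itself, so a session saved under one configuration is never looked up under another. This is only a sketch under stated assumptions: `ModelSetup` and `sessionFileName` are hypothetical, and which settings actually have to match (model file, context size, or others) is not specified here.

```kotlin
// Hypothetical description of the settings assumed to matter for compatibility.
data class ModelSetup(val modelPath: String, val contextSize: Int)

// Derives a session file name that changes whenever the model setup changes.
fun sessionFileName(setup: ModelSetup): String {
    val key = "${setup.modelPath}|${setup.contextSize}"
    // hashCode() is a simple, non-cryptographic fingerprint; collisions are
    // possible, so treat this as a convenience rather than a guarantee.
    val fingerprint = (key.hashCode().toLong() and 0xFFFFFFFFL).toString(16)
    return "chat-$fingerprint.session"
}
```

The resulting name can be joined with the app's storage directory and passed to `sessionSave` and `sessionLoad`. On WASM, where persistence is unavailable, the app would skip this step entirely.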

When this feature is worth using

Use KV sessions when:

you are building a chat app
the user asks follow-up questions
you want faster continuation across turns

Skip it when:

every request is independent
you deliberately want each run to start with a clean state