KV cache reuse is one of the most important performance features for chat-like experiences. Instead of rebuilding the full prompt state from scratch for every turn, the model can continue from what it has already processed.

Core methods#

LlamaBridge.sessionReset()
LlamaBridge.sessionSave(path)
LlamaBridge.sessionLoad(path)
LlamaBridge.generateContinue(prompt)

Fresh turn vs continued turn#

Fresh generation#

val answer = LlamaBridge.generate("Explain Kotlin coroutines.")

Use this when the request is independent and you do not care about previous turns.

Continued generation#

val answer2 = LlamaBridge.generateContinue("Now show a short example.")

Use this when you want the next prompt to continue from the current session state.

Typical chat flow#

LlamaBridge.initGenerateModel(modelPath)

val first = LlamaBridge.generate("Explain Kotlin coroutines in simple words.")
val second = LlamaBridge.generateContinue("Now give one practical example.")
val third = LlamaBridge.generateContinue("Summarize both answers in 3 bullets.")

Saving a session#

val sessionPath = "/tmp/demo.session"
val saved = LlamaBridge.sessionSave(sessionPath)
check(saved)

This lets you persist the current conversation state across app restarts or later reuse.

Loading a session#

LlamaBridge.initGenerateModel(modelPath)

val loaded = LlamaBridge.sessionLoad(sessionPath)
check(loaded)

val resumed = LlamaBridge.generateContinue("Continue from where we stopped.")

Resetting the session#

LlamaBridge.sessionReset()

This is useful when the user starts a new conversation but you want to keep the model loaded.

Important limitations#

  • Session persistence is currently unavailable on WASM.
  • Session files should be used with the same model setup that created them.
  • If no active session exists, generateContinue(...) falls back to fresh generation behavior.

When this feature is worth using#

Use KV sessions when:

  • you are building a chat app
  • the user asks follow-up questions
  • you want faster continuation across turns

Skip it when:

  • every request is independent
  • you deliberately want each run to start with a clean state