Streaming lets your app consume generation output incrementally as it is produced, rather than blocking until the full answer is ready. This greatly improves perceived latency, especially on-device, where a complete response can take several seconds but the first tokens often arrive quickly.

Plain streaming#
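
Pass a prompt and a GenStream callback: onDelta fires for each chunk of generated text, and onComplete or onError signals the end of the stream.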

LlamaBridge.generateStream(
    prompt = "Stream a short poem.",
    callback = object : GenStream {
        override fun onDelta(text: String) = print(text)
        override fun onComplete() = println("\nDone")
        override fun onError(message: String) = println("Error: $message")
    }
)
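
Depending on how the bridge is implemented, these callbacks may arrive on a background thread; if so, marshal deltas to the main thread before updating UI state.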

Streaming with context#
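
The same callback contract applies when you provide a system prompt and supporting context alongside the user prompt.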

LlamaBridge.generateStreamWithContext(
    systemPrompt = "You are concise.",
    contextBlock = "Topic: on-device LLMs.",
    userPrompt = "Give me 3 bullet points.",
    callback = object : GenStream {
        override fun onDelta(text: String) = print(text)
        override fun onComplete() = println("\nDone")
        override fun onError(message: String) = println("Error: $message")
    }
)

Streaming JSON#
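
Schema-constrained generation streams the same way; each delta is a raw fragment of the JSON text rather than a parsed value.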

// `schema` is a JSON Schema string describing the expected output shape.
LlamaBridge.generateJsonStream(
    prompt = "Return a JSON object for one task.",
    jsonSchema = schema,
    callback = object : GenStream {
        override fun onDelta(text: String) = print(text)
        override fun onComplete() = println("\nDone")
        override fun onError(message: String) = println("Error: $message")
    }
)
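
Because the fragments only form valid JSON once the stream ends, a common pattern is to accumulate them and parse on completion. A minimal sketch of that pattern, assuming Android's org.json is on the classpath (streamOneTask and onTask are illustrative names, not part of the bridge):

import org.json.JSONObject

// Accumulates JSON deltas and parses the complete document once the
// stream finishes. Individual deltas are not valid JSON on their own.
fun streamOneTask(schema: String, onTask: (JSONObject) -> Unit) {
    val buffer = StringBuilder()
    LlamaBridge.generateJsonStream(
        prompt = "Return a JSON object for one task.",
        jsonSchema = schema,
        callback = object : GenStream {
            override fun onDelta(text: String) { buffer.append(text) }
            override fun onComplete() = onTask(JSONObject(buffer.toString()))
            override fun onError(message: String) = println("Error: $message")
        }
    )
}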

Lifecycle tips#

  • Cancel generation when the user navigates away, so the native worker does not keep consuming CPU and battery (see the sketch after this list).
  • Buffer deltas and batch UI updates rather than re-rendering on every token (also shown below).
  • Avoid running multiple concurrent streams on a constrained device unless you have profiled the behavior.
  • On WASM, prefer the streaming APIs when running in worker-only mode.
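
A sketch combining the first two tips, assuming a coroutine scope tied to the screen's lifecycle. LlamaBridge.cancelGeneration() is a hypothetical name; substitute whatever cancellation hook your version of the bridge actually exposes:

import kotlinx.coroutines.*

// Appends deltas to a buffer and flushes to the UI on a fixed interval
// instead of re-rendering once per token; stop() cancels both the flush
// loop and (hypothetically) the native generation.
class StreamRenderer(
    private val scope: CoroutineScope,    // e.g. viewModelScope on Main
    private val render: (String) -> Unit  // updates the text view
) {
    private val buffer = StringBuilder()
    private var flushJob: Job? = null

    fun start(prompt: String) {
        flushJob = scope.launch {
            while (isActive) {
                delay(50)  // ~20 UI updates per second, regardless of token rate
                synchronized(buffer) {
                    if (buffer.isNotEmpty()) render(buffer.toString())
                }
            }
        }
        LlamaBridge.generateStream(
            prompt = prompt,
            callback = object : GenStream {
                override fun onDelta(text: String) {
                    synchronized(buffer) { buffer.append(text) }
                }
                override fun onComplete() = stop()
                override fun onError(message: String) = stop()
            }
        )
    }

    // Call from your lifecycle teardown (onCleared, onDestroy, etc.).
    fun stop() {
        flushJob?.cancel()
        synchronized(buffer) {  // final flush so the tail of the stream is kept
            if (buffer.isNotEmpty()) render(buffer.toString())
        }
        LlamaBridge.cancelGeneration()  // hypothetical; see note above
    }
}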