Streaming lets your app consume generation output incrementally as it is produced, rather than blocking until the full answer is ready. This greatly improves perceived latency, especially on-device, where a complete response can take several seconds but the first tokens often arrive quickly.

Plain streaming#
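
Pass a prompt and a GenStream callback: onDelta fires for each chunk of generated text, and onComplete or onError signals the end of the stream.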

LlamaBridge.generateStream(
    prompt = "Stream a short poem.",
    callback = object : GenStream {
        override fun onDelta(text: String) = print(text)
        override fun onComplete() = println("\nDone")
        override fun onError(message: String) = println("Error: $message")
    }
)
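
Depending on how the bridge is implemented, these callbacks may arrive on a background thread; if so, marshal deltas to the main thread before updating UI state.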

Streaming with context#
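
The same callback contract applies when you provide a system prompt and supporting context alongside the user prompt.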

LlamaBridge.generateStreamWithContext(
    systemPrompt = "You are concise.",
    contextBlock = "Topic: on-device LLMs.",
    userPrompt = "Give me 3 bullet points.",
    callback = object : GenStream {
        override fun onDelta(text: String) = print(text)
        override fun onComplete() = println("\nDone")
        override fun onError(message: String) = println("Error: $message")
    }
)

Streaming JSON#
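
Schema-constrained generation streams the same way; each delta is a raw fragment of the JSON text rather than a parsed value.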

// `schema` is a JSON Schema string describing the expected output shape.
LlamaBridge.generateJsonStream(
    prompt = "Return a JSON object for one task.",
    jsonSchema = schema,
    callback = object : GenStream {
        override fun onDelta(text: String) = print(text)
        override fun onComplete() = println("\nDone")
        override fun onError(message: String) = println("Error: $message")
    }
)
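
Because the fragments only form valid JSON once the stream ends, a common pattern is to accumulate them and parse on completion. A minimal sketch of that pattern, assuming Android's org.json is on the classpath (streamOneTask and onTask are illustrative names, not part of the bridge):

import org.json.JSONObject

// Accumulates JSON deltas and parses the complete document once the
// stream finishes. Individual deltas are not valid JSON on their own.
fun streamOneTask(schema: String, onTask: (JSONObject) -> Unit) {
    val buffer = StringBuilder()
    LlamaBridge.generateJsonStream(
        prompt = "Return a JSON object for one task.",
        jsonSchema = schema,
        callback = object : GenStream {
            override fun onDelta(text: String) { buffer.append(text) }
            override fun onComplete() = onTask(JSONObject(buffer.toString()))
            override fun onError(message: String) = println("Error: $message")
        }
    )
}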

Lifecycle tips#

  • Cancel generation when the user navigates away, so the native worker does not keep consuming CPU and battery (see the sketch after this list).
  • Buffer deltas and batch UI updates rather than re-rendering on every token (also shown below).
  • Avoid running multiple concurrent streams on a constrained device unless you have profiled the behavior.
  • On WASM, prefer the streaming APIs when running in worker-only mode.
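
A sketch combining the first two tips, assuming a coroutine scope tied to the screen's lifecycle. LlamaBridge.cancelGeneration() is a hypothetical name; substitute whatever cancellation hook your version of the bridge actually exposes:

import kotlinx.coroutines.*

// Appends deltas to a buffer and flushes to the UI on a fixed interval
// instead of re-rendering once per token; stop() cancels both the flush
// loop and (hypothetically) the native generation.
class StreamRenderer(
    private val scope: CoroutineScope,    // e.g. viewModelScope on Main
    private val render: (String) -> Unit  // updates the text view
) {
    private val buffer = StringBuilder()
    private var flushJob: Job? = null

    fun start(prompt: String) {
        flushJob = scope.launch {
            while (isActive) {
                delay(50)  // ~20 UI updates per second, regardless of token rate
                synchronized(buffer) {
                    if (buffer.isNotEmpty()) render(buffer.toString())
                }
            }
        }
        LlamaBridge.generateStream(
            prompt = prompt,
            callback = object : GenStream {
                override fun onDelta(text: String) {
                    synchronized(buffer) { buffer.append(text) }
                }
                override fun onComplete() = stop()
                override fun onError(message: String) = stop()
            }
        )
    }

    // Call from your lifecycle teardown (onCleared, onDestroy, etc.).
    fun stop() {
        flushJob?.cancel()
        synchronized(buffer) {  // final flush so the tail of the stream is kept
            if (buffer.isNotEmpty()) render(buffer.toString())
        }
        LlamaBridge.cancelGeneration()  // hypothetical; see note above
    }
}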