Llamatik exposes a small set of runtime generation parameters through `updateGenerateParams(...)`:
```kotlin
LlamaBridge.updateGenerateParams(
    temperature = 0.7f,
    maxTokens = 256,
    topP = 0.95f,
    topK = 40,
    repeatPenalty = 1.1f,
)
```

## What each parameter does
### temperature
Controls the randomness of token selection:

- lower values: more deterministic output
- higher values: more varied and creative output
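Conceptually, temperature divides the model's logits before the softmax. This sketch shows the underlying math (plain Kotlin, not Llamatik's implementation): a low temperature sharpens the distribution toward the top token, a high one flattens it.

```kotlin
import kotlin.math.exp

// Temperature scaling: divide logits by the temperature before softmax.
// Low temperature sharpens the distribution; high temperature flattens it.
fun softmaxWithTemperature(logits: DoubleArray, temperature: Double): DoubleArray {
    val scaled = logits.map { it / temperature }
    val maxLogit = scaled.maxOrNull()!!          // subtract max for numerical stability
    val exps = scaled.map { exp(it - maxLogit) }
    val sum = exps.sum()
    return exps.map { it / sum }.toDoubleArray()
}

fun main() {
    val logits = doubleArrayOf(2.0, 1.0, 0.5)
    val cold = softmaxWithTemperature(logits, 0.2)  // top token dominates
    val hot = softmaxWithTemperature(logits, 2.0)   // probabilities flatten out
    println("t=0.2: " + cold.joinToString { "%.3f".format(it) })
    println("t=2.0: " + hot.joinToString { "%.3f".format(it) })
}
```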
### maxTokens
Sets the maximum number of tokens the model may generate in a single response. Use it to bound response length and latency; generation can still end earlier if the model emits an end-of-sequence token.
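The effect is simply a cap on the decode loop. A minimal sketch (the function names here are illustrative, not Llamatik API):

```kotlin
// Illustrative decode loop: generation stops at maxTokens,
// or earlier if the model emits the end-of-sequence token.
fun decode(maxTokens: Int, eosToken: Int, nextToken: () -> Int): List<Int> {
    val out = mutableListOf<Int>()
    while (out.size < maxTokens) {
        val token = nextToken()
        if (token == eosToken) break
        out += token
    }
    return out
}

fun main() {
    var i = 0
    // Fake sampler that would happily produce tokens forever.
    val tokens = decode(maxTokens = 4, eosToken = -1) { i++ }
    println(tokens)  // capped at 4 tokens: [0, 1, 2, 3]
}
```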
### topP
Nucleus sampling threshold.
The model samples from the smallest set of likely tokens whose cumulative probability reaches topP.
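A sketch of that selection rule (plain Kotlin, not the library's implementation):

```kotlin
// Nucleus (top-p) filtering: sort tokens by probability and keep the
// smallest prefix whose cumulative probability reaches topP.
fun topPFilter(probs: DoubleArray, topP: Double): Set<Int> {
    val byProb = probs.indices.sortedByDescending { probs[it] }
    val kept = mutableSetOf<Int>()
    var cumulative = 0.0
    for (i in byProb) {
        kept += i
        cumulative += probs[i]
        if (cumulative >= topP) break
    }
    return kept
}

fun main() {
    // With topP = 0.7, tokens 0 and 1 (0.5 + 0.3 = 0.8) are enough.
    println(topPFilter(doubleArrayOf(0.5, 0.3, 0.1, 0.1), 0.7))  // [0, 1]
}
```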
### topK
Limits sampling to the K most likely next tokens.
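The same idea as a one-liner sketch (again illustrative, not Llamatik's internals):

```kotlin
// Top-k filtering: keep only the k highest-probability token indices.
fun topKFilter(probs: DoubleArray, k: Int): Set<Int> =
    probs.indices.sortedByDescending { probs[it] }.take(k).toSet()

fun main() {
    // The two most likely tokens are indices 1 (0.4) and 3 (0.3).
    println(topKFilter(doubleArrayOf(0.1, 0.4, 0.2, 0.3), 2))  // [1, 3]
}
```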
### repeatPenalty
Penalizes tokens that have already appeared in the output, discouraging repeated phrases and loops. Often useful for chat, summaries, and structured outputs.
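One common scheme, used by llama.cpp-style samplers (shown here as a sketch, not necessarily Llamatik's exact formula), adjusts the logits of previously seen tokens:

```kotlin
// Repeat penalty (llama.cpp-style): for tokens already generated,
// divide positive logits by the penalty and multiply negative ones,
// making those tokens less likely to be picked again.
fun applyRepeatPenalty(logits: DoubleArray, seen: Set<Int>, penalty: Double): DoubleArray {
    val out = logits.copyOf()
    for (t in seen) {
        out[t] = if (out[t] > 0) out[t] / penalty else out[t] * penalty
    }
    return out
}

fun main() {
    val penalized = applyRepeatPenalty(doubleArrayOf(2.0, -1.0, 0.5), setOf(0, 1), 1.1)
    println(penalized.joinToString())  // token 2 (unseen) is untouched
}
```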
## Tuning advice
- Start from moderate values and test on a fixed prompt set.
- Change one parameter at a time so you can attribute the effect.
- For extraction or JSON output, lean toward a lower temperature.
- For brainstorming or creative tasks, a slightly higher temperature can help.
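As a starting point, the advice above might translate into presets like these (the values are illustrative, not recommended defaults):

```kotlin
// Illustrative presets only; tune against your own prompt set.

// Deterministic extraction / JSON output:
LlamaBridge.updateGenerateParams(
    temperature = 0.2f,
    maxTokens = 256,
    topP = 0.9f,
    topK = 40,
    repeatPenalty = 1.1f,
)

// Brainstorming / creative writing:
LlamaBridge.updateGenerateParams(
    temperature = 0.9f,
    maxTokens = 512,
    topP = 0.95f,
    topK = 40,
    repeatPenalty = 1.1f,
)
```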