Llamatik exposes a small set of runtime generation parameters.

LlamaBridge.updateGenerateParams(
  temperature = 0.7f,
  maxTokens = 256,
  topP = 0.95f,
  topK = 40,
  repeatPenalty = 1.1f,
)

What they do (quick intuition)

  • temperature: higher values make output more random; lower values make it more deterministic
  • maxTokens: hard cap on the number of tokens generated per response
  • topK: sample only from the K most likely tokens
  • topP: sample only from the smallest set of tokens whose cumulative probability reaches topP (nucleus sampling); lowering either topK or topP makes output more conservative
  • repeatPenalty: lowers the probability of tokens that already appeared in the recent context, discouraging repeated phrases and loops
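To build intuition for how these knobs interact, here is a toy sketch in plain Kotlin of llama.cpp-style sampling. This is not Llamatik's internal implementation, and the function names (candidateIndices, applyRepeatPenalty) are made up for illustration; the real sampler also draws randomly from the surviving candidates rather than just listing them.

```kotlin
import kotlin.math.exp

// Toy sketch (NOT Llamatik internals): apply temperature, top-k, and top-p
// to a logit vector and return the surviving candidate token indices.
fun candidateIndices(
    logits: FloatArray,
    temperature: Float = 0.7f,
    topK: Int = 40,
    topP: Float = 0.95f,
): List<Int> {
    // Temperature scales logits before softmax: <1 sharpens, >1 flattens.
    val scaled = logits.map { (it / temperature).toDouble() }
    val maxLogit = scaled.maxOrNull() ?: return emptyList()
    // Numerically stable softmax.
    val exps = scaled.map { exp(it - maxLogit) }
    val sum = exps.sum()
    val probs = exps.map { it / sum }

    // top-k: keep only the K most likely tokens.
    val ranked = probs.withIndex()
        .sortedByDescending { it.value }
        .take(topK)

    // top-p (nucleus): keep the smallest prefix whose cumulative mass
    // reaches topP.
    val kept = mutableListOf<Int>()
    var mass = 0.0
    for ((idx, p) in ranked) {
        kept += idx
        mass += p
        if (mass >= topP) break
    }
    return kept
}

// Sketch of the classic repetition penalty (llama.cpp-style; Llamatik's
// exact formula may differ): push down logits of recently seen tokens.
fun applyRepeatPenalty(
    logits: FloatArray,
    recentTokens: Set<Int>,
    penalty: Float,
): FloatArray {
    val out = logits.copyOf()
    for (t in recentTokens) {
        // Dividing a positive logit (or multiplying a negative one) by the
        // penalty makes the token less likely either way.
        out[t] = if (out[t] > 0f) out[t] / penalty else out[t] * penalty
    }
    return out
}
```

With a sharply peaked distribution, a high topP still collapses to very few candidates, which is why raising temperature alone may not add much variety until topK/topP also loosen.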

If you’re tuning, change one parameter at a time and evaluate on a fixed prompt set, so any difference in output can be attributed to that one change.