Llamatik uses GGUF models (the standard file format used by llama.cpp).
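If a bundling or download step ever hands you a corrupt file, a cheap sanity check (independent of any Llamatik API) is to read the file's 4-byte magic: per the GGUF spec, every GGUF file starts with the ASCII bytes "GGUF". A minimal Kotlin sketch; the path in main is a placeholder:

```kotlin
import java.io.File

// Every GGUF file begins with the 4-byte magic "GGUF" (0x47 0x47 0x55 0x46).
fun looksLikeGguf(path: String): Boolean {
    val header = ByteArray(4)
    File(path).inputStream().use { stream ->
        // Fewer than 4 readable bytes means it cannot be a valid GGUF file.
        if (stream.read(header) != 4) return false
    }
    return header.decodeToString() == "GGUF"
}

fun main() {
    // Hypothetical path; point this at wherever your model actually lives.
    println(looksLikeGguf("models/tiny-model-Q4_K_M.gguf"))
}
```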

Picking a model

  • Choose a model that has a GGUF release.
  • Choose a quantization that matches your device's constraints (see the sketch after this list):
    • Smaller quantizations run faster and use less memory, at some cost in output quality.
    • Larger quantizations preserve more quality but need more RAM.
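As a rough illustration of matching quantization to device constraints, the sketch below picks a file by available RAM. The quantization names follow the common llama.cpp convention (Q4_K_M, Q5_K_M, Q8_0), but the file names and thresholds are assumptions to tune for your model and target devices:

```kotlin
// Illustrative only: file names and RAM thresholds are placeholders.
fun pickQuantization(availableRamBytes: Long): String {
    val gib = 1L shl 30
    return when {
        availableRamBytes >= 12 * gib -> "model-Q8_0.gguf"   // highest quality, largest
        availableRamBytes >= 6 * gib  -> "model-Q5_K_M.gguf" // middle ground
        else                          -> "model-Q4_K_M.gguf" // smallest, fastest
    }
}

// e.g. pickQuantization(Runtime.getRuntime().maxMemory()) on the JVM
```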

Where to get GGUF models

The most common source is Hugging Face: many model pages publish .gguf files, usually one file per quantization level.

  • Start with a tiny/mini model (very small) to validate that your build and model loading work.
  • Then move to your target model size and quality once everything is stable (see the download sketch after this list).
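Hugging Face serves raw files at a stable URL pattern, https://huggingface.co/<repo>/resolve/main/<file>. A minimal download sketch using that pattern; the repo and file names below are placeholders, not real releases:

```kotlin
import java.io.File
import java.net.URI

// Fetch a GGUF file from Hugging Face using the standard resolve URL.
fun downloadGguf(repo: String, fileName: String, destDir: File): File {
    val url = URI("https://huggingface.co/$repo/resolve/main/$fileName").toURL()
    val dest = File(destDir, fileName)
    url.openStream().use { input ->
        dest.outputStream().use { output -> input.copyTo(output) }
    }
    return dest
}

fun main() {
    val model = downloadGguf(
        repo = "your-org/your-model-GGUF",   // placeholder repo
        fileName = "your-model-Q4_K_M.gguf", // placeholder file
        destDir = File("models").apply { mkdirs() },
    )
    println("Saved ${model.length()} bytes to ${model.path}")
}
```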

Shipping models

Models can be large, often hundreds of megabytes to several gigabytes. Consider:

  • Android: ship a smaller default model, or download the model on first run (see the sketch after this list).
  • iOS: App Store size limits apply; use on-demand resources or download the model after install.
  • Desktop: bundle the model or download it, depending on how you distribute your app.
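For the download-on-first-run option above, a minimal sketch: reuse the model if it is already in app storage, otherwise download to a temporary file and move it into place so a partial download is never mistaken for a complete model. appFilesDir and the URL are placeholders; on Android this would typically be Context.filesDir:

```kotlin
import java.io.File
import java.net.URI

// Download-on-first-run: returns a local model file, fetching it once.
fun ensureModel(appFilesDir: File, modelUrl: String, fileName: String): File {
    val model = File(appFilesDir, fileName)
    if (model.exists() && model.length() > 0) return model // already downloaded

    // Write to a temp file first so an interrupted download never
    // leaves a half-written file under the final name.
    val tmp = File(appFilesDir, "$fileName.part")
    URI(modelUrl).toURL().openStream().use { input ->
        tmp.outputStream().use { output -> input.copyTo(output) }
    }
    check(tmp.renameTo(model)) { "Could not move downloaded model into place" }
    return model
}
```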