When a general-purpose language model does not meet your requirements, you have two primary options: fine-tuning (training the model on your data) or retrieval-augmented generation (giving the model access to your data at inference time). These are not interchangeable.
When to use RAG
- Your knowledge base changes frequently — product documentation, pricing, policies.
- You need to cite sources — RAG returns the source documents alongside the answer.
- You have a large, heterogeneous knowledge base — more data than fits in a fine-tuning dataset.
- Speed-to-deployment matters — RAG can be operational in days.
When to fine-tune
- You need the model to adopt a specific style, tone, or format consistently.
- You are performing a narrow, well-defined task at high volume — classification, extraction, structured generation.
- Latency matters and you need to reduce prompt length — fine-tuned models can perform tasks with shorter prompts.
The hybrid approach
Most production systems use both. Fine-tune for style and task format; use RAG for dynamic knowledge. The fine-tuned model knows how to respond; RAG tells it what to respond about.
Neither technique eliminates the need for evaluation. Build your evals first.
