When a RAG system finds the wrong document, people often blame embeddings, vector databases, or the model. Sometimes that is fair. More often, the retrieval system is being asked to search a messy library with no librarian.
What causes bad retrieval
- Old documents remain in the index.
- New documents have weak titles or missing metadata.
- Large files are chunked in ways that split context.
- Near-duplicate documents compete with each other.
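- The system has no source hierarchy, so an informal note can outrank the official policy it summarizes.
- Permissions are applied after retrieval instead of before it, so restricted documents can fill the top results and then be discarded, leaving the user with weaker matches or none.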
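The permission problem in the last item is easy to see in a toy example. This is a minimal sketch, not a production design: the `Doc` type, the group names, and the term-overlap `score` function are hypothetical stand-ins for a real document store and embedding similarity. It compares ranking first and filtering afterwards with filtering first and ranking within what the user may read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # groups permitted to read this document (hypothetical field)


def score(query: str, doc: Doc) -> float:
    # Stand-in for embedding similarity: share of query terms found in the document.
    terms = set(query.lower().split())
    words = set(doc.text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def can_read(doc: Doc, user_groups: frozenset[str]) -> bool:
    return bool(doc.allowed_groups & user_groups)


def retrieve_then_filter(query: str, docs: list[Doc], user_groups: frozenset[str], k: int) -> list[Doc]:
    # Rank every document, keep the top k, and only then drop what the user may not read.
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]
    return [d for d in ranked if can_read(d, user_groups)]


def filter_then_retrieve(query: str, docs: list[Doc], user_groups: frozenset[str], k: int) -> list[Doc]:
    # Restrict the candidates to readable documents first, then rank within them.
    visible = [d for d in docs if can_read(d, user_groups)]
    return sorted(visible, key=lambda d: score(query, d), reverse=True)[:k]


docs = [
    Doc("hr-salary-bands", "salary bands for engineering roles", frozenset({"hr"})),
    Doc("hr-comp-review", "annual salary review process", frozenset({"hr"})),
    Doc("handbook-pay", "how salary is paid each month", frozenset({"all"})),
]
employee = frozenset({"all"})
after = retrieve_then_filter("salary review process", docs, employee, k=2)
before = filter_then_retrieve("salary review process", docs, employee, k=2)
print([d.doc_id for d in after])   # []
print([d.doc_id for d in before])  # ['handbook-pay']
```

With post-filtering, the two HR documents take both top-k slots and are then removed, so the employee gets nothing. Filtering first returns the one document they are allowed to read. Many vector stores accept a metadata filter as part of the query itself; how that works depends on the store, so check its documentation rather than assuming this shape.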
How to improve it
Start by cleaning the knowledge base. Decide which sources are authoritative. Add metadata that retrieval can filter and rank on, such as document type, owner, and review date. Remove stale documents. Create rules for how policies, product sheets, contracts, and notes should be indexed.
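One way to turn those rules into something an indexing pipeline can enforce is sketched below. It assumes each document carries a type and a last-reviewed date; the `AUTHORITY` ranking, the one-year review window, and the 0.8 Jaccard threshold on word trigrams are illustrative choices, not recommendations, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical source hierarchy: lower number means more authoritative.
AUTHORITY = {"policy": 0, "contract": 1, "product_sheet": 2, "note": 3}


@dataclass(frozen=True)
class Doc:
    doc_id: str
    doc_type: str
    text: str
    last_reviewed: date


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    # Overlapping word n-grams; texts shorter than n yield a single shingle.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def clean_index(docs: list[Doc], today: date, max_age_days: int = 365, dup_threshold: float = 0.8) -> list[Doc]:
    cutoff = today - timedelta(days=max_age_days)
    # 1. Drop documents that nobody has reviewed within the allowed window.
    fresh = [d for d in docs if d.last_reviewed >= cutoff]
    # 2. Visit authoritative, recently reviewed documents first so they win duplicate collisions.
    #    Unknown document types rank below every known type.
    fresh.sort(key=lambda d: (AUTHORITY.get(d.doc_type, len(AUTHORITY)), -d.last_reviewed.toordinal()))
    kept: list[Doc] = []
    for doc in fresh:
        sig = shingles(doc.text)
        # 3. Keep a document only if it is not a near-duplicate of one already kept.
        if all(jaccard(sig, shingles(other.text)) < dup_threshold for other in kept):
            kept.append(doc)
    return kept


docs = [
    Doc("note-travel", "note", "employees may book economy flights for trips under six hours", date(2024, 3, 1)),
    Doc("policy-travel", "policy", "employees may book economy flights for trips under six hours", date(2024, 1, 15)),
    Doc("policy-travel-2019", "policy", "employees may book business class for any trip", date(2019, 5, 1)),
    Doc("sheet-widget", "product_sheet", "the widget supports usb c charging", date(2024, 2, 1)),
]
print([d.doc_id for d in clean_index(docs, today=date(2024, 6, 1))])
# ['policy-travel', 'sheet-widget']
```

The 2019 policy is dropped as stale, and the note that copies the current policy word for word loses to the policy because policies sit higher in the hierarchy. Exact thresholds and ranks are the kind of rule each organization has to decide for its own sources.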
Then tune retrieval around real questions from users, not synthetic demo prompts. The system should be judged on whether it finds the source a knowledgeable employee would have used.
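Judging the system this way can start with a small labeled set: real user questions, each paired with the document a knowledgeable employee would have cited, scored by how often that document appears in the top k results. The sketch below assumes such a set exists; `INDEX` and `keyword_retrieve` are hypothetical stand-ins for the real document store and retriever, and hit rate at k is one simple metric among several.

```python
import re
from typing import Callable

# Hypothetical retriever signature: (question, k) -> ranked document ids.
Retriever = Callable[[str, int], list[str]]


def hit_rate_at_k(eval_set: list[tuple[str, str]], retrieve: Retriever, k: int = 5) -> tuple[float, list[str]]:
    """Return the share of questions whose expected source is in the top k, plus the missed questions."""
    if not eval_set:
        raise ValueError("eval_set must contain at least one question")
    misses = [q for q, expected in eval_set if expected not in retrieve(q, k)[:k]]
    hits = len(eval_set) - len(misses)
    return hits / len(eval_set), misses


# Toy index standing in for a real document store (insertion order decides ties).
INDEX = {
    "policy-expenses": "expense claims receipts reimbursement deadline",
    "sheet-widget": "widget product sheet usb c charging battery",
    "policy-travel": "travel policy economy flights approval",
}


def keyword_retrieve(question: str, k: int) -> list[str]:
    # Rank ids by how many question words appear in each document; ties keep index order.
    terms = set(re.findall(r"[a-z]+", question.lower()))
    return sorted(INDEX, key=lambda d: len(terms & set(INDEX[d].split())), reverse=True)[:k]


eval_set = [
    ("How do I get reimbursed for expense receipts?", "policy-expenses"),
    ("Does the widget support USB-C charging?", "sheet-widget"),
    ("Can I fly business class to a client visit?", "policy-travel"),
]
rate, misses = hit_rate_at_k(eval_set, keyword_retrieve, k=1)
print(rate, misses)
```

The miss is the kind this measurement exists to surface: the user asks about flying business class, while the travel policy talks about economy flights, so the two share no words. Demo prompts written in the policy's own vocabulary would not expose that gap.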
RAG quality is knowledge operations quality.
Lucendata builds RAG systems with the source, metadata, permission, and retrieval rules needed for reliable internal answers.