Retrieval Augmented Generation (RAG) is an architecture that combines information retrieval from unstructured data with text generation to improve the accuracy of AI-driven tasks, such as question answering.
The RAG process follows a specific cycle to ensure the AI provides accurate and context-aware answers:

- Data Ingestion
- Retrieval
- Generation

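The cycle above can be sketched end to end. All function names here are illustrative placeholders (not a specific product API), and the relevance score is a toy word-overlap stand-in for real embedding search:

```python
# Compact sketch of the three RAG stages; names are placeholders.

def data_ingestion(documents, chunk_size=1000):
    """Ingestion: split raw documents into indexable chunks."""
    return [doc[i:i + chunk_size] for doc in documents
            for i in range(0, len(doc), chunk_size)]

def retrieval(question, chunks, top_k=2):
    """Retrieval: rank chunks by a toy relevance score (words shared with the question)."""
    words = set(question.lower().split())
    return sorted(chunks,
                  key=lambda c: len(words & set(c.lower().split())),
                  reverse=True)[:top_k]

def generation(question, context):
    """Generation: assemble the prompt an LLM would receive."""
    return f"Context: {' '.join(context)}\nQuestion: {question}"

chunks = data_ingestion(["RAG grounds answers in retrieved documents."])
prompt = generation("How does RAG ground answers?",
                    retrieval("How does RAG ground answers?", chunks))
```

In a real deployment, retrieval uses embeddings and a vector store, and generation calls an LLM with the assembled prompt; the stages and data flow are the same.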
The Ingestion stage involves loading documents from multiple sources and configuring how the system processes them.
You can load data in various formats from different sources; the system also ingests data that consumers generate during this phase.
You divide documents into smaller fragments called "chunks" so the system can index and retrieve them efficiently. This process integrates with the Index Profile to ensure the data meets the specific requirements of the index.
The default chunking strategy is:
- chunkSize: 1000 characters.
- chunkOverlap: 100 characters.
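The defaults above can be illustrated with a minimal character-based splitter. The function name is an illustrative assumption, but the arithmetic follows directly from the stated parameters:

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into chunks of chunk_size characters, where each chunk
    repeats the final chunk_overlap characters of the previous one."""
    step = chunk_size - chunk_overlap  # 900 with the defaults
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 2500
chunks = chunk_text(doc)
# With the defaults, chunk starts fall at 0, 900, and 1800, so each
# new chunk overlaps the previous one by 100 characters.
```

The overlap keeps a sentence that straddles a chunk boundary fully visible in at least one chunk, which improves retrieval quality at the cost of some duplicated storage.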
During the Retrieval stage, you access the ingested information stored in a vector database. This stage relies on two main components:
- Embeddings: These are numeric arrays that capture the contextual essence of your documents and queries.
- Vector Store: This component stores embeddings together with metadata. You configure a provider and a distance metric, and the store returns the data chunks most relevant to your specific query.
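The interplay of these two components can be sketched with a toy in-memory store. The embeddings and entries below are made-up illustrative values, and cosine similarity stands in for whichever distance metric the vector store is configured to use:

```python
import math

def cosine_similarity(a, b):
    """Distance metric: cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "vector store": each entry pairs an embedding with its chunk and metadata.
store = [
    {"embedding": [0.9, 0.1, 0.0], "chunk": "Chunk about ingestion", "source": "doc1"},
    {"embedding": [0.1, 0.9, 0.0], "chunk": "Chunk about retrieval", "source": "doc2"},
]

def search(query_embedding, store, top_k=1):
    """Return the chunks whose embeddings lie closest to the query's."""
    ranked = sorted(store,
                    key=lambda e: cosine_similarity(query_embedding, e["embedding"]),
                    reverse=True)
    return [e["chunk"] for e in ranked[:top_k]]

results = search([0.2, 0.8, 0.0], store)  # → ["Chunk about retrieval"]
```

In practice, an embedding model produces the vectors and the vector store provider handles indexing and nearest-neighbor search at scale, but the ranking logic is the same idea.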
The Generation stage focuses on producing relevant responses based on your RAG Assistant configuration.
You define the search strategy by configuring the following elements:
- Prompts: These act as guides to contextualize the responses.
- LLMs: The selected model ensures the generated content is consistent and relevant.
- Parameters: You can add variable-based adjustments and filters to customize the responses to your specific needs.
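How these three elements combine can be shown with a hypothetical request builder. The function, template, and model name below are illustrative assumptions, not a real API; they only show how prompt, model, and parameters shape the final request:

```python
def build_request(question, context_chunks, *, prompt_template,
                  model, temperature=0.2):
    """Combine prompt, model choice, and parameters into one generation request.
    All names here are hypothetical placeholders."""
    prompt = prompt_template.format(context="\n".join(context_chunks),
                                    question=question)
    return {"model": model, "temperature": temperature, "prompt": prompt}

# The prompt acts as a guide that contextualizes the response.
template = ("Answer using only the context below.\n"
            "Context:\n{context}\n\n"
            "Question: {question}")

request = build_request("What is RAG?",
                        ["RAG combines retrieval and generation."],
                        prompt_template=template,
                        model="example-llm")
```

Changing the template, the model, or a parameter such as `temperature` changes the response style without touching the retrieval side, which is why these elements are configured separately.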
You interact with the RAG Assistant through the Chat API or the Workspace. Either interface lets you send queries to the Assistant and receive answers grounded in your ingested data.