Introduction to RAG

Retrieval Augmented Generation, commonly referred to as RAG, is one of the most important application patterns for AI large language models today.

RAG addresses the problem of incomplete and outdated knowledge in large language models. Once a model's training is complete, the knowledge it contains no longer changes. If the training data lacks information on a certain topic, the model may generate plausible-sounding but confused content and easily answer questions incorrectly. For internal enterprise applications this problem is almost guaranteed, because the base model cannot see private enterprise data during training. Likewise, data created after training cannot be used by the model.

One solution to this problem is to fine-tune the base model on the internal data. But fine-tuning is not simple: it requires specialized expertise, substantial resources, and time. Nor does it solve the problem of using new data. Some applications update their data very frequently, and you cannot fine-tune a large model every time data is created or modified.

A second solution is function calling: you describe custom functions to the large language model, and the model can ask your application to invoke them, connecting it to external systems through their APIs.
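
As a rough illustration of function calling, the sketch below uses the OpenAI Python SDK; the `get_order_status` function and its parameters are hypothetical stand-ins for a real enterprise API:

```python
# A minimal function-calling sketch using an OpenAI-style chat API.
# get_order_status is a hypothetical stand-in for a real enterprise API.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The order ID."}
            },
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Where is order A1234?"}],
    tools=tools,
)

# If the model decides to call the function, it returns the name and
# JSON-encoded arguments; your code runs the real API call and sends
# the result back to the model in a follow-up message.
print(response.choices[0].message.tool_calls)
```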

The last solution is Retrieval Augmented Generation. The principle behind it is actually very simple: since the large language model is missing some information, providing that information as part of the input naturally solves the problem. Because models have context windows of limited size, the supplied context cannot be too long, so you need to find the content most relevant to the input and use it as the context in order to get accurate results.
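
To make this concrete, here is a minimal sketch of folding retrieved documents into a prompt; the template wording is an illustrative assumption, not a standard format:

```python
# A minimal sketch of assembling retrieved documents into a prompt.
def build_prompt(question: str, context_docs: list[str]) -> str:
    # Join the retrieved documents into a single context block.
    context = "\n\n".join(context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```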

[Figure: RAG]

In a RAG implementation, a vector database stores all reference documents. Given an input, documents semantically similar to the input text are first retrieved from the vector database, then combined with the original input text to assemble the final prompt sent to the model. Since the prompt now contains enough contextual information, the model can produce a reasonable output based on it.
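
Putting these steps together, here is a minimal end-to-end sketch. It assumes Chroma as the vector database and an OpenAI-style chat model, both purely illustrative choices, and reuses the `build_prompt` helper from the earlier sketch:

```python
# A minimal end-to-end RAG flow, assuming Chroma as the vector database
# and an OpenAI-style chat model; both choices are illustrative.
import chromadb
from openai import OpenAI

chroma = chromadb.Client()
collection = chroma.get_or_create_collection("reference_docs")
llm = OpenAI()

def answer(question: str) -> str:
    # 1. Retrieve the documents most semantically similar to the question.
    results = collection.query(query_texts=[question], n_results=3)
    retrieved_docs = results["documents"][0]

    # 2. Assemble the final prompt from the retrieved context and the
    #    original question (build_prompt is defined in the earlier sketch).
    prompt = build_prompt(question, retrieved_docs)

    # 3. The model answers using the context supplied in the prompt.
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Any vector database and model with comparable APIs would fit the same three-step structure.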

Finding documents with similar content requires text embedding, which converts a piece of text into a vector; this is also the reason for using vector databases. After all reference documents are embedded, the resulting vectors are stored in the vector database.

The input query text goes through the same embedding process to produce a vector, so the similarity between the query text and a reference document becomes the similarity between their vectors, which can be measured by the distance between them: the shorter the distance, the more similar the corresponding texts. You don't need to worry about the distance calculation details when implementing an application; you can simply use the API provided by the vector database.
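
The sketch below makes the distance idea concrete by embedding a query and two candidate documents and comparing cosine distances; it assumes the OpenAI embeddings API, but any embedding model behaves the same way:

```python
# A minimal sketch of measuring text similarity via embedding vectors.
# Assumes the OpenAI embeddings API; the model name is illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(
        model="text-embedding-3-small", input=text
    )
    return np.array(response.data[0].embedding)

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    # Smaller distance means more similar texts.
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

query = embed("How do I reset my password?")
doc_a = embed("Steps to recover a forgotten account password.")
doc_b = embed("Quarterly revenue report for 2023.")

print(cosine_distance(query, doc_a))  # expected to be smaller
print(cosine_distance(query, doc_b))  # expected to be larger
```

In a real application this comparison happens inside the vector database's query call, as in the end-to-end sketch above.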

The RAG process above does not include document ingestion. In real applications, data from enterprise systems usually has to be imported into the vector database, and when that data changes, the changes must be synchronized to the vector database as well.
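
As a rough sketch of ingestion, continuing the Chroma example above (the fixed-size chunking and ID scheme are simplifying assumptions; production pipelines typically split on semantic boundaries and track source-system keys):

```python
# A minimal ingestion sketch for the Chroma collection defined earlier.
# Fixed-size chunking and the ID scheme are simplifying assumptions.
def ingest(doc_id: str, text: str, chunk_size: int = 500) -> None:
    # Split the document into fixed-size chunks so each chunk stays
    # well within the embedding model's input limit.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    collection.upsert(
        ids=[f"{doc_id}-{n}" for n in range(len(chunks))],
        documents=chunks,
    )
```

Reusing stable IDs means that when a source record changes, re-ingesting it overwrites the stale vectors instead of duplicating them.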