
Does a vector database maintain pre-vector chunked data for RAG systems?

news 2025/9/15 2:47:24


Background:

I believe that when using an LLM with a Retrieval-Augmented Generation (RAG) approach, the results retrieved from a vector search must ultimately be presented in text form. Otherwise, the prompt would just contain a series of numbers (vectors), which would be meaningless. I assume that the pre-vector chunked data needs to be stored somewhere within the vector database. Is this usually maintained within the vector database itself?


Answer:

In one common RAG design, the vector database stores the numeric embeddings that represent each text chunk, and the chunk text itself is kept in a separate store. When you run a query, the system uses the embeddings to identify the most similar chunks, then fetches their text and places it in the prompt.


In this kind of setup, the vector database does not hold the pre-vector chunked data. It stores only the vector representation of each chunk, together with an identifier. The original text passages or documents live in another database or data source. A vector search returns the identifiers of the nearest vectors, and the system uses those identifiers (not the vectors themselves) to fetch the corresponding text from the separate source. This separation is a design choice rather than a requirement: many vector databases can also store the chunk text as metadata alongside each vector, in which case the text comes back directly with the search result.
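The retrieval flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any particular product's API: `embed` is a toy word-count embedding standing in for a real embedding model, and `VOCAB`, `chunk_store`, `vector_index`, `search`, and `retrieve_text` are hypothetical names. Both stores are plain in-memory dictionaries.

```python
import math
from collections import Counter

# Assumption: a tiny fixed vocabulary so the toy embedding is deterministic.
VOCAB = ["vector", "database", "text", "chunk", "prompt", "llm"]


def embed(text: str) -> list[float]:
    """Toy embedding: word counts over VOCAB (stand-in for a real model)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Store 1: the original chunk text, keyed by chunk ID.
chunk_store = {
    "c1": "A vector database indexes embeddings for similarity search",
    "c2": "The prompt sent to the LLM must contain text not numbers",
    "c3": "Each chunk of text is embedded before indexing",
}

# Store 2: the "vector database", holding only ID -> vector.
vector_index = {cid: embed(text) for cid, text in chunk_store.items()}


def search(query: str, k: int = 1) -> list[str]:
    """Return the IDs of the k vectors most similar to the query."""
    q = embed(query)
    ranked = sorted(
        vector_index,
        key=lambda cid: cosine(q, vector_index[cid]),
        reverse=True,
    )
    return ranked[:k]


def retrieve_text(query: str, k: int = 1) -> list[str]:
    """Vector search first, then look up the text by the returned IDs."""
    return [chunk_store[cid] for cid in search(query, k)]


if __name__ == "__main__":
    for passage in retrieve_text("what text goes into the prompt for the llm"):
        print(passage)
```

The shared chunk ID is the only link between the two stores: the vector search never touches text, and the text lookup never touches vectors. What ends up in the prompt is the output of `retrieve_text`, which is plain text.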
