Reference: Datawhale wow agent day06
What is RAG?#
We asked Doubao and got the following answer:
In natural language processing, language models are typically trained on large-scale text corpora to learn the patterns and semantics of language. However, these models suffer from problems such as outdated knowledge and a tendency to generate erroneous content (e.g., hallucinations). RAG technology emerged to address these problems.
It combines information retrieval with text generation. Simply put, before generating an answer, the system first retrieves relevant information from an external knowledge base (such as a document database or knowledge graph), then uses the retrieved information to help the language model produce a more accurate and targeted answer. It is like consulting relevant books and materials before composing a better answer to a question.
Workflow:
- Retrieval Phase
When a user question is received (for example, in a Q&A system), the system uses some retrieval mechanism (such as vector space models, inverted indexes, etc.) to search for text fragments related to the question in the knowledge base. For instance, if the question is "How to treat a cold?", the system will search the medical knowledge base for document paragraphs containing information related to "cold treatment methods."
These retrieved text fragments can be complete sentences, paragraphs, or even parts of multiple documents; they serve as reference material for generating the answer.
- Generation Phase
The retrieved information and the original question are input into a language generation model (such as a generative language model based on the Transformer architecture). The language model generates the final answer based on these inputs, combining its learned language knowledge. For example, the language model analyzes and integrates the retrieved text related to cold treatment methods to generate an answer such as "A cold can be treated by drinking more water, resting, and taking appropriate medication."
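The two phases can be condensed into a short, framework-free sketch. Everything here (rag_answer, knowledge_base.search, llm.complete) is a hypothetical stand-in for illustration, not a specific library's API:
# Conceptual two-phase RAG loop; knowledge_base and llm are hypothetical objects
def rag_answer(question, knowledge_base, llm):
    # Retrieval phase: find the text chunks most relevant to the question
    chunks = knowledge_base.search(question, top_k=5)
    # Generation phase: let the LLM answer from the retrieved context
    prompt = "Context:\n" + "\n".join(chunks) + f"\n\nQuestion: {question}"
    return llm.complete(prompt)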
Building an Index#
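The code in this and the following sections assumes that an embedding model (embedding) and a language model (llm) have already been initialized, as in earlier days of this series. A minimal sketch of such a setup, assuming a local HuggingFace embedding model and an OpenAI-compatible endpoint (the model name, URL, and key below are placeholders):
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai_like import OpenAILike

# Placeholder model names and endpoint; substitute your own configuration
embedding = HuggingFaceEmbedding(model_name="BAAI/bge-small-zh-v1.5")
llm = OpenAILike(
    model="your-model-name",
    api_base="https://your-endpoint/v1",
    api_key="your-api-key",
    is_chat_model=True,
)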
# Read the specified file; load_data() returns a list of Document objects
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader(input_files=['../docs/Q&A_Manual.txt']).load_data()
# Build nodes
from llama_index.core.node_parser import SentenceSplitter
transformations = [SentenceSplitter(chunk_size=512)]
from llama_index.core.ingestion.pipeline import run_transformations
nodes = run_transformations(documents, transformations=transformations)
# Build index
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss
from llama_index.core import StorageContext, VectorStoreIndex
# Embed a sample string to get the embedding dimensionality,
# then size the L2-distance FAISS index to match
emb = embedding.get_text_embedding("Hello there")
vector_store = FaissVectorStore(faiss_index=faiss.IndexFlatL2(len(emb)))
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(
nodes=nodes,
storage_context=storage_context,
embed_model=embedding,
)
Code Explanation: This code converts text data into vector representations and builds a vector index for fast retrieval, enabling efficient text search and similarity matching.
- Use SimpleDirectoryReader to read data from the specified path ../docs/Q&A_Manual.txt. The load_data() method loads the file content as a list of Document objects.
- Use SentenceSplitter to split the text into smaller chunks of at most 512 tokens each. transformations is a list containing the splitting rules.
- The run_transformations function applies the previously defined transformations to convert documents into nodes, which will be used to build the index.
- Use FaissVectorStore and the faiss library to create a vector store. IndexFlatL2 is a flat index that ranks vectors by L2 (Euclidean) distance.
- StorageContext manages the storage context; the from_defaults method creates one wrapping the FAISS vector store.
- VectorStoreIndex builds the index from the nodes, the storage context, and the embedding model.
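If rebuilding the index on every run is unnecessary, it can be saved to disk with LlamaIndex's persistence API (the persist_dir path below is arbitrary):
# Optional: persist the FAISS vectors and node metadata for later reuse
index.storage_context.persist(persist_dir="./storage")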
Building a Q&A Engine#
# Build retriever
from llama_index.core.retrievers import VectorIndexRetriever
# To customize parameters, construct a parameter dictionary
kwargs = {'similarity_top_k': 5, 'index': index, 'dimensions': len(emb)} # Retriever parameters
retriever = VectorIndexRetriever(**kwargs)
# Build synthesizer
from llama_index.core.response_synthesizers import get_response_synthesizer
response_synthesizer = get_response_synthesizer(llm=llm, streaming=True)
# Build Q&A engine
from llama_index.core.query_engine import RetrieverQueryEngine
engine = RetrieverQueryEngine(
retriever=retriever,
response_synthesizer=response_synthesizer,
)
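As an aside, LlamaIndex can assemble an equivalent engine in a single call through the index's convenience method; a sketch using the same index and llm:
# One-line equivalent of the retriever + synthesizer wiring above
engine = index.as_query_engine(llm=llm, similarity_top_k=5, streaming=True)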
Code Explanation: This code builds a Q&A engine that combines a vector retriever with a response synthesizer to retrieve context and generate answers.
- Import modules: Import the VectorIndexRetriever class from the llama_index library to create a vector index retriever.
- Construct a parameter dictionary: kwargs = {'similarity_top_k': 5, 'index': index, 'dimensions': len(emb)} stores the parameters for the retriever.
  - similarity_top_k: set to 5, so the 5 most similar results are returned per retrieval.
  - index: the index object constructed in the previous section.
  - dimensions: set to len(emb), the dimensionality of the embedding vectors.
- Create the retriever: retriever = VectorIndexRetriever(**kwargs) unpacks the parameter dictionary to create a VectorIndexRetriever object for performing vector retrieval (see the standalone retrieval sketch after this list).
- Create the synthesizer: Import the get_response_synthesizer function from the llama_index library and call it with llm (the language model) and streaming=True, producing a response_synthesizer that generates streamed answers.
- Create the Q&A engine: Import the RetrieverQueryEngine class from the llama_index library and combine the retriever and response_synthesizer objects into an engine object for handling Q&A queries.
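To inspect what the engine will see, the retriever can also be run on its own; retrieve() returns nodes with similarity scores:
# Preview raw retrieval results before they reach the synthesizer
hits = retriever.retrieve("What are the applications of Agent AI systems?")
for hit in hits:
    print(hit.score, hit.node.get_content()[:80])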
Test Results#
# Ask a question
question = "What are the applications of Agent AI systems?"
response = engine.query(question)
# streaming=True above exposes the answer as a token generator
for text in response.response_gen:
    print(text, end="")
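If the synthesizer had been created without streaming=True, engine.query(question) would return the complete answer at once, and a plain print(response) would display it.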