Guide on Constructing a RAG with Qwen3

This article walks through a Retrieval-Augmented Generation (RAG) system built with the Qwen3 models over the roughly 40 blog posts of a website. It focuses on the 4B variant of Qwen3-Instruct-2507, a release that is available in three sizes.

The Qwen3 models, developed by Alibaba, were launched a few months before this article was written and are open-source. They can be downloaded from popular platforms such as Hugging Face and Kaggle. The Qwen3-4B-Instruct-2507 model, with 4 billion parameters, serves as the generator at the core of the RAG system.

Building the RAG takes four steps: download the data, install the necessary requirements, unzip the data, and run the script.

The retriever is built on the Qwen3-Embedding-0.6B model, which converts text into dense vector representations. These vectors are stored in a FAISS vector store, and for each query the 15 most similar documents are retrieved.
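
The search step itself can be illustrated without any model. The sketch below uses only the standard library: vectors are L2-normalized and ranked by inner product, which is the kind of exhaustive similarity search a flat FAISS index performs (the article does not say which index type the script uses). `toy_embed` is a hypothetical stand-in for Qwen3-Embedding-0.6B, and `top_k` is an illustrative name, not a FAISS function.

```python
import math
from collections import Counter


def toy_embed(text, vocab):
    """Hypothetical stand-in for an embedding model: L2-normalized word counts."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def top_k(query_vec, doc_vecs, k):
    """Return (doc_index, score) pairs for the k highest inner products."""
    scores = [
        (i, sum(q * d for q, d in zip(query_vec, vec)))
        for i, vec in enumerate(doc_vecs)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


docs = [
    "qwen3 embedding model converts text to vectors",
    "faiss builds a vector store for similarity search",
    "a recipe for baking sourdough bread",
]
vocab = sorted({w for d in docs for w in d.split()})
doc_vecs = [toy_embed(d, vocab) for d in docs]

query_vec = toy_embed("vector store similarity search", vocab)
hits = top_k(query_vec, doc_vecs, k=2)  # the article's script retrieves 15
for idx, score in hits:
    print(f"{score:.3f}  {docs[idx]}")
```

With normalized vectors the inner product equals cosine similarity, so the ranking does not depend on document length. In the real pipeline the vectors would come from the embedding model and the search from FAISS; only the ranking logic carries over from this sketch.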

The 'Qwen3-Reranker-0.6B' model then scores the retrieved chunks against the query and reorders them, and the top 3 documents are kept. A log file, 'rag_retrieval_log.txt', records which documents were retrieved, along with their similarity scores and their reranker scores.
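
The retrieve, rerank, and log flow described above can be sketched as follows. This is an illustration under stated assumptions, not the article's script: `rerank_and_log` and the log line layout are hypothetical, and `overlap_score` is a toy stand-in for the relevance score that Qwen3-Reranker-0.6B would produce. Only the file name `rag_retrieval_log.txt` and the top-3 cut-off come from the article.

```python
import os
import tempfile


def overlap_score(query, chunk):
    """Toy stand-in for a reranker: fraction of query words found in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def rerank_and_log(query, candidates, score_fn, top_n, log_path):
    """Rescore retrieved (chunk, similarity) pairs, log both scores, keep top_n."""
    scored = [(chunk, sim, score_fn(query, chunk)) for chunk, sim in candidates]
    scored.sort(key=lambda row: row[2], reverse=True)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"query: {query}\n")
        for rank, (chunk, sim, rerank) in enumerate(scored, start=1):
            log.write(f"{rank}\tsim={sim:.3f}\trerank={rerank:.3f}\t{chunk[:60]}\n")
    return [chunk for chunk, _, _ in scored[:top_n]]


# Candidates as they might come back from the vector store (scores are made up).
candidates = [
    ("Qwen3 supports many languages", 0.62),
    ("The reranker orders retrieved chunks", 0.58),
    ("FAISS stores dense vectors", 0.55),
    ("How the reranker scores chunks against a query", 0.51),
]
log_path = os.path.join(tempfile.mkdtemp(), "rag_retrieval_log.txt")
top_chunks = rerank_and_log(
    "how does the reranker score chunks", candidates, overlap_score, 3, log_path
)
print(top_chunks)
```

Note that the chunk with the lowest similarity score ends up first after reranking. That reordering is the point of the second stage: the retriever casts a wide net, and the reranker decides what actually reaches the generator.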

The final answer is generated by passing the top-3 documents to the instruct model. The blog content is chunked beforehand into pieces of size 800 with an overlap of 100, so that context carries over between consecutive chunks.
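
A sliding-window splitter along these lines would produce such chunks. This is a minimal sketch that assumes sizes are measured in characters (the article does not state the unit, and the script may rely on a library splitter instead); `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into windows of chunk_size that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [800, 800, 600]
```

Each window starts 700 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next. A sentence that straddles a boundary is therefore likely to appear whole in at least one chunk.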

Mounish V, a graduate of Vellore Institute of Technology, is the author of this article. Currently working as a Data Science Trainee, Mounish has a keen interest in Deep Learning and Generative AI. He has been instrumental in the development and implementation of the RAG system on the website.

The Qwen3 models support 119 languages and dialects, which makes them suitable for multilingual applications. This matters for retrieval in particular, since the Qwen3-Embedding-0.6B model converts text to dense vector representations regardless of the language.

The script uses the PyPDF2 library to load articles in PDF format, and defines a function that reads blog content from both .txt and .pdf files, so the RAG system can ingest either format.
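
A reader of this kind might look like the sketch below. The function name `read_blog` is hypothetical; `PdfReader`, `pages`, and `extract_text()` are part of the PyPDF2 API in version 2.0 and later. PyPDF2 is imported only inside the PDF branch so that plain-text files can be read without it installed.

```python
from pathlib import Path


def read_blog(path):
    """Return the text of a .txt or .pdf file; raise ValueError otherwise."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # third-party: pip install PyPDF2

        reader = PdfReader(str(path))
        # extract_text() can come back empty for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"Unsupported file type: {suffix or path.name}")
```

Raising on unknown extensions, rather than returning an empty string, keeps a stray file in the data folder from silently producing empty chunks in the vector store.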

In conclusion, the Qwen3 models cover every stage of this RAG system: embedding, reranking, and generation. Their open-source nature and support for many languages make them an attractive choice for applications in Deep Learning and Generative AI.