Gemma 3 - RAG - PDF

RAG with Gemma 3: Intelligent Document Analysis

This project combines Google's Gemma 3 model with Retrieval-Augmented Generation (RAG) to build an intelligent document analysis tool. It lets users load, process, and query PDF documents, generating precise, contextually relevant answers grounded in the extracted content.

Features

PDF Loading: Extracts text from PDF documents and prepares it for analysis (see the optional chunking sketch after this list).

Semantic Retrieval: Uses embeddings and FAISS to find relevant sections of the document.

Answer Generation: Utilizes the Gemma 3 model to generate contextually accurate responses.

Intelligent Querying: Enables users to ask complex questions and receive clear answers.
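For long PDFs, retrieval usually improves when each page is further split into smaller, overlapping chunks before indexing. Here is a minimal, optional sketch using LangChain's RecursiveCharacterTextSplitter; the chunk_size and chunk_overlap values are illustrative, not tuned:

# Optional: split pages into smaller chunks before building the index
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)
# If you use this, pass `chunks` instead of `documents` to FAISS.from_documents below.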

Technologies Used

Gemma 3: Google's language model for generating responses.

LangChain: Framework for integrating document processing pipelines.

FAISS: Vector database for efficient semantic searches.

PyPDF: Library for loading and processing PDFs.

Hugging Face Transformers: For embeddings and natural language processing.
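To run the example below, the stack above can be installed with pip, for example: `pip install langchain langchain-community pypdf faiss-cpu sentence-transformers transformers accelerate torch`. Versions are not pinned here; faiss-cpu is assumed (swap in faiss-gpu for a CUDA build), and accelerate is needed for device_map="auto".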

This project is a powerful example of how RAG and Gemma 3 can be combined to create a robust document analysis tool. Feel free to share your thoughts, suggestions, or questions below!

# Import the required libraries
# (in recent LangChain releases these classes live in langchain_community)
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
import torch

# Load the PDF (each page becomes one document)
loader = PyPDFLoader("path/to/your/file.pdf")
documents = loader.load()

# Create embeddings and build the vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vectorstore = FAISS.from_documents(documents, embeddings)

# Set up Gemma 3 as the LLM
model_name = 'gemma-3-4b-it'  # Choose the desired Gemma 3 variant
model_id = f"google/{model_name}"

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16,
).eval()

processor = AutoProcessor.from_pretrained(model_id)

def get_model_response(prompt: str, model, processor):
    # Prepare the messages for the model.
    messages = [
        {
            "role": "system",
            "content": [{"type": "text", "text": "Você é um assistente útil. Responda apenas com a resposta à pergunta feita e evite usar texto adicional em sua resposta como 'aqui está a resposta.'."}]
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt}
            ]
        }
    ]

    # Tokenize inputs and prepare for the model.
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt"
    ).to(model.device, dtype=torch.bfloat16)

    input_len = inputs["input_ids"].shape[-1]

    # Generate response from the model.
    with torch.inference_mode():
        generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
        generation = generation[0][input_len:]

    # Decode the response.
    response = processor.decode(generation, skip_special_tokens=True)
    return response

# Set up the query chain
def qa_chain(query):
    # Retrieve relevant documents
    docs = vectorstore.similarity_search(query, k=3)
    context = " ".join([doc.page_content for doc in docs])

    # Prepare the prompt with context
    prompt = f"Com base no seguinte contexto: {context}\n\nResponda: {query}"

    # Get response from Gemma-3
    response = get_model_response(prompt, model, processor)
    return response

# Run a query
query = "What is the main topic of the document?"
response = qa_chain(query)
print(response)
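As a possible extension, the FAISS index can be persisted so the PDF does not need to be re-embedded on every run. A minimal sketch, assuming an illustrative local directory name "faiss_index":

# Optional: save the FAISS index to disk and reload it later
# ("faiss_index" is an illustrative directory name)
vectorstore.save_local("faiss_index")
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)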

Much appreciated! I really want to use PDF documents like this in my own project. Is it possible to fine-tune Gemma 3 in this RAG setup?


Official Gemma 3 support hasn't shipped in a stable release of Hugging Face's Transformers yet, so it's a little difficult to use right now, but if you wait a little while you should be able to fine-tune it easily.