DocVQA test on Qwen2.5-VL-3B LoRA not working

I want to test my PEFT LoRA adapter on DocVQA (Qwen2.5-VL-3B), but I am unable to do so. Here is the code to reproduce the error.

from peft import get_peft_model, PeftModel, LoraConfig, TaskType
from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "Qwen/Qwen2.5-VL-3B-Instruct"

min_pixels = 224 * 28 * 28
max_pixels = 224 * 28 * 28
processor = AutoProcessor.from_pretrained(model_id, min_pixels=min_pixels, max_pixels=max_pixels)
processor.tokenizer.padding_side = "right"

base_model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # load in 8-bit precision
    device_map="auto",
)

base_model.enable_input_require_grads()

model = PeftModel.from_pretrained(base_model, "./docmat/iter-6", is_trainable=False)
model.print_trainable_parameters()

from datasets import load_dataset
from tqdm import tqdm
import torch
from collections import defaultdict

def evaluate_docvqa(model, processor, split="test", num_samples=None):
    dataset = load_dataset("lmms-lab/DocVQA", "DocVQA", split=f"{split}[:1]")
    if num_samples:
        dataset = dataset.select(range(min(num_samples, len(dataset))))

    correct = 0
    results = defaultdict(list)

    model.eval()
    with torch.no_grad():
        for sample in tqdm(dataset):
            # Add the <image> token to the question text
            text_with_image = f"<image>{sample['question']}"

            # Process the image and text
            inputs = processor(
                images=sample['image'],
                text=text_with_image,  # Include <image> token
                return_tensors="pt",
                padding=True,
                truncation=True
            ).to(device)

            # Generate the output (cap new tokens rather than total length,
            # since the prompt already uses up part of the length budget)
            outputs = model.generate(**inputs, max_new_tokens=50, num_beams=5, early_stopping=True)
            # Decode only the newly generated tokens, not the prompt
            generated = outputs[:, inputs["input_ids"].shape[1]:]
            predicted_answer = processor.batch_decode(generated, skip_special_tokens=True)[0]

            # Check if the predicted answer matches any ground truth answer
            is_correct = any(predicted_answer.lower() == ans.lower() for ans in sample['answers'])
            correct += int(is_correct)

            # Store results
            results['questions'].append(sample['question'])
            results['predictions'].append(predicted_answer)
            results['ground_truth'].append(sample['answers'])
            results['correct'].append(is_correct)

    accuracy = correct / len(dataset)
    return accuracy, results

accuracy, results = evaluate_docvqa(model, processor, num_samples=100)
print(f"Test Accuracy: {accuracy:.4f}")

There are various problems, but the biggest one is that the answers are always None, which makes evaluation impossible.

for sample in tqdm(dataset):
    print(sample)

# output for the test split includes: 'answers': None,
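
If you load the validation split instead, the answers field is populated, so local evaluation is possible. A quick check (assuming the same lmms-lab/DocVQA config as in your code):

from datasets import load_dataset

# Compare one sample from each split to see which one carries ground-truth answers
for split in ("validation", "test"):
    sample = load_dataset("lmms-lab/DocVQA", "DocVQA", split=f"{split}[:1]")[0]
    print(split, "->", sample["answers"])  # validation: list of answers, test: None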

Also, I went and checked a few places and saw that for the DocVQA test dataset you have to send your predictions to the official evaluation server, and they compute the scores for you.
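
If I understand it right, the submission is basically a JSON file pairing each question ID with the predicted answer. A rough sketch of how I would build it, assuming the evaluation loop also records each sample's questionId into results['question_ids'] and that the server expects "questionId"/"answer" fields (please check the server's instructions for the exact format):

import json

# Package predictions for the official DocVQA evaluation server.
# Field names are assumed here; verify them against the server's documentation.
submission = [
    {"questionId": qid, "answer": answer}
    for qid, answer in zip(results["question_ids"], results["predictions"])
]
with open("docvqa_test_predictions.json", "w") as f:
    json.dump(submission, f)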

But anyway, thanks for your input. Could you please tell me: how can I handle data for training / evaluation?
Is there some standard way, or are there any resources I can look at? It would be really helpful.

how can I handle data for training / evaluation?

In this case the dataset itself ships with both a test and a validation split, so it's a bit special, but I think there are many cases like the following. Also, your Trainer usage is not fundamentally wrong; it's just that doing proper evaluation during VLM or LLM training is quite difficult, because it's hard to score the outputs objectively with a single number.
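
For DocVQA in particular, exact string match is very strict; the benchmark's usual metric is ANLS (Average Normalized Levenshtein Similarity), which gives partial credit for near-matches. Here is a minimal local approximation (my own sketch, not the official scorer):

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def anls(prediction: str, answers, threshold: float = 0.5) -> float:
    # Score the prediction against each ground-truth answer and keep the best score.
    best = 0.0
    for ans in answers:
        p, a = prediction.strip().lower(), ans.strip().lower()
        nl = levenshtein(p, a) / max(len(p), len(a), 1)
        best = max(best, 1.0 - nl if nl < threshold else 0.0)
    return best

# Average ANLS over all predictions instead of exact-match accuracy:
# scores = [anls(p, gts) for p, gts in zip(results["predictions"], results["ground_truth"])]
# print(sum(scores) / len(scores))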

Personally, I recommend practicing with the Smol Course first. It walks through the whole workflow and will help you internalize it.

Thanks, I'll go through them. As I said, I just had to switch my dataset to the validation split and make slight modifications to my code, and it worked.
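
For anyone who finds this later, the main change was simply loading the split that actually carries answers:

# The validation split includes ground-truth answers; the test split does not
dataset = load_dataset("lmms-lab/DocVQA", "DocVQA", split="validation")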

Congratulations! I also found this. I think you can evaluate a VLM's performance after training by looking at resources like this, or at leaderboards.
