Hi, I'm trying to follow the fine-tuning tutorial for question answering here. So far, all I've managed to do is load the model in 4-bit using a BitsAndBytesConfig. I'm trying it out with Flan-T5:
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer
import torch
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype='bfloat16',
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True
)
training_args = TrainingArguments(
    output_dir="my_awesome_qa_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    logging_steps=50,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    max_steps=1,
    weight_decay=0.01,
)
model = AutoModelForQuestionAnswering.from_pretrained(
    "google/flan-t5-base",
    quantization_config=bnb_config,
    device_map={"": 0},
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
    return_dict=True,
    use_cache=True
)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
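As a side note, the target_modules names I use in the LoRA config further down ('q', 'k', 'v', 'o', 'wi_0', 'wi_1') are meant to be the attention and gated-FFN projections in T5; if I'm not mistaken, the candidate layer names can be listed from the loaded model like this:

# List the linear layers (including the ones swapped to 4-bit) to see which
# module-name suffixes are available as LoRA targets
# ('q', 'k', 'v', 'o', 'wi_0', 'wi_1', 'wo', ...).
import bitsandbytes as bnb
for name, module in model.named_modules():
    if isinstance(module, (torch.nn.Linear, bnb.nn.Linear4bit)):
        print(name)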
I tried to run SFTTrainer:
dataset = load_dataset('json', data_files=DATA_PATH, split='train')
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=['q', 'k', 'v', 'o', 'wi_0', 'wi_1'],
    bias="none",
    task_type="QUESTION_ANS",
)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    eval_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field='prompt',
    max_seq_length=128,
    tokenizer=tokenizer,
    args=training_args,
    packing=False,
)
trainer.train()
and it led to this error:
TypeError: T5ForQuestionAnswering.forward() got an unexpected keyword argument 'labels'
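For what it's worth, inspecting the forward signature of T5ForQuestionAnswering confirms there is no labels argument; like other *ForQuestionAnswering heads it takes start_positions and end_positions instead, which I assume is why the labels column that SFTTrainer builds gets rejected:

import inspect
from transformers import T5ForQuestionAnswering

# The extractive QA head expects answer span positions
# (start_positions / end_positions) rather than a `labels` tensor.
print(inspect.signature(T5ForQuestionAnswering.forward))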
My question is: can *ForQuestionAnswering models be fine-tuned with PEFT/QLoRA using SFTTrainer? I've read in a GitHub issue that SFTTrainer is only meant for language modeling, but my thinking is that question answering can be framed as a language-modeling task, so it should be possible. Is that reasoning correct? If so, how do I fine-tune *ForQuestionAnswering models with SFTTrainer?
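In case it helps frame the question: if SFTTrainer really does require a language-modeling head, my fallback idea is to drop the QA head entirely and treat QA generatively by reloading the same checkpoint with AutoModelForSeq2SeqLM, roughly like this (untested sketch, reusing the same bnb_config as above):

from transformers import AutoModelForSeq2SeqLM

# Untested idea: use the LM-headed variant of the same checkpoint so that
# the `labels` tensor SFTTrainer builds actually has somewhere to go.
lm_model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-base",
    quantization_config=bnb_config,
    device_map={"": 0},
    torch_dtype=torch.bfloat16,
)
# ...and then pass lm_model (plus the same peft_config) to SFTTrainer?

But I'm not sure whether SFTTrainer supports encoder-decoder models like T5 at all, or whether I should be using a different trainer (e.g. Seq2SeqTrainer) for this. Any pointers would be appreciated.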