Hi everyone,
I'm currently fine-tuning the sharded Falcon-7b model (vilsonrodrigues/falcon-7b-sharded) on the PubMedQA dataset from Hugging Face, using prefix tuning. However, at the last step, when I call trainer.train(), it raises the following error:
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-17-3435b262f1ae> in <cell line: 1>()
----> 1 trainer.train()
16 frames
~/.cache/huggingface/modules/transformers_modules/vilsonrodrigues/falcon-7b-sharded/5206b4cb8d6be73aa3d0d52360009437d196f28f/modeling_falcon.py in <genexpr>(.0)
625 return tuple(
626 (
--> 627 layer_past[0].view(batch_size_times_num_heads, kv_length, head_dim),
628 layer_past[1].view(batch_size_times_num_heads, kv_length, head_dim),
629 )
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
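To illustrate what the error message seems to describe, here is a minimal pure-Python sketch of the stride condition that the PyTorch documentation gives for `view`: two adjacent dimensions can be merged without copying only if `stride[i] == size[i+1] * stride[i+1]`. This is not the Falcon or PyTorch code. The helper names `contiguous_strides` and `can_merge_dims` and the example shapes are made up for illustration, and the sketch ignores edge cases such as size-1 dimensions.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for a freshly allocated tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def can_merge_dims(shape, strides, i):
    """True if dims i and i+1 can be merged into one dim without a copy."""
    return strides[i] == shape[i + 1] * strides[i + 1]


# Hypothetical (batch, heads, kv_length, head_dim) tensor, laid out contiguously.
shape = (2, 4, 3, 8)
strides = contiguous_strides(shape)
mergeable = can_merge_dims(shape, strides, 0)  # merge batch and heads

# Same memory after swapping the heads and kv_length dims (a transpose):
# sizes and strides are swapped, the data itself is not moved.
t_shape = (2, 3, 4, 8)
t_strides = (96, 8, 24, 1)
t_mergeable = can_merge_dims(t_shape, t_strides, 0)

print(strides, mergeable, t_mergeable)
```

In the second case the first two dimensions cannot be flattened into one without copying, which is the situation where `view` fails and `.reshape(...)` or `.contiguous().view(...)` would be needed instead.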
This is my trainer object:
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    args=training_args,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",
    train_dataset=dataset_train,
    eval_dataset=dataset_val,
)
These are my training arguments:
from transformers import TrainingArguments

# Arguments needed for the training process
output_dir = "falcon-7b-sharded"
per_device_train_batch_size = 5
gradient_accumulation_steps = 4
device = "cuda"
num_epochs = 10
# Torch adamw optimization algorithm is used in QLoRA
optim = "adamw_torch"
save_steps = 10
logging_steps = 10
learning_rate = 5e-5
max_grad_norm = 0.3
max_steps = 200
warmup_ratio = 0
lr_scheduler_type = "linear"

training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    num_train_epochs=num_epochs,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    report_to=None,
    lr_scheduler_type=lr_scheduler_type,
)

model = model.to(device)
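For context, here is a small sketch of what these settings imply, assuming a single GPU (`num_devices = 1` is my assumption). According to the TrainingArguments documentation, a positive max_steps overrides num_train_epochs, so the run should stop after 200 optimizer steps rather than 10 epochs. The names `effective_batch_size` and `samples_seen` are only for illustration.

```python
per_device_train_batch_size = 5
gradient_accumulation_steps = 4
max_steps = 200
num_devices = 1  # assumption: training on a single GPU

# Samples that contribute to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Total samples processed before max_steps stops the run.
samples_seen = effective_batch_size * max_steps

print(effective_batch_size, samples_seen)
```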
As far as I can tell, the error originates in the model's own code (modeling_falcon.py, downloaded into the Hugging Face cache), which I can't modify directly. Does anyone have a solution for this?
Thank you