Falcon-7b sharded model - RuntimeError: view size is not compatible with input tensor's size and stride

Hi everyone,

I’ve recently been fine-tuning the Falcon-7b sharded model on the PubMedQA dataset provided by Hugging Face, using prefix tuning. However, at the last step, when I call the trainer, it raises the following error:

You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-17-3435b262f1ae> in <cell line: 1>()
----> 1 trainer.train()

16 frames
~/.cache/huggingface/modules/transformers_modules/vilsonrodrigues/falcon-7b-sharded/5206b4cb8d6be73aa3d0d52360009437d196f28f/modeling_falcon.py in <genexpr>(.0)
    625         return tuple(
    626             (
--> 627                 layer_past[0].view(batch_size_times_num_heads, kv_length, head_dim),
    628                 layer_past[1].view(batch_size_times_num_heads, kv_length, head_dim),
    629             )

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
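If I understand the message correctly, `.view()` requires a contiguous memory layout, while `.reshape()` copies the data when needed. Here is a minimal standalone snippet, not the actual Falcon code, just my attempt to reproduce the failure mode with made-up shapes:

import torch

# Illustration only: a permuted (non-contiguous) tensor rejects .view()
# but accepts .reshape(), which copies when necessary.
batch, heads, kv_len, head_dim = 2, 8, 16, 64
t = torch.randn(batch, kv_len, heads, head_dim).permute(0, 2, 1, 3)

try:
    t.view(batch * heads, kv_len, head_dim)      # same call pattern as the failing line
except RuntimeError as e:
    print("view failed:", e)

ok = t.reshape(batch * heads, kv_len, head_dim)  # succeeds by copying
print(ok.shape)  # torch.Size([16, 16, 64])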

This is my trainer object:

from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    args=training_args,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",
    train_dataset=dataset_train,
    eval_dataset=dataset_val,
)
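
For completeness, peft_config is a standard PEFT PrefixTuningConfig; below is a minimal sketch of its shape (the num_virtual_tokens value is just a placeholder, not necessarily what I used):

from peft import PrefixTuningConfig, TaskType

# Illustrative prefix-tuning config; num_virtual_tokens is a placeholder value.
peft_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=16,
)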

These are my training arguments:

from transformers import TrainingArguments

# Arguments needed for the training process
output_dir = "falcon-7b-sharded"
per_device_train_batch_size = 5
gradient_accumulation_steps = 4
device = "cuda"
num_epochs = 10
# The torch AdamW optimizer is used in QLoRA
optim = "adamw_torch"
save_steps = 10
logging_steps = 10
learning_rate = 5e-5
max_grad_norm = 0.3
max_steps = 200
warmup_ratio = 0
lr_scheduler_type = "linear"


training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    num_train_epochs=num_epochs,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    report_to=None,
    lr_scheduler_type=lr_scheduler_type,
)

model = model.to(device)

As far as I can tell, this error lies in the model code downloaded with the checkpoint (modeling_falcon.py), which I can’t easily modify. Does anyone have a solution for this?
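
One workaround I’m considering, though I’m not sure it addresses the root cause, is to disable the key/value cache during training so that the cache-reformatting code above isn’t exercised:

# Untested idea: turn off the KV cache for training; the failing code appears
# to be reshaping cached key/value tensors, so this might sidestep it.
model.config.use_cache = False

If anyone knows whether this is safe, or whether the proper fix is patching the model code to use .reshape(...) as the error message suggests, I’d appreciate pointers.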

Thank you