Dimension Error After Prompt-tuning the Gemma2 model

AliHamzeh · January 23, 2025, 3:45pm

I’m trying to prompt-tune gemma2-it using this code:

from peft import  get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit

tuning_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,  
    num_virtual_tokens=2, 
    tokenizer_name_or_path=TOKENIZER_ID
)

peft_model = get_peft_model(model, tuning_config)

training_args = SFTConfig(
        output_dir="./gemma_nn_1b_freeze",
        num_train_epochs=1,                     
        per_device_train_batch_size=1,          
        gradient_accumulation_steps=8,          
        max_seq_length=2503,
        optim="adamw_torch_fused",              
        learning_rate=2e-4,                     
        lr_scheduler_type="constant",   
        warmup_ratio=0.03,                  
        logging_steps=1, 
        save_steps=200,                      
        save_strategy="steps",                  
        bf16=True,                              
        fp16=False,                             
        max_grad_norm=0.3,                      
        gradient_checkpointing=True,            
        packing=True,        
        report_to="none",
        disable_tqdm=False, 
        dataset_kwargs={
        "add_special_tokens": False, # We template with special tokens
        "append_concat_token": False, # No need to add additional separator token
    },
    )

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=train_sample,    
    args=training_args,  
    
    peft_config=tuning_config
)

trainer.train()

but after training finished when I want to use the model I get this error:

def get_outputs(model, inputs, max_new_tokens=256):
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        repetition_penalty=1.5,  
        early_stopping=True,  
        eos_token_id=tokenizer.eos_token_id,
    )
    return outputs

trained_model = trainer.model

input_prompt = tokenizer("I want you to act as a motivational coach. ", return_tensors="pt")

loaded_model_sentences_outputs = get_outputs(trained_model, input_prompt)
print(tokenizer.batch_decode(loaded_model_sentences_outputs, skip_special_tokens=True))

Error:

      1 def get_outputs(model, inputs, max_new_tokens=256):
----> 2     outputs = model.generate(
      3         input_ids=inputs["input_ids"],
      4         attention_mask=inputs["attention_mask"],
      5         max_new_tokens=max_new_tokens,
      6         repetition_penalty=1.5,  
      7         early_stopping=True,  
      8         eos_token_id=tokenizer.eos_token_id,
      9     )
     10     return outputs

File c:\Users\ALI\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\peft_model.py:1640, in PeftModelForCausalLM.generate(self, *args, **kwargs)
   1638             outputs = self.base_model.generate(*args, **kwargs)
   1639     else:
-> 1640         outputs = self.base_model.generate(**kwargs)
   1641 except:
...
   1682     )
   1684 if model_kwargs.get("position_ids", None) is not None:
   1685     warnings.warn("Position ids are not supported for parameter efficient tuning. Ignoring position ids.")

RuntimeError: Tensors must have same number of dimensions: got 2 and 4

my tokenizer is philschmid/gemma-tokenizer-chatml but I also tried the default tokenizer of gemma2. I think the problem comes from the extra tokens that the prompt-tuned model adds but I have no idea how it’s done and how I can solve the error. Help is appreciated.

Alanturner2 · January 23, 2025, 3:55pm

Hi there!
The error you’re encountering (RuntimeError: Tensors must have same number of dimensions: got 2 and 4 ) is likely due to a mismatch in the input tensor dimensions when using the prompt-tuned model for generation. This issue often arises because the prompt-tuning process adds virtual tokens to the input, and the model expects these tokens to be handled correctly during inference.

from peft import PeftModel, PeftConfig

def get_outputs(model, inputs, max_new_tokens=256):
    # Ensure the model is in evaluation mode
    model.eval()
    
    # Get the prompt with virtual tokens
    inputs = model.prepare_inputs_for_generation(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        repetition_penalty=1.5,
        early_stopping=True,
        eos_token_id=tokenizer.eos_token_id,
    )
    
    # Generate outputs
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        repetition_penalty=1.5,
        early_stopping=True,
        eos_token_id=tokenizer.eos_token_id,
    )
    return outputs

# Load the trained model
trained_model = trainer.model

# Prepare the input prompt
input_prompt = tokenizer("I want you to act as a motivational coach. ", return_tensors="pt")

# Get the outputs
loaded_model_sentences_outputs = get_outputs(trained_model, input_prompt)
print(tokenizer.batch_decode(loaded_model_sentences_outputs, skip_special_tokens=True))

AliHamzeh · January 23, 2025, 4:01pm

Thanks but i’m still getting the same error.

it seems that this part is the problem:

File c:\Users\ALI\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\peft_model.py:1680, in PeftModelForCausalLM.prepare_inputs_for_generation(self, task_ids, *args, **kwargs)
   1678     size = model_kwargs["input_ids"].shape[0], peft_config.num_virtual_tokens
   1679     prefix_attention_mask = torch.ones(size).to(model_kwargs["input_ids"].device)
-> 1680     model_kwargs["attention_mask"] = torch.cat(
   1681         (prefix_attention_mask, model_kwargs["attention_mask"]), dim=1
   1682     )

rbelanec · June 26, 2025, 9:14am

Hi, I think it has something to do with the caching of generated results. This problem only arises with Gemma models, from what I experienced.

I had a similar problem a month ago. I was able to fix it by deploying the fix by @BenjaminB

github.com/huggingface/peft

FIX Prompt learning issue with 4d attention mask

main ← BenjaminBossan:fix-prompt-tuning-4d-attention-mask

opened 11:30AM - 27 Mar 25 UTC

BenjaminBossan

+100 -12

Resolves #2452 Some causal language models in transformers have 4d attention …masks at the input preparation stage. So far, we have assumed 2d attention masks, which results in an error in that case. This PR fixes the situation. My first attempt was to transform the 2d prefix attention mask (from the virtual tokens) into a 4d attention mask before concatenating them. However, this was error prone and I was unsure if my approach would generalize to other model architectures than the one tested (gemma), as it involved using private transformers methods (`model._prepare_4d_causal_attention_mask_with_cache_position`). The simpler approach was thus to just create a 2d attention mask and let the model handle it. The test suite has been extended to include a tiny gemma model. To prevent the test suite from ballooning, I removed another model. Specifically, this was GPT neox, which from HF download stats seems to be one of the least popular architectures from our test suite. I also extended the default parameters in `constants.py` for the different PEFT methods to support gemma. Unfortunately, some tests are failing with gemma. When they were unrelated to changes in this PR, I chose to just skip those tests, as I consider them out of scope for this PR.

It seems that it will be fixed in release 0.15.3. From now on, you can install the package from the main branch.

python -m pip install -U git+https://github.com/huggingface/peft.git@main

Topic		Replies	Views
Peft Prompt Tuning - ValueError: `create_and_replace` does not support prompt learning and adaption prompt yet Models	0	500	January 4, 2024
Error with get_peft_model() and PromptTuningConfig 🤗Transformers	1	1547	November 6, 2023
Trouble running SFT with PEFT model Beginners	2	1015	March 19, 2024
Regenerate Prompt tuning result with appended prompt on base model Intermediate	0	881	August 6, 2023
Dimension error when trying to use Neuron compiled HF model on inferentia Amazon SageMaker	4	1235	May 20, 2022

Dimension Error After Prompt-tuning the Gemma2 model

Related topics