ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length (MedAlpaca & LoRA)


I'm trying to use PEFT LoRA on MedAlpaca, but I get this error. I'm stuck and have found no solution.


Here's my data:

I've enabled padding=True and truncation=True in the tokenizer, but no result.
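For context, the usual pattern looks something like the sketch below. The medalpaca/medalpaca-7b checkpoint, the "text" column, and the data file are assumptions; note that LLaMA-family tokenizers ship without a pad token, which is a common cause of this exact error even when padding=True is set:

from transformers import AutoTokenizer
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("medalpaca/medalpaca-7b")  # assumed checkpoint
# LLaMA-family tokenizers have no pad token by default; without one,
# padding=True cannot actually pad the batch to a common length.
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    # truncate and pad every example so the collator can stack them into one tensor
    return tokenizer(
        batch["text"],  # assumed column name
        padding=True,
        truncation=True,
        max_length=512,
    )

dataset = load_dataset("json", data_files="train.json")["train"]  # assumed data file
tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)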

I am having a similar issue training Mistral 7B Instruct.
Everything was working yesterday, so I suspect an update to one of the packages used could be the cause.

I created a Google Colab with the exact example and the visible error.

Same problem here: yesterday it was OK, today I get this error.

I tried with these package versions, and it works:

!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7
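If you pin versions in Colab, restart the runtime afterwards and confirm the pins actually took effect, e.g. with a quick check like this (a sketch):

from importlib.metadata import version

# print the installed version of each pinned package
for pkg in ("accelerate", "peft", "bitsandbytes", "transformers", "trl"):
    print(pkg, version(pkg))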

UPDATE: try using the tokenize function in the notebook;
it worked for me.
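The notebook itself isn't reproduced here, but the tokenize function in question is typically along these lines. This is a sketch, not the notebook's exact code; the checkpoint and field names are assumptions:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")  # assumed checkpoint
tokenizer.pad_token = tokenizer.eos_token  # Mistral's tokenizer also lacks a pad token

def tokenize(example):
    # build the full training string, then tokenize to a fixed length
    text = example["prompt"] + example["response"]  # assumed field names
    result = tokenizer(text, truncation=True, max_length=512, padding="max_length")
    # for causal-LM fine-tuning, the labels are the input ids themselves
    result["labels"] = result["input_ids"].copy()
    return result

# dataset is assumed to be a datasets.Dataset loaded earlier:
# tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)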

Does not work for Mistral:


KeyError                                  Traceback (most recent call last)
Cell In[15], line 13
      4 compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
      6 bnb_config = BitsAndBytesConfig(
      7     load_in_4bit=use_4bit,
      8     bnb_4bit_quant_type=bnb_4bit_quant_type,
      9     bnb_4bit_compute_dtype=compute_dtype,
     10     bnb_4bit_use_double_quant=use_nested_quant,
     11 )
---> 13 base_model = AutoModelForCausalLM.from_pretrained(
     14     model_name,
     15     quantization_config=bnb_config,
     16     device_map={"": 0}
     17 )
     19 base_model.config.use_cache = False
     20 base_model.config.pretraining_tp = 1

File /opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:461, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    458 if kwargs.get("torch_dtype", None) == "auto":
    459     _ = kwargs.pop("torch_dtype")
--> 461 config, kwargs = AutoConfig.from_pretrained(
    462     pretrained_model_name_or_path,
    463     return_unused_kwargs=True,
    464     trust_remote_code=trust_remote_code,
    465     **hub_kwargs,
    466     **kwargs,
    467 )
    469 # if torch_dtype=auto was passed here, ensure to pass it on
    470 if kwargs_orig.get("torch_dtype", None) == "auto":

File /opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:998, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    996     return config_class.from_pretrained(pretrained_model_name_or_path, **kwargs)
    997 elif "model_type" in config_dict:
--> 998     config_class = CONFIG_MAPPING[config_dict["model_type"]]
    999     return config_class.from_dict(config_dict, **unused_kwargs)
   1000 else:
   1001     # Fallback: use pattern matching on the string.
   1002     # We go from longer names to shorter names to catch roberta before bert (for instance)

File /opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:710, in _LazyConfigMapping.__getitem__(self, key)
    708     return self._extra_content[key]
    709 if key not in self._mapping:
--> 710     raise KeyError(key)
    711 value = self._mapping[key]
    712 module_name = model_type_to_module_name(key)

KeyError: 'mistral'
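This KeyError means the installed transformers release does not know the "mistral" model type: Mistral support was only added in transformers 4.34.0, so the 4.31.0 pin above is too old for it. A quick way to check (a sketch):

import transformers
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

print(transformers.__version__)
# "mistral" is registered from 4.34.0 onwards; on 4.31.0 this prints False
print("mistral" in CONFIG_MAPPING)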

Here is what works for me; I suspect that trl might be the problematic one:

!pip install -q torch
!pip install -q transformers==4.36.0 # Hugging Face Transformers, for downloading model weights
!pip install -q datasets # Hugging Face Datasets, to download and manipulate datasets
!pip install -q peft==0.4.0 # parameter-efficient fine-tuning, for QLoRA
!pip install -q bitsandbytes==0.41.1 # for model weight quantization
!pip install -q trl==0.4.7 # Transformer Reinforcement Learning, for supervised fine-tuning (SFT)
!pip install -q accelerate==0.21.0
!pip install -q wandb -U # used to monitor model metrics during training
!pip install -q scipy
!pip install -q tensorboard
!pip install -q matplotlib
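With those pins, the loading code from the traceback above works again. A self-contained sketch; the checkpoint name is an assumption:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed checkpoint

# 4-bit NF4 quantization, mirroring the config in the traceback above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={"": 0},
)
base_model.config.use_cache = False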