Tensor types mismatch when trying to enable GPU

Hello,

I need help figuring out what I’m missing from this tutorial about Whisper fine-tuning.

I’m working in a containerized Docker environment with the following specifications:

  • Base image: ubuntu:22.04
  • Python 3.10
  • Pipenv to manage dependencies and Python environment

I’m using the Italian subset of the Common Voice 11.0 dataset (mozilla-foundation/common_voice_11_0) and loading just 1% of each split. Here’s the code snippet:

from datasets import DatasetDict, load_dataset
common_voice = DatasetDict()  # container for the train and test splits
common_voice["train"] = load_dataset("mozilla-foundation/common_voice_11_0", "it", split="train[:1%]+validation[:1%]", use_auth_token=True)
common_voice["test"] = load_dataset("mozilla-foundation/common_voice_11_0", "it", split="test[:1%]", use_auth_token=True)

For now, my goal is to get the whole tutorial running end to end; I care more about a working pipeline than about the quality of the final model. Dataset loading and data preparation all work. However, when I call trainer.train(), training is extremely slow: the progress bar stays at 0% for hours, even though the dataset is relatively small. Here is my trainer configuration:

# -------- #
# training #
# -------- #

print("Configuring training parameters...")
training_args = Seq2SeqTrainingArguments(
    optim="adamw_torch",
    output_dir="./whisper-small-it",  # Change to a repo name of your choice
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,  # Increase by 2x for every 2x decrease in batch size
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    gradient_checkpointing=True,
    fp16=False,  # FP16 half precision evaluation (`--fp16_full_eval`) can only be used on CUDA devices
    evaluation_strategy="steps",
    per_device_eval_batch_size=8,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    push_to_hub=True,
)

trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=common_voice["train"],
    eval_dataset=common_voice["test"],
    data_collator=data_collator.DataCollatorSpeechSeq2SeqWithPadding(processor=processor),
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)

trainer.train()
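
For context on the step count: since max_steps=4000 is fixed, the total number of optimization steps does not shrink with the dataset size. Below is a rough sketch of the arithmetic. The dataset size is a placeholder I made up for illustration, and training_budget is just a helper name I chose; neither comes from the tutorial.

```python
import math


def training_budget(num_examples, per_device_batch_size, grad_accum_steps,
                    max_steps, num_devices=1):
    """Return (effective_batch, steps_per_epoch, approx_epochs, steps_per_percent)."""
    # Examples consumed per optimizer step.
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    # Approximate optimizer steps needed for one pass over the data.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    # How many passes over the data max_steps amounts to.
    approx_epochs = max_steps / steps_per_epoch
    # Steps that correspond to one percent of the progress bar.
    steps_per_percent = max_steps / 100
    return effective_batch, steps_per_epoch, approx_epochs, steps_per_percent


# NOTE: 1_600 is a made-up placeholder; replace it with len(common_voice["train"]).
print(training_budget(num_examples=1_600, per_device_batch_size=16,
                      grad_accum_steps=1, max_steps=4000))
```

With these settings, one percent of the bar corresponds to roughly 40 optimizer steps of 16 examples each, so a bar that sits at 0% does not necessarily mean that nothing is running.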

Because training was so slow, I moved out of the container and ran the application directly on my MacBook Pro (Apple M2 Max) to make use of its GPU. I configured Accelerate with the following command:

python -c "from accelerate.utils import write_basic_config; write_basic_config(mixed_precision='fp16')"

I also set fp16=True in my training arguments. With this setup, trainer.train() fails on my Mac with the following error:

Traceback (most recent call last):
  File "/macos-local/main.py", line 126, in <module>
    trainer.train()
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/trainer.py", line 1938, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/trainer.py", line 2759, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/trainer.py", line 2784, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py", line 1419, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py", line 1268, in forward
    encoder_outputs = self.encoder(
                      ^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py", line 822, in forward
    inputs_embeds = nn.functional.gelu(self.conv1(input_features))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Mismatched Tensor types in NNPack convolutionOutput
  0%|    
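
The error is raised in NNPack, which as far as I understand is a CPU convolution backend, and it complains about mismatched tensor types. So one thing I can check is which device and dtype the model weights and the input batch actually have. Below is a minimal sketch of such a check. summarize_tensors is a helper name I made up, and the stand-in objects only exist so the snippet runs without torch; their values are invented for illustration, not measured.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_tensors(named_tensors):
    """Count (device, dtype) combinations over (name, tensor) pairs.

    Only `.device` and `.dtype` are read, so this works for torch
    parameters and tensors as well as the stand-ins used below.
    """
    counts = Counter()
    for _name, tensor in named_tensors:
        counts[(str(tensor.device), str(tensor.dtype))] += 1
    return dict(counts)


# Stand-in objects so the sketch runs without torch; they are NOT real tensors
# and the device/dtype values are made up for illustration.
fake_weights = [
    ("conv1.weight", SimpleNamespace(device="cpu", dtype="torch.float16")),
    ("conv1.bias", SimpleNamespace(device="cpu", dtype="torch.float16")),
]
fake_batch = [("input_features", SimpleNamespace(device="cpu", dtype="torch.float32"))]

print(summarize_tensors(fake_weights))
print(summarize_tensors(fake_batch))
```

In my script I would call it as summarize_tensors(model.named_parameters()) and summarize_tensors(batch.items()), assuming batch is one collated batch produced by the data collator. If the two summaries disagree on device or dtype, that would at least narrow down where the mismatch comes from.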

I believe I’m missing something in the Accelerate configuration. If anyone can help me, I would really appreciate it!

Thank you!