Hello,
I need help figuring out what I’m missing from this tutorial about Whisper fine-tuning.
I’m working in a containerized Docker environment with the following specifications:
- Base image: ubuntu:22.04
- Python 3.10
- Pipenv to manage dependencies and Python environment
I’m using the Italian subset of the Common Voice dataset (mozilla-foundation/common_voice_11_0) and working with just 1% of that subset’s data. Here’s the code snippet:
common_voice["train"] = load_dataset("mozilla-foundation/common_voice_11_0", "it", split="train[:1%]+validation[:1%]", use_auth_token=True)
common_voice["test"] = load_dataset("mozilla-foundation/common_voice_11_0", "it", split="test[:1%]", use_auth_token=True)
For now, my objective is to get the entire tutorial running end to end; I care more about a working pipeline than about the quality of the final model. I have successfully implemented all the dataset loading and data preparation steps. However, when I call trainer.train(), the process takes too long: the progress bar stays at 0% for hours, even though the dataset is relatively small. Here is my trainer configuration:
# -------- #
# training #
# -------- #
print("Configuring training parameters...")
training_args = Seq2SeqTrainingArguments(
    optim="adamw_torch",
    output_dir="./whisper-small-it",  # Change to a repo name of your choice
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,  # Increase by 2x for every 2x decrease in batch size
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    gradient_checkpointing=True,
    fp16=False,  # FP16 half precision evaluation (`--fp16_full_eval`) can only be used on CUDA devices
    evaluation_strategy="steps",
    per_device_eval_batch_size=8,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    push_to_hub=True,
)
trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=common_voice["train"],
    eval_dataset=common_voice["test"],
    data_collator=data_collator.DataCollatorSpeechSeq2SeqWithPadding(processor=processor),
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)
trainer.train()
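For context, here is a rough sketch of how max_steps relates to the dataset size with the settings above. The helper names are my own, the example count is a placeholder rather than a number measured from my run, and the arithmetic only approximates what the Trainer does internally:

import math

def steps_per_epoch(num_examples, per_device_batch_size, grad_accum_steps=1, num_devices=1):
    """Approximate optimizer steps needed for one pass over the training set."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    return math.ceil(num_examples / effective_batch)

def epochs_implied(max_steps, num_examples, per_device_batch_size, grad_accum_steps=1, num_devices=1):
    """Approximate number of passes over the data that max_steps corresponds to."""
    return max_steps / steps_per_epoch(
        num_examples, per_device_batch_size, grad_accum_steps, num_devices
    )

# Placeholder size for the 1% train+validation split (an assumption, not measured).
example_count = 1_600
print(steps_per_epoch(example_count, 16))       # 100
print(epochs_implied(4000, example_count, 16))  # 40.0

So if the 1% split is in that range, max_steps=4000 would mean dozens of passes over the data, and the first progress tick only appears after a full batch of 16 has gone through forward and backward on whatever device is in use.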
Due to the slow training process, I decided to move outside my container and run the application directly on my MacBook Pro (Apple M2 Max) to make use of its GPU. I configured Accelerate using the following command:
python -c "from accelerate.utils import write_basic_config; write_basic_config(mixed_precision='fp16')"
I also set fp16=True in my training arguments. However, when running the application on my Mac with this setup, I got the following error from trainer.train():
Traceback (most recent call last):
  File "/macos-local/main.py", line 126, in <module>
    trainer.train()
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/trainer.py", line 1938, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/trainer.py", line 2759, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/trainer.py", line 2784, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py", line 1419, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py", line 1268, in forward
    encoder_outputs = self.encoder(
                      ^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/transformers/models/whisper/modeling_whisper.py", line 822, in forward
    inputs_embeds = nn.functional.gelu(self.conv1(input_features))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.local/share/virtualenvs/macos-local-Q10oLr9O/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Mismatched Tensor types in NNPack convolutionOutput
0%|
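One check that might help narrow this down: as far as I understand, NNPack is a CPU convolution backend, so the error may mean the forward pass ran on the CPU rather than on the M2 GPU. Below is a minimal sketch for seeing which device PyTorch reports as available; pick_device is a hypothetical helper of my own, not part of the tutorial or of Accelerate:

def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to query.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print("PyTorch device available:", pick_device())

If this prints "cpu" on the Mac, the model and inputs are presumably staying on the CPU, which would at least be consistent with the NNPack message.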
I believe I’m missing something in the Accelerate configuration. If anyone can help me, I would really appreciate it!
Thank you!