Hi,
I tried to fine-tune Whisper (all model sizes) with the event Python script (run_speech_recognition_seq2seq_streaming.py) and the command below from the event page, on a Lambda GPU instance (and on Google Colab with the Whisper tiny model), but it failed with two errors.
I found a solution for the first one but not for the second one.
echo 'python run_speech_recognition_seq2seq_streaming.py \
--model_name_or_path="openai/whisper-small" \
--dataset_name="mozilla-foundation/common_voice_11_0" \
--dataset_config_name="es" \
--language="spanish" \
--train_split_name="train+validation" \
--eval_split_name="test" \
--model_index_name="Whisper Small Spanish" \
--max_steps="5000" \
--output_dir="./" \
--per_device_train_batch_size="64" \
--per_device_eval_batch_size="32" \
--logging_steps="25" \
--learning_rate="1e-5" \
--warmup_steps="500" \
--evaluation_strategy="steps" \
--eval_steps="1000" \
--save_strategy="steps" \
--save_steps="1000" \
--generation_max_length="225" \
--length_column_name="input_length" \
--max_duration_in_seconds="30" \
--text_column_name="sentence" \
--freeze_feature_encoder="False" \
--report_to="tensorboard" \
--gradient_checkpointing \
--fp16 \
--overwrite_output_dir \
--do_train \
--do_eval \
--predict_with_generate \
--do_normalize_eval \
--use_auth_token \
--push_to_hub' >> run.sh
First error (during training)
"use_cache=True" is incompatible with gradient checkpointing. Setting "use_cache=False"...
The correction must be made at line 393 of the script. The new config block then becomes:
config = AutoConfig.from_pretrained(
    model_args.config_name if model_args.config_name else model_args.model_name_or_path,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
    use_cache=False if training_args.gradient_checkpointing else True,
)
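Alternatively (I have not checked whether this is strictly equivalent), the cache can be disabled on the model right after it is loaded, leaving the AutoConfig call untouched. A minimal sketch, assuming the model is loaded with AutoModelForSpeechSeq2Seq.from_pretrained as in the script:

from transformers import AutoModelForSpeechSeq2Seq

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_args.model_name_or_path,
    config=config,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
)
if training_args.gradient_checkpointing:
    # the KV cache is incompatible with gradient checkpointing, so turn it off
    model.config.use_cache = False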
Second error (during evaluation)
Traceback (most recent call last):
File "run_speech_recognition_seq2seq_streaming.py", line 607, in <module>
main()
File "run_speech_recognition_seq2seq_streaming.py", line 556, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/ubuntu/mwpt/lib/python3.8/site-packages/transformers/trainer.py", line 1527, in train
return inner_training_loop(
File "/home/ubuntu/mwpt/lib/python3.8/site-packages/transformers/trainer.py", line 1852, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/home/ubuntu/mwpt/lib/python3.8/site-packages/transformers/trainer.py", line 2115, in _maybe_log_save_evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/home/ubuntu/mwpt/lib/python3.8/site-packages/transformers/trainer_seq2seq.py", line 78, in evaluate
return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
File "/home/ubuntu/mwpt/lib/python3.8/site-packages/transformers/trainer.py", line 2811, in evaluate
output = eval_loop(
File "/home/ubuntu/mwpt/lib/python3.8/site-packages/transformers/trainer.py", line 3096, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
File "run_speech_recognition_seq2seq_streaming.py", line 509, in compute_metrics
wer = 100 * metric.compute(predictions=pred_str, references=label_str)
File "/home/ubuntu/mwpt/lib/python3.8/site-packages/evaluate/module.py", line 444, in compute
output = self._compute(**inputs, **compute_kwargs)
File "/home/ubuntu/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--wer/85bee9e4216a78bb09b2d0d500f6af5c23da58f9210e661add540f5df6630fcd/wer.py", line 103, in _compute
measures = compute_measures(reference, prediction)
File "/home/ubuntu/mwpt/lib/python3.8/site-packages/jiwer/measures.py", line 179, in compute_measures
raise ValueError("one or more groundtruths are empty strings")
ValueError: one or more groundtruths are empty strings
How can I solve this issue?
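One workaround I am considering (I am not sure it is the right fix) is to drop evaluation pairs whose reference becomes an empty string after normalization, since jiwer raises on empty ground truths. A minimal sketch; the helper name and its use inside compute_metrics are my own, not part of the original script:

import evaluate

metric = evaluate.load("wer")

def compute_wer_skipping_empty_refs(pred_str, label_str):
    # keep only pairs whose (normalized) reference is non-empty
    pairs = [(p, l) for p, l in zip(pred_str, label_str) if l.strip()]
    if not pairs:
        return float("nan")
    preds, labels = zip(*pairs)
    return 100 * metric.compute(predictions=list(preds), references=list(labels))

compute_metrics could then call this helper instead of calling metric.compute directly, but I would like to know whether this is the intended fix or whether the empty references point to a problem earlier in the pipeline.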