Hey @navissivan and @ksoky,
- Don’t worry about the `use_cache` warning, it just means that we cannot use the k,v cache for the attention mechanism with gradient checkpointing. If you want to disable the warning, load the model and then set `use_cache` to False:

  ```python
  model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
  model.config.use_cache = False
  ```

  The operation of the model is the same with and without the cache - we just use the cache to speed up decoding. The cache isn’t compatible with gradient checkpointing, so the Trainer disables it and shows a warning instead.
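  If you prefer, you can pass the override directly when loading - a minimal sketch, relying on the fact that `from_pretrained` forwards config kwargs such as `use_cache` onto the model config:

  ```python
  from transformers import WhisperForConditionalGeneration

  # equivalent to setting model.config.use_cache = False after loading
  model = WhisperForConditionalGeneration.from_pretrained(
      "openai/whisper-small", use_cache=False
  )
  ```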
- It shouldn’t stay idle for that long - usually this happens when we set `group_by_length=True` but haven’t specified `input_lengths` in our `prepare_dataset` function. Have you modified the `prepare_dataset` function? Could you make sure the dataset that you pass to the trainer has the `input_lengths` column? There’s a sketch below of what this could look like.
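  Here is a minimal sketch - the `processor`, the `audio` column and the `sentence` column are assumptions based on the standard fine-tuning setup, so adapt the names to your dataset:

  ```python
  # Sketch of a prepare_dataset function that also records the audio length,
  # so that group_by_length=True has a column to group on.
  def prepare_dataset(batch):
      audio = batch["audio"]

      # compute log-Mel input features from the raw audio array
      batch["input_features"] = processor.feature_extractor(
          audio["array"], sampling_rate=audio["sampling_rate"]
      ).input_features[0]

      # number of raw audio samples - used to group similar-length inputs together
      batch["input_lengths"] = len(audio["array"])

      # encode the transcription to label ids
      batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
      return batch


  fleurs_ch = fleurs_ch.map(
      prepare_dataset, remove_columns=fleurs_ch.column_names["train"]
  )
  ```

  Depending on your setup, you may also need to point `length_column_name` at this column in your training args (its default is `"length"`), so that `group_by_length` knows where to look.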
- A progress bar should show - you need to set `disable_tqdm=False` in your training args.

  You have a couple of options for running it in the background:

  - tmux: call tmux and then run Jupyter notebooks from the tmux shell:

    ```bash
    tmux new -s mysession
    jupyter lab
    ```

    Then run your shell as normal. The process will continue running even when you close your shell. When you re-open your shell, you can reattach through:

    ```bash
    tmux a -t mysession
    ```

    Check out the docs for more info.

  - The other option is to export the ipynb notebook as a Python script, and then run it using tmux or nohup:

    From File → Export Notebook As… in the Jupyter Lab menu, select ‘Export Notebook to Executable Script’. This will give you a Python script to download. Then run it using tmux (as above) or nohup:

    ```bash
    nohup python fine-tuning-whisper.py
    ```

    You can open a new window to view the output:

    ```bash
    vim nohup.out
    ```
- The table is generated automatically by the Trainer if you perform evaluation over the course of training.
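  For example, evaluation over the course of training is enabled through the training arguments - a minimal sketch, where the step counts are placeholders to adapt to your run:

  ```python
  from transformers import Seq2SeqTrainingArguments

  training_args = Seq2SeqTrainingArguments(
      output_dir="/home/sivan/whisper_base_fl_ch",
      evaluation_strategy="steps",  # run evaluation during training...
      eval_steps=1000,              # ...every 1000 training steps
      logging_steps=25,
      predict_with_generate=True,   # generate predictions so WER can be computed
      report_to=["tensorboard"],
  )
  ```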
- It’s possible. The model checkpoint from step 1000 is saved in the output directory under `/home/sivan/whisper_base_fl_ch/checkpoint-1000`.

  You can load the model from the saved checkpoint at step 1000 as follows:

  ```python
  model = WhisperForConditionalGeneration.from_pretrained("/home/sivan/whisper_base_fl_ch/checkpoint-1000")
  ```

  You can then run a validation step:

  ```python
  from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

  training_args = Seq2SeqTrainingArguments(
      output_dir="/home/sivan/whisper_base_fl_ch/validation_step",
      do_train=False,
      do_eval=True,
      per_device_eval_batch_size=8,
      predict_with_generate=True,
      generation_max_length=225,
      save_strategy="no",
      report_to=["tensorboard"],
      push_to_hub=False,
      disable_tqdm=False,
  )

  trainer = Seq2SeqTrainer(
      args=training_args,
      model=model,
      eval_dataset=fleurs_ch["validation"],  # set to your val set
      data_collator=data_collator,
      compute_metrics=compute_metrics,
      tokenizer=processor.feature_extractor,
  )

  trainer.evaluate()
  ```

  You can then repeat this for the checkpoints in directories `checkpoint-2000`, `checkpoint-3000` and so on.
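  If you want to compare all of the checkpoints in one go, you can loop over the checkpoint directories - a sketch that reuses the `training_args`, `data_collator`, `compute_metrics`, `processor` and `fleurs_ch` objects defined above:

  ```python
  import os

  from transformers import Seq2SeqTrainer, WhisperForConditionalGeneration

  output_dir = "/home/sivan/whisper_base_fl_ch"

  # collect the checkpoint directories and sort them by training step
  checkpoints = [d for d in os.listdir(output_dir) if d.startswith("checkpoint-")]
  checkpoints = sorted(checkpoints, key=lambda d: int(d.split("-")[-1]))

  results = {}
  for name in checkpoints:
      model = WhisperForConditionalGeneration.from_pretrained(os.path.join(output_dir, name))
      trainer = Seq2SeqTrainer(
          args=training_args,
          model=model,
          eval_dataset=fleurs_ch["validation"],
          data_collator=data_collator,
          compute_metrics=compute_metrics,
          tokenizer=processor.feature_extractor,
      )
      results[name] = trainer.evaluate()  # dict of eval metrics, e.g. eval_wer

  print(results)
  ```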