Very Slow Fine Tuning Performance for Speech?

I am trying to fine-tune Facebook's wav2vec2 model (pretrained on 60K hours of speech) as described in Patrick von Platen's article: Fine-Tune Wav2Vec2 for English ASR in Hugging Face with 🤗 Transformers

I have about 200 hours of speech data. I am using 3 RTX 8000 GPUs on a 48-core Lenovo SR670 with 369 GB of memory (on our NYU compute cluster).

It seems to be using all 3 GPUs (utilization is high), but training is as slow as molasses. If I read the output correctly, it looks like it will take over a week to fine-tune the model for 30 epochs.
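For a rough sense of scale, here is the back-of-envelope math I'm doing. The utterance length, batch sizes, and seconds-per-step below are my own assumptions for illustration, not numbers from the article, so please correct me if any of them look off:

```python
# Rough estimate of total fine-tuning time.
# All numeric values below are assumptions, not measured from my run.

hours_of_speech = 200
avg_utterance_sec = 10                     # assumed average clip length
num_samples = int(hours_of_speech * 3600 / avg_utterance_sec)

per_device_batch = 16                      # assumed per-GPU batch size
n_gpus = 3
grad_accum = 1                             # assumed no gradient accumulation
effective_batch = per_device_batch * n_gpus * grad_accum

steps_per_epoch = num_samples // effective_batch
epochs = 30
total_steps = steps_per_epoch * epochs

sec_per_step = 4                           # placeholder; read the real value
total_days = total_steps * sec_per_step / 86400  # off the Trainer progress bar

print(f"{num_samples} samples, {steps_per_epoch} steps/epoch, "
      f"{total_steps} total steps, ~{total_days:.1f} days at "
      f"{sec_per_step} s/step")
```

Plugging the actual seconds-per-step shown on the Trainer progress bar into `sec_per_step` is how I'm getting my "over a week" estimate.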

Does this sound correct?

Thanks
Michael Picheny