GPU memory error when trying to fine tune the whisper model with a custom data set

RobbieJimersonJr · December 17, 2022, 9:39pm

I’m trying to fine-tune the Whisper model on a custom dataset, but I’m getting a memory error:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 779. GiB for an array with shape (480000, 435681) and data type float32
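The size in the message can be reproduced from the shape alone, which makes the traceback easier to read. The sketch below is only arithmetic. The interpretation in the comments is an assumption that the traceback does not confirm: that 480000 is a 30-second window at 16 kHz, and that 435681 is the sample count of one clip.

```python
# Reproduce the allocation size reported in the error message.
rows, cols = 480_000, 435_681   # shape taken from the traceback
bytes_per_value = 4             # float32 uses 4 bytes per element

total_bytes = rows * cols * bytes_per_value
total_gib = total_bytes / 2**30
print(f"{total_gib:.0f} GiB")   # matches the 779 GiB in the message

# 480000 equals 30 s * 16000 Hz, i.e. a 30-second window at 16 kHz.
window_samples = 30 * 16_000

# Assumption: if 435681 is the number of samples in a single clip, these are
# the clip durations it would imply at a few common sampling rates.
for sr in (16_000, 44_100, 48_000):
    print(f"{sr} Hz -> {cols / sr:.2f} s")
```

If that reading is right, a single waveform is being expanded into one padded row per sample rather than being treated as one example, which would point at the shape or type of audio["array"] rather than at the GPU.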

I’m using the fine-tune-whisper-non-streaming code and the error is thrown at the following line of code:

batch["input_features"] = processor.feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
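Before this call, a quick look at what is actually being passed in can narrow things down. The helper below is a hypothetical sketch (the name `describe_audio` and its checks are not part of the fine-tuning code). It assumes the feature extractor should receive a 1-D mono signal at 16 kHz, the default sampling rate of Whisper's feature extractor.

```python
def describe_audio(audio, expected_sr=16_000):
    """Return a list of warnings for one audio dict (hypothetical helper).

    `audio` is assumed to look like {"array": ..., "sampling_rate": ...},
    where "array" is a NumPy array or a (possibly nested) Python list.
    """
    problems = []
    array = audio["array"]

    # NumPy arrays expose .ndim; plain Python lists do not, so fall back to
    # checking whether the first element is itself a sequence.
    ndim = getattr(array, "ndim", None)
    if ndim is None:
        nested = len(array) > 0 and isinstance(array[0], (list, tuple))
        ndim = 2 if nested else 1

    if ndim != 1:
        problems.append(f"expected a 1-D mono signal, got {ndim} dimensions")
    if audio["sampling_rate"] != expected_sr:
        problems.append(
            f"sampling rate is {audio['sampling_rate']} Hz, expected {expected_sr} Hz"
        )
    return problems


# Example: a stereo-like nested list at 44.1 kHz triggers both warnings.
print(describe_audio({"array": [[0.0, 0.0], [0.1, 0.1]], "sampling_rate": 44_100}))
```

Calling it on batch["audio"] inside the preparation function, and printing type(audio["array"]) alongside, shows whether the data still has the expected form after it has been stored in the dataset.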

I’m concerned something is wrong with the audio["array"] part of my custom dataset. To reuse as much of the code as possible, I changed the format of my custom dataset to match that of the Common Voice dataset it uses. My dataset starts as a CSV in the format “wav file, transcription”:

/home/username/location/of/wavfileOne.wav,transcription of the first utterance
/home/username/location/of/wavfileTwo.wav,transcription of the second utterance

I use the following code to go from my CSV to the DataSet Dict:

from csv import reader

import librosa
from datasets import Dataset

test_sentence = []
test_audio = []
with open('custom_dataset.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)
    for row in csv_reader:
        path = row[0]
        # sr=None keeps each file's native sampling rate (no resampling)
        speech_array, sampling_rate = librosa.load(path, sr=None)
        audio_stuff = {'path': path, 'array': speech_array, 'sampling_rate': sampling_rate}
        test_audio.append(audio_stuff)
        test_sentence.append(row[1])

# Mirror the Common Voice column names: 'audio' and 'sentence'
dataset_dict = {'audio': test_audio, 'sentence': test_sentence}
custom_dataset = Dataset.from_dict(dataset_dict)

The arrays in the audio column of custom_dataset are float32, just like the Common Voice arrays. Again, my biggest concern is whether I’m creating the array data correctly, i.e. speech_array, sampling_rate = librosa.load(path, sr=None). I’m using a subset (5 samples) of my dataset for testing. The longest wav file in the subset is 12 seconds long. I’m able to use the same dataset to fine-tune the wav2vec XLSR pretrained models without any issues.
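One alternative worth trying is to store only the file paths and let the datasets library decode the audio, which is how the Common Voice data is handled. The sketch below assumes the same two-column CSV. `read_manifest` and `build_dataset` are hypothetical names, and `Audio(sampling_rate=16_000)` asks datasets to decode and resample on access. Whether this resolves the error above is not confirmed.

```python
import csv


def read_manifest(csv_path):
    """Read 'wav path, transcription' rows into column lists (hypothetical helper)."""
    paths, sentences = [], []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            paths.append(row[0])
            sentences.append(row[1])
    return {"audio": paths, "sentence": sentences}


def build_dataset(csv_path, sampling_rate=16_000):
    """Build a Dataset whose 'audio' column is decoded lazily by datasets."""
    # Imported here so read_manifest stays usable without datasets installed.
    from datasets import Audio, Dataset

    dataset = Dataset.from_dict(read_manifest(csv_path))
    # Casting the path column to Audio makes each example decode to a dict
    # with 'path', 'array' and 'sampling_rate', resampled to `sampling_rate`.
    return dataset.cast_column("audio", Audio(sampling_rate=sampling_rate))
```

With this layout, custom_dataset = build_dataset('custom_dataset.csv') yields examples whose audio field has the same structure the fine-tuning code expects from Common Voice, so the feature extraction line can stay unchanged.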
