I’ve been working with a Wav2Vec2ForCTC model for a while. Until now I used short audio files (around 1 min each). When I tested the model on a long file (around 14 min), it ran out of GPU memory, so I switched to the CPU, and I noticed that decoding used more than 200 GB of RAM! I’ve tried splitting the audio file into smaller segments and using the hidden states to link them together while decoding each segment, but I could not find a way to feed the hidden states from the current segment into the model while it decodes the next one.
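The splitting part of this approach can be sketched as below. This is a minimal sketch, assuming a 1-D waveform sampled at 16 kHz and a hypothetical `transcribe_fn` that turns one chunk into text. It decodes each chunk independently and joins the results, with no hidden state carried between chunks, so words that straddle a boundary may be cut.

```python
from typing import Callable, List, Sequence


def split_into_chunks(samples: Sequence[float], sample_rate: int = 16000,
                      chunk_s: float = 30.0) -> List[Sequence[float]]:
    """Split a 1-D waveform into consecutive, non-overlapping chunks of at most chunk_s seconds."""
    chunk_len = int(chunk_s * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_s * sample_rate must be positive")
    # The last chunk may be shorter than chunk_len.
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def transcribe_in_chunks(samples: Sequence[float], sample_rate: int, chunk_s: float,
                         transcribe_fn: Callable[[Sequence[float]], str]) -> str:
    """Decode each chunk independently with the (hypothetical) transcribe_fn and join the texts."""
    texts = []
    for chunk in split_into_chunks(samples, sample_rate, chunk_s):
        text = transcribe_fn(chunk).strip()
        if text:
            texts.append(text)
    return " ".join(texts)


# Assumed example of a transcribe_fn built on transformers (not run here):
#   with torch.no_grad():
#       inputs = processor(chunk, sampling_rate=16000, return_tensors="pt")
#       logits = model(inputs.input_values).logits
#       ids = torch.argmax(logits, dim=-1)
#       return processor.batch_decode(ids)[0]
```

For example, a 65 s recording split with `chunk_s=30.0` yields chunks of 30 s, 30 s and 5 s, so peak memory depends on the chunk length rather than on the full 14 min file. If boundary errors matter, recent versions of the transformers `automatic-speech-recognition` pipeline offer `chunk_length_s` and `stride_length_s` options that decode overlapping chunks for CTC models.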
Any ideas or suggestions?