ArnauC
October 13, 2021, 4:32pm
1
Hi all,
I’m trying to finetune a summarization model (bigbird-pegasus-large-bigpatent) on my own data.
Of course, even with premium Colab I'm having memory issues, so I tried to set gradient_checkpointing=True in the Seq2SeqTrainingArguments, which is supposed to save some memory at the cost of increased computation time.
The problem is that when training starts, this argument raises an error:
AttributeError: module 'torch.utils' has no attribute 'checkpoint'
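For reference, this is roughly how I set it up (a minimal sketch; output_dir and the other hyperparameters here are placeholders, not my actual values):
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bigbird-pegasus-finetuned",  # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,  # the argument that triggers the AttributeError
    fp16=True,
    predict_with_generate=True,
)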
Has anyone experienced this same error?
I read in these GitHub issues:
https://github.com/huggingface/transformers/issues/9617
https://github.com/huggingface/transformers/issues/11193
https://github.com/huggingface/transformers/issues/9919
that the same error had appeared in other cases, but it was supposedly fixed by this PR:
https://github.com/huggingface/transformers/pull/9626
Any help would be appreciated.
Thanks
sfalk
December 16, 2021, 2:16pm
2
Hi! I am facing a similar issue. Have you been able to solve it?
The code that is causing the problem is the following:
model_path = "facebook/s2t-small-librispeech-asr"
# Initialize the model
model = Speech2TextForConditionalGeneration.from_pretrained(model_path)
model = model.eval()
# Attach decoder
model = SpeechRecognizer(model, labels=labels)
# Apply quantization / script / optimize for mobile
quantized_model = torch.quantization.quantize_dynamic(model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8)
scripted_model = torch.jit.script(quantized_model)
The SpeechRecognizer is just a simple torch.nn.Module wrapper.
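Roughly, the wrapper has this shape (an illustrative sketch only, not the actual class; the real decoding with the labels is omitted):
import torch

class SpeechRecognizer(torch.nn.Module):
    def __init__(self, model, labels):
        super().__init__()
        self.model = model
        self.labels = labels  # vocabulary used to turn generated ids into text

    def forward(self, input_features, attention_mask):
        # delegate to the wrapped Speech2Text model; decoding to text is omitted here
        return self.model.generate(input_features, attention_mask=attention_mask)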
ArnauC
December 17, 2021, 11:13am
3
Hi! I think that instead of adding gradient_checkpointing as a training argument, what worked for me was enabling it on the model when defining it:
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)  # model_checkpoint = your checkpoint name
model.gradient_checkpointing_enable()
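The model then goes into the trainer as usual, without the flag in the training arguments (a sketch; the dataset and remaining arguments are placeholders):
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(output_dir="out")  # placeholder, no gradient_checkpointing here
trainer = Seq2SeqTrainer(
    model=model,                  # the model with checkpointing enabled above
    args=training_args,
    train_dataset=train_dataset,  # your own dataset
)
trainer.train()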
Anyway, we ended up training this model on GCP; it was too big for Colab.
Hope this helps!
@ArnauC did it behave weirdly?