Pegasus max_token_len restriction

Hi everyone, I was fine-tuning a few models using "google/pegasus-xsum" for a question-generation task: I want to take in a "context" and generate the "question". My problem is that I can only set source_max_token_len = 512 for the "context"; if it is > 512, I receive:
“RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.”
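For now I just cut the context down to 512 tokens before feeding it to the model. A minimal sketch of that workaround (the helper below is illustrative, not part of the transformers API):

```python
MAX_SOURCE_LEN = 512  # the limit I am hitting with pegasus-xsum

def truncate_ids(token_ids, max_len=MAX_SOURCE_LEN):
    """Drop everything past max_len so no position index exceeds the model's range."""
    return token_ids[:max_len]

# Pretend token ids for an over-long context:
ids = list(range(600))
print(len(truncate_ids(ids)))  # 512
```

With the Hugging Face tokenizers, the equivalent is passing `truncation=True, max_length=512` when encoding the context, but that just silently throws away everything after token 512.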
I think that Google Colab's RAM only allows me to run with a max_token_len less than or equal to 512, though it might also be a limit of the model itself.
Does anyone know how to handle this?