I’m encountering a compatibility issue between the `transformers` library and TensorFlow 2.18 during training with the `TFLongformerForQuestionAnswering` model. The setup includes TensorFlow 2.18, Transformers (latest version as of November 2024), and a custom model head for Q&A fine-tuning.
During training, I receive the following error:
```
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
required broadcastable shapes
[[node tf_longformer_for_question_answering/longformer/encoder/layer_._0/attention/self/dropout_1/dropout/SelectV2]]
```
The error persists despite reducing batch size, disabling dropout, and updating to the latest `transformers` library version. It seems to originate within the Longformer’s attention mechanism, potentially due to shape incompatibilities or dropout inconsistencies specific to TensorFlow 2.18.
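For anyone reproducing this, one way to disable dropout without editing individual layers is through the model config; this is a sketch assuming the standard config attributes, not necessarily how it was done in the original setup:

```python
from transformers import LongformerConfig, TFLongformerForQuestionAnswering

# Zero out both dropout probabilities via the config so the dropout path
# can be ruled in or out as the source of the shape mismatch.
config = LongformerConfig.from_pretrained("allenai/longformer-base-4096")
config.hidden_dropout_prob = 0.0
config.attention_probs_dropout_prob = 0.0

model = TFLongformerForQuestionAnswering.from_pretrained(
    "allenai/longformer-base-4096", config=config
)
```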
Troubleshooting Steps Taken
- Verified input and attention mask shapes, ensuring compatibility with the model’s expected `(batch_size, sequence_length)` dimensions (a minimal shape-check sketch follows this list).
- Removed dropout layers and tried varying `batch_size` settings.
- Updated the `transformers` library and TensorFlow to the latest versions.
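For reference, the shape check from the first bullet looks roughly like this; the question, context, and `max_length` are placeholders:

```python
from transformers import LongformerTokenizerFast

tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
enc = tokenizer(
    "What is Longformer?",
    "Longformer is a transformer model for long documents.",
    padding="max_length",
    max_length=1024,
    truncation=True,
    return_tensors="tf",
)

# Both should be rank-2 tensors of shape (batch_size, sequence_length).
print(enc["input_ids"].shape)       # (1, 1024)
print(enc["attention_mask"].shape)  # (1, 1024)
```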
Is there an official statement or ongoing update addressing compatibility issues between TensorFlow 2.18 and the `transformers` library?
The error you’re encountering often suggests a mismatch in tensor shapes during operations that require broadcasting. Since you’ve already confirmed the input shapes and tried various configurations, this might be a deeper compatibility issue between TensorFlow 2.18 and the transformers library.
While there might not be an official statement specifically addressing this issue, TensorFlow and Hugging Face are constantly working on improving compatibility. It’s possible that this is a known issue being worked on for future releases.
In the meantime, consider checking GitHub issues for both TensorFlow and the transformers library. Developers and users often report such issues there, and you might find workarounds or patches shared by the community. Additionally, if possible, try running your setup with an earlier version of TensorFlow (e.g., 2.17) to see if the problem persists, as this might help identify if it’s a version-specific issue. If all else fails, reaching out directly to Hugging Face’s support or community forums may provide additional insights or solutions.
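If it helps when comparing a TF 2.17 environment against TF 2.18, or when filing a GitHub issue, the installed versions can be captured with a couple of lines:

```python
import tensorflow as tf
import transformers

# Record the exact versions in each environment under test.
print("TensorFlow:", tf.__version__)
print("Transformers:", transformers.__version__)
```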
Thanks, Steve, for the clarifications. So far, I have not found any issue reported on GitHub regarding these broadcastable shapes, or any issues broadly related to attention dropout.
After checking the versions, I see that both TensorFlow 2.17 and TensorFlow 2.18 run with the same Transformers library version, 4.46.2. So it appears that compatibility between Transformers and TensorFlow 2.18 has not been fully addressed yet.
To assist others with the same issue: I was able to work around the problem by reducing the depth at certain points in the model architecture, which appeared to be where the issue originated.
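As one possible illustration of what "reducing the depth" can look like, here is a sketch that assumes the reduction is in the number of encoder layers; the original fix may have been made elsewhere in the architecture:

```python
from transformers import LongformerConfig, TFLongformerForQuestionAnswering

# Illustrative only: num_hidden_layers is a stand-in for "reducing the depth",
# not the confirmed fix from the post above.
config = LongformerConfig.from_pretrained("allenai/longformer-base-4096")
config.num_hidden_layers = 6  # base checkpoint uses 12 encoder layers

# Weights for the dropped layers are simply not loaded (a warning is printed).
model = TFLongformerForQuestionAnswering.from_pretrained(
    "allenai/longformer-base-4096", config=config
)
print(model.config.num_hidden_layers)  # 6
```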