Is there a timeline for when Transformers 4.6.0 will be supported in the HuggingFace SDK on SageMaker?
I’ve recently been having issues with CUDA running out of memory while training a DistilBERT model:
RuntimeError: CUDA out of memory. Tried to allocate 6.87 GiB (GPU 0; 15.78 GiB total capacity; 7.35 GiB already allocated; 2.79 GiB free; 11.78 GiB reserved in total by PyTorch)
It seems this has been acknowledged and fixed in a recent commit, and a similar issue is described here. The fix appears to be included in Transformers 4.6.0, and I can confirm that using this latest version (outside the HuggingFace SDK) resolves the OOM issue for me.
When configuring the HuggingFace estimator, it seems the latest supported version of Transformers is 4.5.0:
ValueError: Unsupported huggingface version: 4.6.0. You may need to upgrade your SDK version (pip install -U sagemaker) for newer huggingface versions. Supported huggingface version(s): 4.4.2, 4.5.0, 4.4, 4.5.
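For context, this is roughly how I’m configuring the estimator (the entry point, role ARN, and instance type below are placeholders, not my actual setup); setting transformers_version to 4.6.0 is what raises the ValueError above:

```python
# Sketch of the estimator configuration that triggers the error.
# entry_point, role, and instance_type are placeholder values.
from sagemaker.huggingface import HuggingFace

huggingface_estimator = HuggingFace(
    entry_point="train.py",         # placeholder training script
    instance_type="ml.p3.2xlarge",  # 16 GB V100, matching the OOM trace above
    instance_count=1,
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    transformers_version="4.6.0",   # raises ValueError: Unsupported huggingface version
    pytorch_version="1.7.1",
    py_version="py36",
)
```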
Does anyone have an idea of when we can expect version 4.6.0 to be supported?