AutoTrain Docker Space failed after today's Hugging Face outage

Hi,

I tried to fine-tune Mistral Instruct 2 from the AutoTrain Space today and got the error below:
ERROR | 2024-04-24 02:15:17 | autotrain.trainers.common:wrapper:119 - train has failed due to an exception: Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1510, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/app/env/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/env/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 49, in <module>
    from flash_attn import flash_attn_func, flash_attn_varlen_func
  File "/app/env/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/app/env/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: /app/env/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 116, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/clm/main.py", line 308, in train
    model = AutoModelForCausalLM.from_pretrained(
  File "/app/env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 562, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
  File "/app/env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 383, in _get_model_class
    supported_models = model_mapping[type(config)]
  File "/app/env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 734, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
  File "/app/env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 748, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "/app/env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 692, in getattribute_from_module
    if hasattr(module, attr):
  File "/app/env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1500, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/app/env/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1512, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.mistral.modeling_mistral because of the following error (look up to see its traceback):
/app/env/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

ERROR | 2024-04-24 02:15:17 | autotrain.trainers.common:wrapper:120 - Failed to import transformers.models.mistral.modeling_mistral because of the following error (look up to see its traceback):
/app/env/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
INFO | 2024-04-24 02:15:17 | autotrain.trainers.common:pause_space:77 - Pausing space…
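
A note on the error itself: the undefined symbol `_ZN3c104cuda9SetDeviceEi` demangles to `c10::cuda::SetDevice(int)`, which belongs to PyTorch's C++ library. That suggests (an assumption, not confirmed for this image) that the prebuilt flash_attn extension was compiled against a different PyTorch build than the one now installed. Below is a minimal sketch for listing the installed versions without importing flash_attn, since the import is what crashes. The helper name `installed_versions` is hypothetical, and the package names are assumed to be the usual PyPI distribution names.

```python
from importlib import metadata


def installed_versions(packages=("torch", "flash-attn", "transformers")):
    """Return {package: version or None} by reading package metadata only.

    Nothing is imported, so a broken compiled extension cannot crash this check.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If the reported torch version differs from the one the flash-attn wheel was built for, that mismatch would be consistent with the ImportError above.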

Thanks

Wayne