I am using the AutoTrain Advanced UI to train the Mixtral-8X7B-Instruct-v0.1 model. I have upgraded my Space's hardware to Nvidia 4xA10G Large, which has 184 GB RAM and 96 GB VRAM.
I think this hardware is powerful enough to train on my small dataset. Still, I am facing the following error:
ERROR | 2024-01-08 13:15:32 | autotrain.trainers.common:wrapper:90 - train has failed due to an exception: Traceback (most recent call last):
File "/app/src/autotrain/trainers/common.py", line 87, in wrapper
return func(*args, **kwargs)
File "/app/src/autotrain/trainers/clm/__main__.py", line 186, in train
model = AutoModelForCausalLM.from_pretrained(
File "/app/env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
return model_class.from_pretrained(
File "/app/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3694, in from_pretrained
) = cls._load_pretrained_model(
File "/app/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4104, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/app/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 786, in _load_state_dict_into_meta_model
set_module_quantized_tensor_to_device(
File "/app/env/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 98, in set_module_quantized_tensor_to_device
new_value = bnb.nn.Params4bit(new_value, requires_grad=False, **kwargs).to(device)
File "/app/env/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 179, in to
return self.cuda(device)
File "/app/env/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 157, in cuda
w_4bit, quant_state = bnb.functional.quantize_4bit(w, blocksize=self.blocksize, compress_statistics=self.compress_statistics, quant_type=self.quant_type)
File "/app/env/lib/python3.10/site-packages/bitsandbytes/functional.py", line 812, in quantize_4bit
absmax = torch.zeros((blocks,), device=A.device, dtype=torch.float32)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 21.99 GiB of which 19.06 MiB is free. Process 14595 has 21.96 GiB memory in use. Of the allocated memory 21.50 GiB is allocated by PyTorch, and 174.69 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR | 2024-01-08 13:15:32 | autotrain.trainers.common:wrapper:91 - CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 21.99 GiB of which 19.06 MiB is free. Process 14595 has 21.96 GiB memory in use. Of the allocated memory 21.50 GiB is allocated by PyTorch, and 174.69 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
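For context, here is a rough back-of-the-envelope estimate of the weight memory involved. It is only a sketch: it assumes Mixtral-8x7B has about 46.7 billion parameters in total (a figure that does not appear in the log above), counts weights only, and uses a hypothetical helper named weight_memory_gib. The per-GPU capacity is taken from the error message.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: approximate total parameter count of Mixtral-8x7B (not from the log).
MIXTRAL_PARAMS = 46.7e9
# From the error message: total capacity reported for GPU 0.
GPU0_CAPACITY_GIB = 21.99
# The 4xA10G Space has four such GPUs.
NUM_GPUS = 4

four_bit = weight_memory_gib(MIXTRAL_PARAMS, 4)   # 4-bit quantized weights
fp16 = weight_memory_gib(MIXTRAL_PARAMS, 16)      # half-precision weights

print(f"4-bit weights: ~{four_bit:.1f} GiB")
print(f"fp16 weights:  ~{fp16:.1f} GiB")
print(f"One GPU: {GPU0_CAPACITY_GIB} GiB, all GPUs: {GPU0_CAPACITY_GIB * NUM_GPUS:.1f} GiB")
```

Under these assumptions, the 4-bit weights alone come within a fraction of a GiB of a single A10G's capacity, before counting quantization statistics, the CUDA context, or any training state. That would be consistent with the log, where only GPU 0 is mentioned and it is already full at 21.96 GiB, which suggests the whole model may be loading onto one GPU rather than being spread across all four.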
I have tried several hardware configurations but still get this error. Please help me figure out what I am doing wrong.
Thanks in advance!!