My desktop runs Windows 11 and has an RTX 3070. I have an NLP task that loads its model with model_content = AutoModelForSequenceClassification.from_pretrained(self.model_path, config=self.model_config),
so I would like to use the GPU in this machine.
I installed torch with pip:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
The installation succeeded, and I checked it with:
import torch

print(torch.__version__)
print(torch.cuda.get_arch_list())
# Check if CUDA is available
if torch.cuda.is_available():
    print("CUDA is available.")
    print("CUDA Version:", torch.version.cuda)
    print("Number of GPUs:", torch.cuda.device_count())
    print("Current CUDA Device Index:", torch.cuda.current_device())
    print("Current CUDA Device Name:", torch.cuda.get_device_name(torch.cuda.current_device()))
else:
    print("CUDA is not available.")
It returns:
1.9.0+cu111
['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
CUDA is available.
CUDA Version: 11.1
Number of GPUs: 1
Current CUDA Device Index: 0
Current CUDA Device Name: NVIDIA GeForce RTX 3070 Laptop GPU
However, when I run my script, it throws this error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
c:\SynologyDrive\Python\nlp\commonlit2\deberta\large\debertav3-baseline-origin - Copy.ipynb Cell 14 line 2
1 for target in ["content", "wording"]:
----> 2 train_by_fold(
3 train,
4 model_name=CFG.model_name,
5 save_each_model=False,
6 target=target,
7 learning_rate=CFG.learning_rate,
8 hidden_dropout_prob=CFG.hidden_dropout_prob,
9 attention_probs_dropout_prob=CFG.attention_probs_dropout_prob,
10 weight_decay=CFG.weight_decay,
11 num_train_epochs=CFG.num_train_epochs,
12 n_splits=CFG.n_splits,
13 batch_size=CFG.batch_size,
14 save_steps=CFG.save_steps,
15 max_length=CFG.max_length
16 )
19 train = validate(
20 train,
21 target=target,
(...)
26 max_length=CFG.max_length
27 )
...
1769 )
1770 AcceleratorState._reset_state(reset_partial_state=True)
1771 self.distributed_state = None
ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.20.1`: Please run `pip install transformers[torch]` or `pip install accelerate -U`
I then of course tried to install the suggested “accelerate” library. First, the install failed. Worse, it removed my torch 1.9.0+cu111 and installed torch 2.1.0. Whichever command I used, pip install transformers[torch]
or pip install accelerate -U
, it replaced my stable torch 1.9.0+cu111 in the end.
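For anyone trying to reproduce this, here is a minimal sketch of how the package swap could be confirmed: take a snapshot of the installed versions before and after running pip, then compare the two. The helper names installed_versions and changed_packages are hypothetical (they are not part of pip or transformers), and the sketch assumes Python 3.8+ so that importlib.metadata is in the standard library.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def changed_packages(before, after):
    """Return {name: (old version, new version)} for entries that differ."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }


if __name__ == "__main__":
    watched = ["torch", "torchvision", "torchaudio", "accelerate", "transformers"]
    before = installed_versions(watched)
    # ... run the pip command in a terminal here, then take a second snapshot ...
    after = installed_versions(watched)
    print(changed_packages(before, after))
```

If pip replaced torch between the two snapshots, the printed dictionary should contain a "torch" entry with the old and new version strings.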
The error message also said that “accelerate” needs at least torch 1.10. I then tried to move up one version and get a CUDA build of it, but unfortunately there is none. There are lots of other combinations, but none of them seems to work with my 3070.
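To make the version constraint concrete, here is a small sketch that splits a torch version string such as 1.9.0+cu111 into its release numbers and its build tag, and checks it against the torch 1.10 minimum quoted in the error message. The function names parse_torch_version and meets_minimum are my own illustration, not an API of torch or accelerate.

```python
import re


def parse_torch_version(version):
    """Split e.g. '1.9.0+cu111' into ((1, 9, 0), 'cu111').

    The build tag is None when the string has no '+...' suffix.
    """
    public, _, local = version.partition("+")
    numbers = tuple(int(part) for part in re.findall(r"\d+", public)[:3])
    return numbers, (local or None)


def meets_minimum(version, minimum=(1, 10)):
    """True if the release numbers are at least `minimum` (default: 1.10)."""
    numbers, _ = parse_torch_version(version)
    return numbers[: len(minimum)] >= minimum


print(parse_torch_version("1.9.0+cu111"))  # ((1, 9, 0), 'cu111')
print(meets_minimum("1.9.0+cu111"))        # False
```

In the real environment the argument would be torch.__version__. A build tag starting with "cu" indicates that the wheel was built against CUDA.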
Does anyone know how to fix this, so that the Hugging Face pretrained DeBERTa model trains on the RTX 3070 with CUDA enabled?
I have also tried setting deepspeed=None, but it made no difference.
training_args = TrainingArguments(
    output_dir=model_fold_dir,
    load_best_model_at_end=True,  # select best model
    learning_rate=learning_rate,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=8,
    num_train_epochs=num_train_epochs,
    weight_decay=weight_decay,
    report_to='none',
    greater_is_better=False,
    save_strategy="steps",
    evaluation_strategy="steps",
    eval_steps=save_steps,
    save_steps=save_steps,
    metric_for_best_model="rmse",
    save_total_limit=1,
    deepspeed=None,  # added in an attempt to remove the accelerate dependency
)
Many thanks in advance.