ValueError: Mixed precision training with AMP or APEX (`--fp16`) and FP16 evaluation can only be used on CUDA devices

I am trying to fine-tune the Wav2Vec2 model on a dataset on my local device using my CPU (I don’t have a GPU or Google Colab Pro), using this as my reference. When I try to execute

from transformers import TrainingArguments

training_args = TrainingArguments(
  # output_dir="/content/gdrive/MyDrive/wav2vec2-base-timit-demo",
  output_dir="./wav2vec2-medical",
  group_by_length=True,
  per_device_train_batch_size=32,
  evaluation_strategy="steps",
  num_train_epochs=30,
  fp16=True,
  save_steps=500,
  eval_steps=500,
  logging_steps=500,
  learning_rate=1e-4,
  weight_decay=0.005,
  warmup_steps=1000,
  save_total_limit=2,
)

I am getting the following error:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-f9014a6221db> in <module>
      1 from transformers import TrainingArguments
      2 
----> 3 training_args = TrainingArguments(
      4   # output_dir="/content/gdrive/MyDrive/wav2vec2-base-timit-demo",
      5   output_dir="./wav2vec2-medical",

~/Library/Python/3.8/lib/python/site-packages/transformers/training_args.py in __init__(self, output_dir, overwrite_output_dir, do_train, do_eval, do_predict, evaluation_strategy, prediction_loss_only, per_device_train_batch_size, per_device_eval_batch_size, per_gpu_train_batch_size, per_gpu_eval_batch_size, gradient_accumulation_steps, eval_accumulation_steps, learning_rate, weight_decay, adam_beta1, adam_beta2, adam_epsilon, max_grad_norm, num_train_epochs, max_steps, lr_scheduler_type, warmup_ratio, warmup_steps, logging_dir, logging_strategy, logging_first_step, logging_steps, save_strategy, save_steps, save_total_limit, no_cuda, seed, fp16, fp16_opt_level, fp16_backend, fp16_full_eval, local_rank, tpu_num_cores, tpu_metrics_debug, debug, dataloader_drop_last, eval_steps, dataloader_num_workers, past_index, run_name, disable_tqdm, remove_unused_columns, label_names, load_best_model_at_end, metric_for_best_model, greater_is_better, ignore_data_skip, sharded_ddp, deepspeed, label_smoothing_factor, adafactor, group_by_length, length_column_name, report_to, ddp_find_unused_parameters, dataloader_pin_memory, skip_memory_metrics, use_legacy_prediction_loop, push_to_hub, resume_from_checkpoint, mp_parameters)

~/Library/Python/3.8/lib/python/site-packages/transformers/training_args.py in __post_init__(self)
    609 
    610         if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
--> 611             raise ValueError(
    612                 "Mixed precision training with AMP or APEX (`--fp16`) and FP16 evaluation can only be used on CUDA devices."
    613             )

ValueError: Mixed precision training with AMP or APEX (`--fp16`) and FP16 evaluation can only be used on CUDA devices.

I understand that the error occurs because I am not using a GPU, as mixed precision cannot be used without one. I want to run this on my CPU; how can I resolve the error?

You should remove fp16=True then.
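For reference, this is the same configuration with fp16 dropped, which should construct without the ValueError on a CPU-only machine (the other hyperparameters are copied unchanged from your snippet):

from transformers import TrainingArguments

training_args = TrainingArguments(
  output_dir="./wav2vec2-medical",
  group_by_length=True,
  per_device_train_batch_size=32,
  evaluation_strategy="steps",
  num_train_epochs=30,
  # fp16=True removed: AMP/APEX mixed precision requires a CUDA device
  save_steps=500,
  eval_steps=500,
  logging_steps=500,
  learning_rate=1e-4,
  weight_decay=0.005,
  warmup_steps=1000,
  save_total_limit=2,
)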


Issue solved, thank you for replying.

How? Did you just remove fp16=True, or did you make some other changes?

I changed the fp16 parameter to False, but the consequence is that training takes much longer…


Me too. Is there any other way?

It’s sorted now. It was because I wasn’t in my CUDA environment. You need to run the command conda activate ‘your environment’ first.
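If you go that route, it may also be worth confirming that PyTorch actually sees a GPU inside that environment before turning fp16 back on. A minimal sketch (other arguments as in the snippet above):

import torch
from transformers import TrainingArguments

# Only enable mixed precision if a CUDA device is actually visible
use_fp16 = torch.cuda.is_available()
print("CUDA available:", use_fp16)

training_args = TrainingArguments(
  output_dir="./wav2vec2-medical",
  fp16=use_fp16,  # stays False on CPU-only machines, so the ValueError is not raised
  # ... other arguments as above ...
)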

I got the same error running on a MacBook Pro with an M1 CPU. Here the device is “mps”, not “cuda”, and I presume that is the root cause of the error. The device is set up correctly, but it doesn’t seem to be supported by PyTorch yet. There is a discussion around this here:

Not sure how to fix the problem, though. Setting both fp16=False and fp16_full_eval=False didn’t help.
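Not a fix, but a quick diagnostic that may help narrow it down on Apple Silicon: recent PyTorch builds expose an MPS availability check, so you can confirm which backends your install actually reports before constructing the TrainingArguments.

import torch

# On an M1 MacBook, CUDA should report False; MPS should report True
# if your PyTorch build and macOS version support it
print("CUDA available:", torch.cuda.is_available())
print("MPS available:", torch.backends.mps.is_available())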

Yes, it’s working now, thank you.