Getting error while fine tuning Deberta v3 Large

NDugar · November 9, 2021, 5:14am

I have been trying to fine-tune the model using the instructions given on the microsoft/deberta-v3-large model page on Hugging Face,

but I am getting

ImportError: This example requires a source install from HuggingFace Transformers (see https://huggingface.co/transformers/installation.html#installing-from-source), but the version found is 4.11.3.

So I cloned the transformers repo onto my machine, and now I am getting an error saying it can't open run_glue.py.

What am I doing incorrectly?

Thank you.

NDugar · November 9, 2021, 5:36am

The full error looks like this:

FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  FutureWarning,
/usr/bin/python3: can't open file ' run_glue.py': [Errno 2] No such file or directory
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 6570) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/nikhil/.local/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
 run_glue.py FAILED
------------------------------------------------------------

Although it says no such file or directory, the run_glue.py file is there in the correct folder.

nielsr · November 9, 2021, 11:32am

Can you post the command you used?

Typically, if you just run python run_glue.py, then you must be in the directory of the run_glue.py script, otherwise it won’t find it.

You can of course also run python examples/pytorch/text-classification/run_glue.py from the root of the Transformers repo, or python transformers/examples/pytorch/text-classification/run_glue.py from the directory that contains the clone.

NDugar · November 9, 2021, 11:45am

I ran:

python -m torch.distributed.launch --nproc_per_node=${num_gpus} \
  run_glue.py \
  --model_name_or_path microsoft/deberta-v3-large \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --evaluation_strategy steps \
  --max_seq_length 256 \
  --warmup_steps 50 \
  --per_device_train_batch_size ${batch_size} \
  --learning_rate 6e-6 \
  --num_train_epochs 2 \
  --output_dir $output_dir \
  --overwrite_output_dir \
  --logging_steps 1000 \
  --logging_dir $output_dir

In the commands you are suggesting, how do I specify which model I want to train?

nielsr · November 9, 2021, 11:52am

The --model_name_or_path line of your command determines which model you'd like to fine-tune. It can be the name of a model on the hub, or a path to a local folder.

However, as you're getting "/usr/bin/python3: can't open file ' run_glue.py': [Errno 2] No such file or directory", you are probably running the script from a directory outside the examples/pytorch/text-classification directory of Transformers.
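
A minimal sketch of why the lookup fails, using a throwaway directory tree that mimics the repo layout (the helper name can_open is hypothetical, not part of any library). A relative script name is resolved against the current working directory, so it is only found from inside the example folder or via the full relative path. The last check reflects the fact that the quoted error prints the name with a leading space; whether a stray space actually reached the launcher here is only a guess, but such a name would be a different file.

```python
import os
import tempfile


def can_open(script, cwd):
    """Return True if `script`, resolved against `cwd`, is an existing file."""
    return os.path.isfile(os.path.join(cwd, script))


# Build a temporary tree that mirrors the layout discussed in this thread.
with tempfile.TemporaryDirectory() as root:
    example_dir = os.path.join(root, "examples", "pytorch", "text-classification")
    os.makedirs(example_dir)
    open(os.path.join(example_dir, "run_glue.py"), "w").close()

    # Same script name, different working directories.
    found_in_example_dir = can_open("run_glue.py", example_dir)
    found_in_repo_root = can_open("run_glue.py", root)
    # Full relative path from the repo root.
    found_via_full_path = can_open(
        "examples/pytorch/text-classification/run_glue.py", root
    )
    # A name with a leading space is a different (non-existent) file.
    found_with_leading_space = can_open(" run_glue.py", example_dir)

print(found_in_example_dir, found_in_repo_root,
      found_via_full_path, found_with_leading_space)
```

Running pwd and ls in the shell right before the launch command gives the same information for the real directory.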

NDugar · November 9, 2021, 11:53am

If I run just python3 run_glue.py, then I get this:

python3 run_glue.py
Traceback (most recent call last):
  File "run_glue.py", line 50, in <module>
    check_min_version("4.13.0.dev0")
  File "/home/nikhil/.local/lib/python3.6/site-packages/transformers/utils/__init__.py", line 35, in check_min_version
    "Check out https://huggingface.co/transformers/examples.html for the examples corresponding to other "
ImportError: This example requires a source install from HuggingFace Transformers (see `https://huggingface.co/transformers/installation.html#installing-from-source`), but the version found is 4.11.3.
Check out https://huggingface.co/transformers/examples.html for the examples corresponding to other versions of HuggingFace Transformers.

nielsr · November 9, 2021, 2:06pm

As explained by the error, you need to install Transformers from source.

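What I usually do is run the following commands in a notebook cell. No cd is needed, since the install path ./transformers is relative to the directory that contains the clone:

!rm -rf transformers
!git clone https://github.com/huggingface/transformers.git
!pip install -q ./transformers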
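
To make the version mismatch concrete: the traceback shows run_glue.py calling check_min_version("4.13.0.dev0") while 4.11.3 is installed. The sketch below is a simplified stand-in for that kind of comparison, not the library's own code; the helper names release_tuple and meets_minimum are hypothetical, and it ignores the number after "dev".

```python
import re


def release_tuple(version):
    """Extract the leading numeric release, e.g. '4.13.0.dev0' -> (4, 13, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(installed, required):
    """Simplified stand-in for the script's minimum-version check."""
    installed_release = release_tuple(installed)
    required_release = release_tuple(required)
    if installed_release != required_release:
        return installed_release > required_release
    # Same release numbers: a '.devN' build sorts before the final release.
    return "dev" not in installed or "dev" in required


# The situation from the traceback: PyPI release vs. the script's requirement.
pypi_ok = meets_minimum("4.11.3", "4.13.0.dev0")
# What a source install of the same commit would report.
source_ok = meets_minimum("4.13.0.dev0", "4.13.0.dev0")
print(pypi_ok, source_ok)
```

After a source install, python3 -c "import transformers; print(transformers.__version__)" should report a .dev0 version. If it still prints 4.11.3, the interpreter running the script is presumably still picking up the older copy.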

NDugar · November 9, 2021, 8:52pm

Same error, no difference.

nielsr · November 10, 2021, 9:32am

I’ve created a notebook for you: Google Colab

NDugar · November 10, 2021, 9:57am

Thank you so much. Not getting an error on this one

NDugar · November 10, 2021, 1:29pm

After training, where is the trained model saved? I am only seeing these files and not the model.

nielsr · November 10, 2021, 3:19pm

Everything is stored in the --output_dir you specified.
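
A rough way to check an output directory for saved models is sketched below. It assumes that a saved model shows up as a config.json next to a weights file, and that intermediate checkpoints, if any, sit in checkpoint-<step> subfolders. The weight file names in WEIGHT_FILES are assumptions that depend on the Transformers version, and find_saved_models is a hypothetical helper, demonstrated here on a fake directory with empty placeholder files.

```python
import os
import tempfile

# Assumed weight file names; which one appears depends on the library version.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def find_saved_models(output_dir):
    """Return sorted folders under `output_dir` holding a config and weights."""
    hits = []
    for current, _subdirs, files in os.walk(output_dir):
        if "config.json" in files and any(name in files for name in WEIGHT_FILES):
            hits.append(os.path.relpath(current, output_dir))
    return sorted(hits)


# Demo on a fake output directory (hypothetical layout, empty files).
with tempfile.TemporaryDirectory() as output_dir:
    checkpoint = os.path.join(output_dir, "checkpoint-500")
    os.makedirs(checkpoint)
    for folder in (output_dir, checkpoint):
        for name in ("config.json", "pytorch_model.bin"):
            open(os.path.join(folder, name), "w").close()
    os.makedirs(os.path.join(output_dir, "runs"))  # logs only, no model

    saved = find_saved_models(output_dir)

print(saved)
```

An empty result on the real --output_dir would suggest that nothing has been written yet, for example because training has not reached a save step or has not finished.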

NDugar · November 10, 2021, 3:31pm

The image I posted shows the contents of the output dir.
