OsOne
May 26, 2021, 8:12am
Hi HuggingFace Team
We are at the beginning of a new DL project in which we work with transformers in TF2.
Before we start, we'd love to know your plans for a TF2 training framework.
While searching for answers, we came across a GitHub issue mentioning that TFTrainer will be changed or removed.
Issue opened 04 May 2021, 10:38PM UTC; closed 10 Jun 2021, 09:18PM UTC.
## Environment info
- `transformers` version: 4.5.1
- Platform: ubuntu 18.04
- Python version: 3.6.9
- PyTorch version (GPU?):
- Tensorflow version (GPU?): tensorflow-gpu==2.4.1
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
### Who can help
@patil-suraj @Rocketknight1
## Information
I'm using TFT5ForConditionalGeneration for a masked language modelling task. During training, GPU utilisation stays above 95%, but as soon as evaluation starts it drops to 0% and evaluation is slow. Even though the [evaluate function is inside `strategy.scope()`](https://github.com/huggingface/transformers/blob/c065025c4720a87783ca504a9018454893d00649/src/transformers/trainer_tf.py#L580), it does not use the GPU.
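For context, the pattern involved can be sketched as follows. This is a hedged illustration, not the library's actual internals: `eval_step` and the stand-in model below are made up for the example. The point is that `strategy.scope()` mainly controls where variables are created; the evaluation computation itself still has to be traced (e.g. as a `tf.function`) and dispatched through the strategy to land on the GPU.

```python
import tensorflow as tf

# Default (no-op) strategy on a single device; on multi-GPU this would
# typically be a tf.distribute.MirroredStrategy().
strategy = tf.distribute.get_strategy()

with strategy.scope():
    # Stand-in for a transformers TF model; variables created here are
    # placed according to the strategy.
    model = tf.keras.Sequential([tf.keras.layers.Dense(2)])

@tf.function
def eval_step(features):
    # Inside a tf.function, ops run in graph mode and are placed on the
    # devices the strategy manages (GPU when one is available).
    return model(features, training=False)

batch = tf.random.normal((4, 3))
logits = strategy.run(eval_step, args=(batch,))
```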
The problem arises when using:
* [x] the official example scripts: (give details below)
I'm using the official TFTrainer example script, with `run_tf_glue.py` modified slightly for custom data input.
The task I am working on is:
* [x] my own task or dataset: (give details below)
The final train_dataset and eval_dataset (the inputs to TFTrainer) have the form `({"input_ids": , "attention_mask": , "decoder_attention_mask": }, labels)`.
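As a hedged illustration of that input format, here is one way such a dataset can be built with `tf.data`; the toy tensors and the `make_dataset` helper are invented for the example and are not part of the issue's actual code:

```python
import tensorflow as tf

def make_dataset(input_ids, attention_mask, decoder_attention_mask, labels, batch_size=2):
    # Features go in a dict keyed by the model's input names; labels are
    # the second element of the (features, labels) tuple.
    features = {
        "input_ids": tf.constant(input_ids),
        "attention_mask": tf.constant(attention_mask),
        "decoder_attention_mask": tf.constant(decoder_attention_mask),
    }
    return tf.data.Dataset.from_tensor_slices(
        (features, tf.constant(labels))
    ).batch(batch_size)

# Toy example: 4 sequences of length 5.
ids = [[1, 2, 3, 4, 5]] * 4
mask = [[1, 1, 1, 1, 0]] * 4
dec_mask = [[1, 1, 1, 0, 0]] * 4
labels = [[2, 3, 4, 5, 0]] * 4

ds = make_dataset(ids, mask, dec_mask, labels)
features, y = next(iter(ds))
```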
## To reproduce
Steps to reproduce the behavior:
I tried reproducing the error using `run_tf_squad.py` and `run_tf_glue.py`, but both scripts raised errors because their inputs were not compatible with the trainer. Only the MRPC task worked, but with only 400 evaluation examples it was hard to tell; the rest simply failed.
If possible, I would like to contribute to TFTrainer, both to make evaluation run on the GPU and to process the SQuAD and GLUE datasets so their dimensions match TFTrainer's inputs. Guidance is really appreciated.
## Expected behavior
We were hoping you could shed more light on your plans for integrating the transformers library with TF2. More concretely:
Do you intend to release a TF Trainer?
Will it be using Keras?
Any date expectations?
Thanks,
Hi there!
Yes, the TFTrainer will be deprecated and removed in v5; we will focus on better integration with Keras (through Keras callbacks when we need to add functionality). Check out the new classification example for a sense of where we are going.
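A minimal sketch of that Keras-native direction, using a tiny stand-in `tf.keras` model rather than an actual transformers model (with transformers you would plug in a TF model class such as `TFAutoModelForSequenceClassification` instead): training becomes `compile()` + `fit()`, and functionality TFTrainer used to provide moves into Keras callbacks.

```python
import tensorflow as tf

# Tiny stand-in model; a transformers TF model is also a tf.keras.Model,
# so it would slot into the same compile/fit workflow.
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(2)(hidden)
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer=tf.keras.optimizers.Adam(5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Extra behaviour (logging, checkpoints, ...) is added via callbacks
# instead of a custom trainer; this hypothetical one just records the loss.
class LossLogger(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        self.last_loss = logs["loss"]

logger = LossLogger()
x = tf.random.normal((16, 4))
y = tf.random.uniform((16,), maxval=2, dtype=tf.int32)
history = model.fit(x, y, epochs=1, batch_size=8, verbose=0, callbacks=[logger])
```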