A minor inconsistency: on a GPU runtime, when I execute:
!pip install -q cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl
to install the TPU client, HuggingFace tries to use XLA even though a CUDA device is present, and fails with:
RuntimeError: tensorflow/compiler/xla/xla_client/computation_client.cc:273 : Missing XLA configuration
Shouldn’t there be a check so that, if the XLA/TPU-cores flag is not passed, it falls back to CUDA (and then CPU) rather than trying to run via XLA?
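A minimal sketch of the fallback I have in mind (not the actual HuggingFace logic; the function name select_device and the exact environment variables checked are just illustrative assumptions): only pick an XLA device when a TPU runtime actually looks configured, otherwise prefer CUDA, then CPU.

import os

import torch


def select_device():
    # Hypothetical helper: only attempt XLA if torch_xla is importable AND a
    # TPU runtime appears to be configured (e.g. XRT_TPU_CONFIG / TPU_NAME set),
    # so merely having the wheel installed does not force the XLA path.
    if os.environ.get("XRT_TPU_CONFIG") or os.environ.get("TPU_NAME"):
        try:
            import torch_xla.core.xla_model as xm
            return xm.xla_device()
        except ImportError:
            pass
    # Fall back to CUDA when available, else CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


print(select_device())

With a check like this, installing the torch_xla wheel on a GPU runtime would be harmless: the code above would still return the CUDA device because no TPU configuration is present.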