I am running inference with LLMs for text generation tasks on a TPU. I originally wrote code using the transformers pipeline with device_map set to "auto", but it never picked up the TPU device. I then passed xm.xla_device() as the device parameter, but after doing that the system crashed.
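Here is a minimal sketch of the two attempts (the model name is just a placeholder, not the actual model I used):

```python
import torch_xla.core.xla_model as xm
from transformers import pipeline

# Attempt 1: device_map="auto" never places the model on the TPU
pipe = pipeline("text-generation", model="gpt2", device_map="auto")

# Attempt 2: passing the XLA device explicitly crashes
pipe = pipeline("text-generation", model="gpt2", device=xm.xla_device())
```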
After that I wrote my own generation code. The model loaded onto the device, but generation was still happening on the CPU. This problem persists for other users too, referring to this.
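A minimal sketch of that manual generation path (again with a placeholder model): the model and inputs both go to the XLA device, yet generation still runs on the CPU.

```python
import torch_xla.core.xla_model as xm
from transformers import AutoModelForCausalLM, AutoTokenizer

device = xm.xla_device()
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

# Inputs are moved to the same XLA device as the model,
# but generate() still ends up running on the CPU
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```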
Then I tried switching from the PyTorch model to Flax. I used FlaxAutoModelForCausalLM to convert the PyTorch model to a Flax model, since a Flax version was not available. But while converting the torch model to a Flax model, I got an XLA buffer error, which stated that I had 29.7 MB of space while I needed 32.0 MB for the buffer. I tried changing the environment variable for it, but the buffer size did not change.
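The conversion step looked roughly like this (placeholder model name again); the from_pt=True load is where the XLA buffer error appears:

```python
from transformers import FlaxAutoModelForCausalLM

# No native Flax weights exist for this checkpoint, so convert from PyTorch.
# This is the step that fails with the XLA buffer error
# (29.7 MB available vs. 32.0 MB required).
model = FlaxAutoModelForCausalLM.from_pretrained("gpt2", from_pt=True)
```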
Now I am trying the JAX models that this person re-created from PyTorch in JAX:
GitHub - ayaka14732/llama-2-jax: JAX implementation of the Llama 2 model.
Thank you.