Running BigBird on TPUs

This is a continuation of my previous issue filed on this forum (here).

I wanted to try running the run_mlm_flax script on a Colab TPU v2, but I am facing an error:

Module Name: <module 'run_mlm_flax' from '/content/run_mlm_flax.py'>

WARNING:root:TPU has started up successfully with version pytorch-1.9
Traceback (most recent call last):
  File "xla_spawn.py", line 87, in <module>
    main()
  File "xla_spawn.py", line 83, in main
    xmp.spawn(mod._mp_fn, args=(), nprocs=args.num_cores)
AttributeError: module 'run_mlm_flax' has no attribute '_mp_fn'

To my understanding, the script that spawns processes on each TPU core requires a Hugging Face Trainer object to work with. However, we apparently do not have a Trainer script for MLM/CLM for BigBird. Compounding the difficulty is the whole Flax/no-Trainer/TPU combination, which I do not think has been fully tested with BigBird.
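
For context, xla_spawn.py only works with the PyTorch example scripts, which each expose a small _mp_fn wrapper around main() - that is the attribute the traceback above says is missing from run_mlm_flax. A rough sketch (from memory, not the Flax script) of what those PyTorch scripts contain:

# In the PyTorch example scripts (e.g. run_mlm.py), roughly:
def _mp_fn(index):
    # Entry point for torch_xla.distributed.xla_multiprocessing.spawn,
    # called once per TPU core by xla_spawn.py; it just forwards to main().
    main()

The Flax script defines no such function, hence the AttributeError.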

Is there any possible approach to run this on TPUs?

Hey, I think you are mixing Flax & PyTorch scripts. How else would you get a PyTorch error from a Flax script?

FlaxBigBird works perfectly on Cloud TPU. You can also refer to this script.

Also, currently the :hugs: Trainer is only for PyTorch & TensorFlow.
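
The Flax examples don't need xla_spawn.py at all: JAX discovers the TPU cores by itself and the script parallelizes over them with jax.pmap internally. A quick sanity check you can run first (a sketch; the exact TPU setup step depends on your JAX version and Colab runtime):

import jax

# Depending on the JAX version, Colab may need an explicit TPU setup call,
# e.g. jax.tools.colab_tpu.setup_tpu(); newer runtimes detect the TPU automatically.
print(jax.devices())             # should list the TPU cores, e.g. 8 on a TPU v2
print(jax.local_device_count())  # the Flax script shards batches across these cores

# Then launch the example directly, with no spawner in front of it:
#   python run_mlm_flax.py ...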

Sigh, it’s the third time that I have had to change my scripts. Why is it that with HF, half the time you have to change your entire code? The Trainer API was a game changer, until it turns out it doesn’t work with some models :disappointed:
Why do we actually need to use the Flax version of BigBird to run it on TPUs?

I am not asking you to give me a 2-line code solution, but the least Hugging Face could strive to do is to not be so fragmented as it is now. At this point, Transformers has become so bloated with different tasks, changing APIs, and outdated notebooks that it’s a nightmare (I honestly can’t even comprehend how open-source devs contribute here).

Honestly, it would be easier if it were married to a single framework like PyTorch - allowing simple DataLoaders to interface directly with Hugging Face models and be trained easily in a PyTorch-style manner with XLA spawn. I think Hugging Face tries to take control of too many parts of the experience - Datasets for making datasets, for example - and it doesn’t always mesh well.

Or perhaps I am just too dumb, and it is probably unfair to expect PyTorch-like simplicity from such a huge library.

As I have said in the other topic, you don’t. Why are you using an example located in examples/flax/ if you don’t want to use FLAX?

There is really no need for the rant after, it’s completely inappropriate when people are replying to you for free on their own time.


It was Patrick von Platen’s suggestion, and his PR merge, to use the Flax version with BigBird. I asked the same thing, but I didn’t get a reply from him as to what the reason was.

Honestly, if you really think that asking basic questions about the direction of HF is a rant, then that clearly shows the attitude of the company as a whole.