Thanks in advance for helping! First, since this is my first post here, I just have to say: Huggingface is awesome. We have been using the tools/libraries for a while for NLP work, and they are just a pleasure to use and so applicable to real-world problems!!!
We are “graduating,” if you will, from single-GPU to multi-GPU models/datasets, and are looking through different platforms/libraries to do this. For our applications we generally don't need to customize our training loops - the Huggingface Trainer is our bread and butter (with the exception of ~5% of applications where we write our own PyTorch training loops).

So my question is: for the Huggingface Trainer, is there some boilerplate code that works with torch.distributed? I understand that, at a basic level, torch.distributed launches the same script multiple times and each process needs to know which rank it is running as. My understanding is that the Trainer itself handles all of this for you - but what about when you instantiate the model? When you instantiate your datasets? Etc. What is the bare minimum you need to do to get a Trainer working in a torch.distributed environment? The examples I have found thus far are pretty heavy - they contain a lot of code to parse arguments, etc. (Don't get me wrong - they are awesome and well documented - again, this is all kind of too good to be true.)
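To make the question concrete, here is roughly what I imagine the bare minimum might look like - model and dataset names are just placeholders, and I may well have the details wrong - launched with torchrun rather than python:

```python
# minimal_trainer_ddp.py -- my guess at the bare minimum; launched with e.g.:
#   torchrun --nproc_per_node=4 minimal_trainer_ddp.py
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)


def main():
    # torchrun starts this same script once per GPU; my understanding is that
    # the Trainer picks up LOCAL_RANK / WORLD_SIZE from the environment, so we
    # just build the model and dataset normally in every process.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    dataset = load_dataset("imdb")  # placeholder dataset, not our real one

    def tokenize(batch):
        return tokenizer(
            batch["text"], truncation=True, padding="max_length", max_length=128
        )

    dataset = dataset.map(tokenize, batched=True)

    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=8,  # per GPU (I believe), not global
        num_train_epochs=1,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

Is that really all there is to it, or does the model/dataset instantiation need to be rank-aware somewhere?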
What we're trying to do is a large ViT-to-text model based on this:
We're trying to use the Seq2SeqTrainer, but with multi-node, multi-GPU training for a very large dataset and much higher-resolution images.
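For reference, here's a stripped-down sketch of the kind of setup I mean (the checkpoints are placeholders, and a dummy dataset stands in for our real image/caption data). The hope is that we can launch this exact script on every node with torchrun and the Seq2SeqTrainer handles the rest - but I'm not sure that's actually how it works:

```python
# vit2text_sketch.py -- rough sketch of the kind of setup I mean; the hope is
# that launching the same script on every node, e.g.
#   torchrun --nnodes=2 --nproc_per_node=8 \
#            --rdzv_backend=c10d --rdzv_endpoint=<master-host>:29500 \
#            vit2text_sketch.py
# is all the distributed "boilerplate" that's needed.
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    VisionEncoderDecoderModel,
    default_data_collator,
)


class DummyCaptionDataset(Dataset):
    """Stand-in for our real image/caption dataset (random tensors only)."""

    def __len__(self):
        return 64

    def __getitem__(self, idx):
        return {
            "pixel_values": torch.randn(3, 224, 224),   # a fake 224x224 image
            "labels": torch.randint(0, 50257, (32,)),   # fake GPT-2 token ids
        }


def main():
    # Placeholder checkpoints -- our real model is a much larger ViT encoder
    # with higher-resolution inputs.
    model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
        "google/vit-base-patch16-224-in21k", "gpt2"
    )
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # GPT-2 has no pad token; the encoder-decoder also needs to know which
    # token starts decoding and which one pads the labels.
    tokenizer.pad_token = tokenizer.eos_token
    model.config.decoder_start_token_id = tokenizer.bos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    args = Seq2SeqTrainingArguments(
        output_dir="vit2text_out",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        predict_with_generate=True,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=DummyCaptionDataset(),
        data_collator=default_data_collator,
    )
    trainer.train()


if __name__ == "__main__":
    main()
```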
Any pointers to some simple examples would be much appreciated!!