Cannot launch multi-GPU training?

I am trying to run training on multiple GPUs following this script here:

Training on a single GPU works fine. When I switch to multiple GPUs, I get:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! 

So I tried wrapping the dataloader with accelerator.prepare(). Then I get:

AttributeError: 'DataLoaderShard' object has no attribute 'map'

What is the correct path forward?