Hi, I want to train a model using the `Trainer`. In the `model_init` function, I instantiate a model and perform heavy calculations for an experimental weight initialization. These calculations need to run on a CUDA device, so I instantiate my model on the GPU by manually moving its parameters to `cuda` (the generic device string, not `cuda:0` or `cuda:1`).
My system has two CUDA GPUs, both of which I want to use. When I run training with `trainer.train()`, I get the error: `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0`.
I expected the `Trainer` to take care of data parallelism itself. When I instantiate the model without moving its parameters to CUDA first (which makes the initialization take an eternity on the CPU), everything works fine.
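For reference, here is a minimal sketch of the pattern I'm describing. `MyModel` and the init logic are placeholders standing in for my actual model and calculations; the sketch falls back to CPU when no GPU is present:

```python
import torch
import torch.nn as nn

# Placeholder for the actual model (in reality a transformers model).
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)

def model_init():
    # Generic "cuda" device string, not pinned to cuda:0 or cuda:1.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = MyModel()
    # Manually move each parameter to the device before the heavy init.
    for p in model.parameters():
        p.data = p.data.to(device)
    # ... heavy experimental weight-initialization calculations on `device` ...
    return model

model = model_init()
print({p.device.type for p in model.parameters()})
```

The error above suggests that after this kind of manual placement, the parameters end up spread across both GPUs once training starts.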
Could somebody please share any insights into how I can use CUDA during model initialization without running into this error during training? Many thanks!