Jointly train two-stage models using Trainer

Hi all,

I want to train a two-stage model containing model1 and model2, in which model2 takes model1’s output as input. For now, I can train model2 on its own through the Hugging Face Seq2SeqTrainer, but I have no clue how to jointly train model1 and model2 through the Hugging Face Trainer. Could someone give me some advice? Thank you very much.

I don’t think this is possible. For such a use case, you should definitely check out Accelerate and write your own custom training loop.
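For example, here is a minimal sketch of such a loop, assuming both stages are plain PyTorch modules. The Stage1/Stage2 classes and the random dataset are toy placeholders for your actual models and data:

```python
# Minimal sketch of jointly training two stages with Accelerate.
# Stage1/Stage2 and the dataset are toy placeholders for your real setup.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

class Stage1(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(16, 32)

    def forward(self, x):
        return torch.relu(self.proj(x))

class Stage2(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(32, 2)

    def forward(self, hidden, labels):
        logits = self.head(hidden)
        return nn.functional.cross_entropy(logits, labels)

accelerator = Accelerator()
model1, model2 = Stage1(), Stage2()

# One optimizer over the parameters of both stages, so they are updated jointly.
optimizer = torch.optim.AdamW(
    list(model1.parameters()) + list(model2.parameters()), lr=5e-5
)

dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8, shuffle=True)

model1, model2, optimizer, loader = accelerator.prepare(
    model1, model2, optimizer, loader
)

model1.train()
model2.train()
for epoch in range(3):
    for inputs, labels in loader:
        hidden = model1(inputs)        # stage 1
        loss = model2(hidden, labels)  # stage 2 consumes stage 1's output
        accelerator.backward(loss)     # gradients flow back through both stages
        optimizer.step()
        optimizer.zero_grad()
```

The key points are a single optimizer over both parameter sets and one backward pass through the composed graph, so both models are updated jointly.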

Shouldn’t a custom model work with the Trainer? A simple model that inherits from PreTrainedModel and contains the two models with a custom forward method should do it.
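Something like this minimal sketch, for example. TwoStageConfig, TwoStageModel, and the classification head are illustrative placeholders I made up, not an existing transformers API:

```python
# One PreTrainedModel wrapping two pretrained models, with a custom forward
# that returns a "loss" key so Trainer can train both stages jointly.
import torch
from transformers import AutoModel, PretrainedConfig, PreTrainedModel

class TwoStageConfig(PretrainedConfig):
    model_type = "two_stage"

    def __init__(self, stage1_name="bert-base-uncased",
                 stage2_name="bert-base-uncased", **kwargs):
        self.stage1_name = stage1_name
        self.stage2_name = stage2_name
        super().__init__(**kwargs)

class TwoStageModel(PreTrainedModel):
    config_class = TwoStageConfig

    def __init__(self, config):
        super().__init__(config)
        # Both stages live inside one module, so Trainer sees a single model.
        self.model1 = AutoModel.from_pretrained(config.stage1_name)
        self.model2 = AutoModel.from_pretrained(config.stage2_name)
        self.classifier = torch.nn.Linear(self.model2.config.hidden_size,
                                          config.num_labels)

    def forward(self, input_ids, attention_mask=None, labels=None):
        # Stage 1 encodes the input; stage 2 consumes its hidden states.
        hidden = self.model1(input_ids=input_ids,
                             attention_mask=attention_mask).last_hidden_state
        outputs = self.model2(inputs_embeds=hidden,
                              attention_mask=attention_mask)
        logits = self.classifier(outputs.last_hidden_state[:, 0])  # placeholder pooling
        loss = None
        if labels is not None:
            loss = torch.nn.functional.cross_entropy(logits, labels)
        # Trainer reads the loss from the "loss" key of the returned dict.
        return {"loss": loss, "logits": logits}

model = TwoStageModel(TwoStageConfig())  # then pass `model` to Trainer as usual
```

Since the returned dict has a "loss" key, the Trainer’s standard loop backpropagates through both stages at once.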

I’m new here and researching the possibility of building custom models within the Hugging Face ecosystem. Am I wasting my time?

Example:
https://stackoverflow.com/questions/70814490/uploading-models-with-custom-forward-functions-to-the-huggingface-model-hub

I have a similar case as well: a model with two towers. Basically, two models initialized with the from_pretrained method.

I can confirm that the Trainer fails with this kind of setup when using multiple GPUs.

If we want to do that, we need to follow the CLIP implementation, which amounts to re-inventing the wheel.

@sgugger So, do you suggest doing this with Accelerate instead?