A new model for the Hub depending on another Hub model


I have fully prototyped a new architecture A that internally uses a frozen generative language model from the Hub, let's call it B, and builds extensively on top of it. Subclassing B is not an option, since A should work with virtually any generative language model.

I would like to push a few example pre-trained versions of A to the Hub. For instance, one pre-trained version of A depending on (frozen) B = OpenAI GPT, another version depending on B = Facebook OPT, and so on.

The simplest thing to do would be to store an identifier of the dependency model B as a string and then load it from the Hub every time A is loaded from its own pre-trained weights. But what if B is updated? A's weights rely on the hidden_states coming from B, so the (now stale) weights of A would immediately become useless.
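One way to soften the staleness problem would be to pin B by revision rather than by bare repo id (transformers' `from_pretrained` accepts a `revision` argument, which can be a commit hash). A minimal sketch of what A's config could record — the names and layout here are my own illustration, not an existing transformers convention:

```python
import json

# Illustrative sketch only -- these key names are NOT transformers API.
# A's config.json records both the Hub id of the dependency model B and
# the exact revision (commit hash) A was trained against, so a later
# push to B's repo cannot silently change what A loads.
a_config = {
    "model_type": "my_arch_a",                    # hypothetical name
    "dependency": {
        "repo_id": "facebook/opt-125m",           # which B to load
        "revision": "replace-with-a-commit-sha",  # pinned version of B
    },
}

def save_config(cfg, path):
    """Serialize A's config, including the nested dependency spec."""
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)

def load_dependency_spec(path):
    """Read back the (repo_id, revision) pair needed to reload B."""
    with open(path) as f:
        cfg = json.load(f)
    dep = cfg["dependency"]
    return dep["repo_id"], dep["revision"]
```

On load, A would pass the recovered pair to something like `AutoModel.from_pretrained(repo_id, revision=revision)`, so B is always resolved to the same commit.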

What I seem to need is a way to push a particular version of B together with the weights of my A. This would somehow require nested configs and nested model weight loading.
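In case pinning is not enough and B must physically travel with A, the nested layout could look something like this — purely a sketch of the idea, not an existing transformers convention, with dummy file names and raw bytes standing in for real weight files:

```python
import os

# Illustrative repo layout for shipping a snapshot of B inside A's repo:
#
#   my-arch-a/
#     config.json       # A's config, including the nested dependency spec
#     a_weights.bin     # A's own weights
#     dependency/       # frozen snapshot of B at the pinned revision
#       config.json
#       b_weights.bin

def save_nested(a_dir, a_weights, b_files):
    """Write A's weights plus a snapshot of B's files under a_dir."""
    dep_dir = os.path.join(a_dir, "dependency")
    os.makedirs(dep_dir, exist_ok=True)
    with open(os.path.join(a_dir, "a_weights.bin"), "wb") as f:
        f.write(a_weights)
    for name, blob in b_files.items():
        with open(os.path.join(dep_dir, name), "wb") as f:
            f.write(blob)

def load_nested(a_dir):
    """Read A's weights and B's snapshot back from a_dir."""
    with open(os.path.join(a_dir, "a_weights.bin"), "rb") as f:
        a_weights = f.read()
    dep_dir = os.path.join(a_dir, "dependency")
    b_files = {}
    for name in os.listdir(dep_dir):
        with open(os.path.join(dep_dir, name), "rb") as f:
            b_files[name] = f.read()
    return a_weights, b_files
```

A real implementation would presumably hook this into A's `save_pretrained`/`from_pretrained` overrides so the `dependency/` subfolder is materialized and consumed transparently.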

Before I embark on trying to hack the heck out of the transformers library, are there any suggestions on how to approach this in a clean manner?