A new model for the Hub depending on another Hub model

pbelcak · April 6, 2023, 2:03pm

Hi,

I have properly prototyped a new architecture A that internally uses a frozen generative language model from the Hub, let’s call it B, and extensively builds on top of it. Subclassing B is not an option, as A may use virtually any generative language model.

I would like to make a few example pre-trained versions of A pushable to the Hub. For example, I would like to be able to have a pre-trained version of A depending on (frozen) B=OpenAI GPT, another version of A depending on B=Facebook OPT, etc…

The simplest thing to do would be to store the identifier of the dependency model B as a string and then load B from the Hub every time A is loaded from its own pre-trained weights. But what if B is updated? My weights in A rely on the `hidden_states` coming from B, so the (now old) weights of A would immediately become useless.
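To make the identifier idea concrete, here is a minimal sketch that also records a specific commit of B, so that a later update to B would not silently change what gets loaded. As far as I know, `from_pretrained` in transformers accepts a `revision` argument (a branch, tag, or commit id). The `BackboneRef` class, its field names, the choice of `facebook/opt-125m`, and the placeholder hash are all hypothetical and only for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BackboneRef:
    """Hypothetical record of which frozen model B a checkpoint of A was trained on."""
    repo_id: str    # Hub identifier of B (example value, an assumption)
    revision: str   # commit hash of B at training time (placeholder, not a real hash)

    def load_kwargs(self) -> dict:
        # Keyword arguments intended for a from_pretrained call.
        return {
            "pretrained_model_name_or_path": self.repo_id,
            "revision": self.revision,
        }

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "BackboneRef":
        return cls(**json.loads(text))


# Store the reference next to A's weights, then restore it at load time.
ref = BackboneRef(repo_id="facebook/opt-125m", revision="<commit-hash-of-B>")
restored = BackboneRef.from_json(ref.to_json())

# With transformers installed, B could then be loaded at the pinned commit:
# from transformers import AutoModelForCausalLM
# backbone = AutoModelForCausalLM.from_pretrained(**restored.load_kwargs())
```

This only protects against B being updated in place; it would not help if the repository of B were deleted or made private.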

What I seem to need is a way to push a particular version of B together with the weights of my A. This would somehow require nested configs and nested model weight loading.
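To illustrate what I mean by nested configs, here is a rough stdlib-only sketch, not the transformers API: the config of A carries the config of B as a nested dictionary, so a checkpoint of A describes the exact B it expects. All names (`WrapperConfig`, `adapter_dim`, `backbone_config`) and the example values are hypothetical; a real implementation would presumably build on `PretrainedConfig` instead.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field, asdict


@dataclass
class WrapperConfig:
    """Hypothetical config for A that embeds the config of the frozen model B."""
    adapter_dim: int = 256                               # made-up hyperparameter of A
    backbone_config: dict = field(default_factory=dict)  # B's config, stored verbatim

    def save(self, path: str) -> None:
        # Write A's config, including the nested config of B, as one JSON file.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def load(cls, path: str) -> "WrapperConfig":
        # Rebuild the config, nested part included, from the JSON file.
        with open(path, encoding="utf-8") as f:
            return cls(**json.load(f))


# Round trip: the nested config of B survives saving and loading.
cfg = WrapperConfig(
    adapter_dim=128,
    backbone_config={"model_type": "opt", "hidden_size": 768},  # illustrative values
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    cfg.save(path)
    loaded = WrapperConfig.load(path)
```

The weights would need the same treatment: storing B's weights inside A's checkpoint trades disk space for independence from upstream changes.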

Before I embark on trying to hack the heck out of the transformers library, any suggestions on how to approach this in a clean manner?
