I searched around a lot of places to no avail, so I am just going to ask:
What am I supposed to do in HuggingFace to get the equivalent of a `hub.KerasLayer` wrapped around a summarizer model? My model is application-specific and big; ideally it would be built from several HF parts, and it has custom losses watching several multitask outputs. So backprop needs to flow through many (but not all) of my application model's layers (where no backprop is needed I'll set `trainable=False`), and especially through the summarizer. The summarizer needs to be trainable in this application.
I mean, I can do what is needed if I stick strictly to TF Hub models (they have the `KerasLayer` option), but HF is nice because it offers more summarizers that I would prefer to use.
We all know TF is not the only backend HF supports, and y'all have done a nice job of integrating PyTorch with HF thus far, and others too (JAX?).
In other words, the summarizer from HF would participate as just one big layer within a larger model, such that, critically, backprop happens in a context-dependent manner.
This is in contrast to fine-tuning it independently as a pretraining step, which I will also be doing before training the final big model.