I am looking to see if I can add a stateful module inside of an adapter. The idea is that it would be treated like a LoRA (initialized to output zero, with its result added to the output of the block alongside the residual).
The problem I am having is maintaining/passing the stateful variables to and from these adapters on each execution. I have tried PyTorch hooks and also some janky class wrapper functions (all of which have failed), but at the end of the day the most elegant solution would be to use the PEFT library, given its close integration with HuggingFace model types. The main points I am curious about are:
- Can I input hidden states and somehow get the resulting hidden states back from adapters that have statefulness (like the hidden states of an LSTM or the state matrix of Mamba)?
- Can I wrap entire blocks via keys like this (as in the entire `self_attn` or `mlp` block of a Qwen model)?
- Can it all be done in the same way that other PEFT adapters work? (The QoL of the library is pretty nice with module management and freezing the parent model.)
I am on the verge of just overriding the entire model source code, but that feels like the nuclear option, and I'm hoping there is a more elegant way that works within the HuggingFace ecosystem of libraries. Thanks!