A potential method to add emotional implicit memory and explicit memory to transformers

I have been playing with a method that bolts onto the attention blocks of GPT-2 and adds a persistent external memory to a specified fraction of layer/token-head positions. When activated on certain layers with a positive memory, it appears to act as a crude emotional implicit memory. Recall appears to require only a fraction of one layer (5 out of 2,160 total token-heads in one example) and appears to be independent of context length. More details are available in the write-up I put together on GitHub, but a more capable model than GPT-2 is needed to give a clearer answer as to how well this method actually works.
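To make the idea concrete, here is a minimal sketch of one way such a bolt-on memory could be exposed to attention: a persistent key/value slot prepended to the attention context for selected heads, so every query token can attend to it regardless of sequence length. This is my own illustration, not the actual implementation from the repo, and all names are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_memory(q, k, v, mem_k=None, mem_v=None):
    """Scaled dot-product attention for one head.

    If a persistent memory key/value pair (mem_k, mem_v) is supplied,
    it is prepended to the context, so every query can attend to the
    memory slot no matter how long the sequence is. Heads without a
    memory slot just call this with mem_k=mem_v=None.
    """
    if mem_k is not None:
        k = np.concatenate([mem_k, k], axis=0)
        v = np.concatenate([mem_v, v], axis=0)
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)       # (n_queries, n_keys [+ n_mem])
    return softmax(scores, axis=-1) @ v  # same shape as q

# Illustrative usage: one head with a single persistent memory slot.
rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))      # 5 query tokens, head dim 8
k = rng.standard_normal((5, 8))
v = rng.standard_normal((5, 8))
mem_k = rng.standard_normal((1, 8))  # one persistent memory key
mem_v = rng.standard_normal((1, 8))  # and its value
out = attention_with_memory(q, k, v, mem_k, mem_v)  # shape (5, 8)
```

Since the memory slot is concatenated before the softmax, the output shape is unchanged and the mechanism works the same way at any context length, which matches the context-length independence observed above.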

As this is a bolt-on method, it should not, in theory, be a complex task to adapt it to more capable models, with possible caveats such as differences in how positional encoding is done. Needless to say, I am quite curious how this method would perform on a more capable model, and if someone is curious enough to try it on one, I would love to hear about it.

GitHub page: