I would like to start a discussion on two recent papers: Geoffrey Hinton's "The Forward-Forward Algorithm: Some Preliminary Investigations", which proposes an alternative to the traditional backpropagation algorithm, and "The Predictive Forward-Forward Algorithm" by Alexander Ororbia and Ankur Mali, which suggests incorporating a generative circuit into the original FF network.
I am interested in hearing the community's thoughts and insights on these papers. I am particularly interested in discussing the potential benefits of layer-level weight updates in the Forward-Forward algorithm, since they could allow training a network layer by layer without needing a huge amount of VRAM.
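To make the memory argument concrete: since each layer has its own local objective, it can also have its own optimizer, and its output can be detached before being fed to the next layer, so no cross-layer gradient graph is ever stored. Below is a minimal PyTorch-style sketch; the layer sizes, goodness threshold, and softplus-style loss are illustrative assumptions, not the exact setup from either paper.

```python
import torch
import torch.nn as nn


class FFLayer(nn.Module):
    """One layer trained with a local Forward-Forward-style objective."""

    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.act = nn.ReLU()
        self.threshold = threshold  # goodness threshold (assumed value)
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalize the input so only its direction carries information forward.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return self.act(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = mean squared activation of the layer.
        g_pos = self.forward(x_pos).pow(2).mean(dim=1)
        g_neg = self.forward(x_neg).pow(2).mean(dim=1)
        # Push positive goodness above the threshold, negative below it.
        loss = torch.log1p(torch.exp(torch.cat(
            [self.threshold - g_pos, g_neg - self.threshold]))).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach: the next layer never backpropagates into this one,
        # so only one layer's activations need gradients at a time.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()


layers = [FFLayer(784, 500), FFLayer(500, 500)]
x_pos = torch.rand(32, 784)  # stand-in positive batch
x_neg = torch.rand(32, 784)  # stand-in negative batch
for layer in layers:         # each layer is updated locally, in turn
    x_pos, x_neg = layer.train_step(x_pos, x_neg)
```

The `detach()` at the end of `train_step` is the key line for the VRAM point: the autograd graph never spans more than one layer.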
I am attempting to build a mini-GPT using the Forward-Forward idea.
I can't find much of anything on using it in generative language models, or any example of the NLP benchmark referenced in Hinton's paper.
If anyone has thoughts or repos with that kind of implementation of the Forward-Forward algorithm, it would be very helpful.
The best I have found so far is a few non-working repos:
nebuly-ai: nebullvm/apps/accelerate/forward_forward at 5fb48f6cda4d2ab756f20a91eea7b482f38ca50f · nebuly-ai/nebullvm · GitHub
and kyleliang919: GitHub - kyleliang919/forward_forward_gpt: Using the forward forward algorithm to train large language model
The implementation of the predictive forward-forward algorithm has been released publicly:
Has anyone tried to train deep spiking neural networks using forward-forward?
Yes, there was work that came out about a month ago proposing a generalization of forward-forward (and predictive forward-forward) to (deep) spiking networks, called the event-driven forward-forward algorithm, since the authors had to craft a formulation that works with spikes themselves:
An implementation that is more native to PyTorch:
I find the idea of high layer activations only for the positive data interesting. The network essentially isn't producing an output as in backpropagation; instead, it is a property of the network to "light up" for correct labels, thereby indicating whether the data is positive or not. I enjoyed this interview Hinton gave about his paper.
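That "lighting up" property is also how inference works: overlay each candidate label on the input, run the forward passes, and pick the label whose accumulated goodness is highest. A hedged sketch of that loop follows; the `overlay_label` scheme (one-hot written into the first pixels, as Hinton describes for MNIST) and the `layers` list are stand-ins for whatever network you have trained.

```python
import torch


def overlay_label(x, label, num_classes=10):
    """Write a one-hot label into the first num_classes entries of each
    flattened input (the MNIST embedding scheme from Hinton's paper)."""
    x = x.clone()
    x[:, :num_classes] = 0.0
    x[:, label] = 1.0
    return x


def predict(layers, x, num_classes=10):
    """Classify by trying every label overlay and keeping the one that
    makes the network 'light up' (highest total goodness)."""
    goodness_per_label = []
    for label in range(num_classes):
        h = overlay_label(x, label, num_classes)
        total = torch.zeros(x.shape[0])
        for layer in layers:
            h = layer(h)
            total = total + h.pow(2).mean(dim=1)  # accumulate goodness
        goodness_per_label.append(total)
    return torch.stack(goodness_per_label, dim=1).argmax(dim=1)
```

Note this makes inference roughly `num_classes` times more expensive than a single backprop-trained forward pass, which is one of the trade-offs discussed around the paper.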
Here is my notebook implementation based on the work of Mohammad Pezeshki. It's modular, so you can experiment with different candidates for goodness functions, layerwise loss functions, and negative-data generation.
I am finding it difficult to apply the FF algorithm to convnets. I suspect it might be because the label information overlaid on the input gets diffused too much. Could someone guide me on this? My attempt is uploaded to the repo linked in my previous response. Thanks!
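One thing that might be worth trying (a hedged suggestion, not something from the papers): instead of overwriting pixels, append the label as extra constant channels, so every convolution sees the label at every spatial location and it cannot be diffused away. A sketch, with an MNIST-like input shape assumed:

```python
import torch


def label_as_channels(x, labels, num_classes=10):
    """x: (B, C, H, W) images; labels: (B,) integer class ids.
    Returns (B, C + num_classes, H, W): one constant plane per class,
    all ones for the given label and all zeros otherwise."""
    b, _, h, w = x.shape
    one_hot = torch.zeros(b, num_classes, device=x.device)
    one_hot[torch.arange(b), labels] = 1.0
    # Broadcast each one-hot entry into a full H x W plane.
    planes = one_hot[:, :, None, None].expand(b, num_classes, h, w)
    return torch.cat([x, planes], dim=1)


x = torch.rand(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))
x_labeled = label_as_channels(x, labels)  # shape (8, 11, 28, 28)
```

The first conv layer then just needs `in_channels=C + num_classes`; everything else in the FF training loop stays the same.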