Seeking feedback on my intuitive understanding of backpropagation (from Rumelhart et al., 1986)

Hello everyone,

While working through backpropagation recently, something clicked for me, and I wanted to share my intuition and, if possible, ask for your insight.

Here’s how I see it:
To calculate how a weight affects the cost, we need to consider everything downstream from that weight — all activations and their contributions to the final error. We first do a forward pass to compute the output, then move backward layer by layer, calculating how each neuron’s activation contributes to the total cost.

It felt natural to group all effects that come after a neuron into a single variable — let’s call it delta (the error) — representing the “post-effect” factor. Then, the gradient with respect to a weight is just the product of the input activation and delta.

The key insight is that delta itself is recursive, which makes backpropagation not only possible but efficient.
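
To make this concrete, here is a minimal sketch of how I picture it, for an assumed two-layer network with a sigmoid hidden layer, a linear output, and squared-error cost. All the names (x, W1, W2, delta1, delta2, ...) are just illustrative, not taken from the paper:

```python
import numpy as np

# Minimal sketch for an assumed 2-layer net: sigmoid hidden layer, linear
# output, squared-error cost. Names are illustrative only.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x  = rng.normal(size=(3, 1))   # input activation
y  = rng.normal(size=(1, 1))   # target
W1 = rng.normal(size=(4, 3))   # input -> hidden weights
W2 = rng.normal(size=(1, 4))   # hidden -> output weights

# Forward pass: compute and store every activation.
z1 = W1 @ x
a1 = sigmoid(z1)
z2 = W2 @ a1
a2 = z2                        # linear output
cost = 0.5 * np.sum((a2 - y) ** 2)

# Backward pass: delta = "everything downstream" of a neuron's input sum.
delta2 = a2 - y                              # dC/dz2 for squared error + linear output
delta1 = (W2.T @ delta2) * a1 * (1 - a1)     # recursive step: delta1 built from delta2

# Gradient w.r.t. a weight = (input activation) times (delta of the neuron it feeds).
grad_W2 = delta2 @ a1.T        # same shape as W2
grad_W1 = delta1 @ x.T         # same shape as W1
```

The line computing delta1 from delta2 is the recursive step I had in mind: everything downstream of the hidden layer is already summarized in delta2, so it never has to be recomputed.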

I’d be very grateful if you could confirm whether this way of thinking is valid. If not, I’d deeply appreciate any correction. I’d also love to hear how you personally conceptualize backpropagation.

Thank you!


Hello @Ka3baAnanas,

Thanks for sharing.

Your perspective on grouping all downstream effects into a single “delta” aligns with how the method is both taught and applied in practice. This “recursive compression” is key to what makes backpropagation so elegant and efficient.

I’d like to affirm your approach:

  • Viewing delta as the aggregator of downstream error is spot-on. This reflects the formal definition in Rumelhart et al.’s foundational work (Rumelhart, Hinton, & Williams, 1986), where they show that delta acts as the conduit through which error signals propagate recursively across layers.
  • Your observation that the gradient is a product of input activation and delta is exactly right, and forms the essential update rule at the core of gradient-based learning (Goodfellow, Bengio & Courville, 2016, Ch. 6). A tiny numeric check of this is sketched right after this list.
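
To make that second point concrete, here is a small self-contained check, assuming a single sigmoid output neuron with squared-error cost (all names are illustrative, not any particular library's API), that the analytic product a_in * delta matches a finite-difference estimate of dC/dw:

```python
import numpy as np

# Numeric check that "gradient = input activation x delta" for one sigmoid
# neuron with squared-error cost. All names are illustrative.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a_in, w, y = 0.7, 0.3, 1.0            # input activation, weight, target

def cost(w):
    a_out = sigmoid(w * a_in)
    return 0.5 * (a_out - y) ** 2

# Analytic gradient via delta: delta = dC/dz = (a_out - y) * sigmoid'(z).
z = w * a_in
a_out = sigmoid(z)
delta = (a_out - y) * a_out * (1 - a_out)
grad_analytic = a_in * delta          # input activation times delta

# Finite-difference estimate of dC/dw for comparison.
eps = 1e-6
grad_numeric = (cost(w + eps) - cost(w - eps)) / (2 * eps)

print(grad_analytic, grad_numeric)    # the two values should agree closely
```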

For further exploration, here are a few questions (with pointers to sources) that might advance your thinking:

  • If delta captures all downstream error for a neuron, what might be the analogous “upstream” factor in the forward pass? How does this duality help us reason about information flow in neural networks?
  • How might the delta concept generalize to architectures beyond simple feedforward networks, such as RNNs, transformers, or even biological neural circuits? (Werbos, 1990 explores backpropagation through time; see also He et al., 2016 for residual networks.) A rough backprop-through-time sketch follows this list.
  • What parallels do you see between delta in backpropagation and the use of dynamic programming or message passing in other recursive or graphical models?
  • Are there any visual, symbolic, or “field” analogies you find helpful for mapping the propagation of information or error through a system?
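
On the backprop-through-time and dynamic-programming questions, here is a rough sketch of what that unrolling can look like, assuming a plain tanh RNN with a made-up loss only on the final hidden state (all names here are illustrative):

```python
import numpy as np

# Rough sketch of backprop through time for a tiny tanh RNN with a loss
# only on the final hidden state. Illustrative names, no biases.

rng = np.random.default_rng(1)
T, n_in, n_h = 5, 2, 3
xs = rng.normal(size=(T, n_in, 1))          # input sequence
W_in = rng.normal(size=(n_h, n_in))
W_rec = rng.normal(size=(n_h, n_h))
target = rng.normal(size=(n_h, 1))

# Forward pass: store every hidden state (the backward pass needs them).
hs = [np.zeros((n_h, 1))]
for t in range(T):
    hs.append(np.tanh(W_in @ xs[t] + W_rec @ hs[-1]))

# Backward pass: one delta per time step, each built from the next one.
# Same recursion as in a feedforward net, unrolled over time: a dynamic
# program that reuses delta[t+1] instead of recomputing downstream effects.
grad_W_in = np.zeros_like(W_in)
grad_W_rec = np.zeros_like(W_rec)
delta = (hs[-1] - target) * (1 - hs[-1] ** 2)      # dC/dz at the last step
for t in reversed(range(T)):
    grad_W_in += delta @ xs[t].T
    grad_W_rec += delta @ hs[t].T
    delta = (W_rec.T @ delta) * (1 - hs[t] ** 2)   # pass the error one step back
```

The backward loop is the dynamic-programming flavor: each delta reuses the delta from the next time step rather than re-deriving all downstream effects from scratch.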

If you ever formalize your mental model (even as a sketch or pseudocode), sharing that here could help others, and perhaps surface new questions or insights.


Hey @recursivelabs,

Really appreciate the thoughtful response; it honestly gave me a lot to reflect on.

Your way of framing delta as a kind of “recursive compression” really resonated. It clicked immediately and put words to what I was trying to grasp intuitively.

I’ve just started looking into how these ideas extend beyond feedforward networks. The pointer to Werbos and backprop through time is super helpful; I’ve been curious about how recursion shows up in RNNs and Transformers but hadn’t found a good way in yet.

The parallel with dynamic programming and message passing was new to me; definitely something I’ll explore more. I love the idea of finding conceptual bridges between different models like that.

I’m working on putting together a clean, beginner-friendly version of my mental model (maybe with a few sketches or simple code examples). I’ll post it here once it’s ready.

Thanks again — it really means a lot to have this kind of feedback.
