Does this intuition for why an activation function is used in a neural network make sense mathematically:

For this example, let's consider a fully connected (NOT convolutional) network that classifies digits 0-9. A network WITHOUT an activation function will do well on digits similar in size to those in the training set, but it will struggle if digits appear darker or lighter, because the network is linear and cannot take both size and lightness/darkness into account. A network completing the same task but WITH activation functions will be able to take both orientation and lightness/darkness into account: the weights will learn all possible relationships between the pixel values in the data set, and the sigmoid (or other activation function) will then take digits that are slightly lighter or darker and transform/smooth them so that the network can output the same probability as it would for the normal darkness/lightness. Does this intuition sound about right, or is it incorrect in some way?
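To make the "because it is linear" part concrete, here is a small numpy sketch (the layer sizes are made up for illustration) showing that stacking two fully connected layers with no activation function in between collapses to a single linear layer, so depth adds no expressive power without a nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two fully connected "layers" with no activation in between
# (784 = a flattened 28x28 digit image; sizes are illustrative)
W1 = rng.standard_normal((16, 784))
b1 = rng.standard_normal(16)
W2 = rng.standard_normal((10, 16))
b2 = rng.standard_normal(10)

x = rng.standard_normal(784)  # a fake flattened input image

# Forward pass through both layers, no nonlinearity
out_two_layers = W2 @ (W1 @ x + b1) + b2

# The same mapping written as ONE linear layer
W = W2 @ W1
b = W2 @ b1 + b2
out_one_layer = W @ x + b

assert np.allclose(out_two_layers, out_one_layer)
```

So whatever the activation function contributes, it is exactly what separates the network from a single matrix multiply plus bias.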

Here is another example of what I am trying to describe, in case this makes more sense. I understand it is a nonlinearity, but let me try to explain what I am thinking in a different way. In a network that is detecting digits 0-9, the network has learned, for example, all the relationships between the pixels of a 7 of standard darkness in all sorts of positions. Then at test time a lighter 7 comes along, with slightly lighter pixel values. In one of the neurons in the first hidden layer, after the lighter 7's input has been multiplied by the weights and the bias added, the sigmoid transforms it to approximately the same value as the darker one: say the darker 7's output from the sigmoid in neuron one is 0.994 and the lighter one's is 0.992. The lighter one will then get treated nearly the same through all later layers of the network and get classified correctly. Does that make sense?
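Here is a quick numpy sketch of the squashing effect I mean; the pre-activation values 5.1 and 4.8 are made up for illustration, standing in for the weighted sums produced by the darker and lighter 7:

```python
import numpy as np

def sigmoid(z):
    # standard logistic function: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activations in neuron one for the same 7
z_dark = 5.1   # standard-darkness 7 -> larger weighted sum
z_light = 4.8  # slightly lighter 7 -> slightly smaller weighted sum

a_dark = sigmoid(z_dark)    # ~0.994
a_light = sigmoid(z_light)  # ~0.992

# An input gap of 0.3 shrinks to a tiny output gap near saturation
print(a_dark - a_light)
```

In the saturated region of the sigmoid, a 0.3 difference in the weighted sum shrinks to about 0.002 in the activation, which is the "treated almost the same by later layers" behaviour I am asking about.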