Neural networks are a mystery to most of us. This post tries to address that by visualizing how their inner structures change during supervised learning, using matrix heatmaps.
But I just want the code? No problem.
Let us visualize a 5×5 matrix with entries in [-1, 1) and see how it responds to a few popular activation functions. A blue coloured cell indicates a negative value and a red coloured cell a positive value. The darker the colour, the larger the magnitude. A transparent cell indicates a value close to zero.
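If you prefer to follow along in code, here is a minimal sketch of the matrix and activations being visualized (assuming NumPy; the variable names are just for illustration):

```python
import numpy as np

np.random.seed(1)
x = 2 * np.random.random((5, 5)) - 1   # 5x5 matrix with entries in [-1, 1)

identity = x                           # identity: leaves the values unchanged
sigmoid  = 1 / (1 + np.exp(-x))        # squashes every entry into (0, 1)
tanh     = np.tanh(x)                  # squashes every entry into (-1, 1)
relu     = np.maximum(0, x)            # zeroes out the negative entries

print(np.round(tanh, 2))
```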
An important characteristic of activation functions is that they are
differentiable. Below we apply the derivative of the corresponding
activation function to the input matrix.
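For reference, a sketch of those derivatives under the same assumptions as above, each applied element-wise to the input matrix:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_identity(x):
    return np.ones_like(x)              # d/dx x = 1

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)                  # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))

def d_tanh(x):
    return 1 - np.tanh(x) ** 2          # tanh'(x) = 1 - tanh(x)^2

def d_relu(x):
    return (x > 0).astype(float)        # 1 where x > 0, else 0
```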
Relevant: https://en.wikipedia.org/wiki/Activation_function
We will now model regression by backpropagating errors into an intermediary matrix called syn0. To make it a bit easier to evaluate performance, I set the desired output = -input, so that the algebraic solution to the matrix equation input = syn0 * output conveniently becomes syn0 = -I, where I is the identity matrix.
Let's see how our model performs (the runs are capped at 100 iterations and use a learning rate of α = 0.1):
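A sketch of the training loop, under the assumption that the forward pass is pred = tanh(input · syn0) and the target is -input, so syn0 should approach -I (initialization details may differ from the runs shown):

```python
import numpy as np

np.random.seed(1)
x      = 2 * np.random.random((5, 5)) - 1   # input matrix, entries in [-1, 1)
target = -x                                 # desired output
syn0   = 2 * np.random.random((5, 5)) - 1   # weights to be learned
alpha  = 0.1                                # learning rate

for _ in range(100):                        # capped at 100 iterations
    pred  = np.tanh(x.dot(syn0))            # forward pass
    error = target - pred                   # how far off we are
    delta = error * (1 - pred ** 2)         # error scaled by tanh's derivative
    syn0 += alpha * x.T.dot(delta)          # backpropagate the error into syn0

print(np.round(syn0, 2))                    # should end up close to -I
```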
Pretty cool! Most of the time it converges to the correct solution for syn0, but even when it doesn't, it still predicts the output almost correctly.
The fun part begins! The regression model above can in fact be seen as a neural network without hidden layers.
Let's try extending the above tanh-regression with a hidden tanh layer using the same learning rate as before:
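A sketch of that extension, adding a second weight matrix for the hidden layer (the names syn0/syn1 and the hidden size of 5 are assumptions):

```python
import numpy as np

np.random.seed(1)
x      = 2 * np.random.random((5, 5)) - 1
target = -x
syn0   = 2 * np.random.random((5, 5)) - 1   # input -> hidden weights
syn1   = 2 * np.random.random((5, 5)) - 1   # hidden -> output weights
alpha  = 0.1

for _ in range(100):
    hidden = np.tanh(x.dot(syn0))                      # hidden tanh layer
    pred   = np.tanh(hidden.dot(syn1))                 # tanh output layer
    pred_delta   = (target - pred) * (1 - pred ** 2)   # output-layer error
    hidden_delta = pred_delta.dot(syn1.T) * (1 - hidden ** 2)  # backpropagated error
    syn1 += alpha * hidden.T.dot(pred_delta)           # update both layers
    syn0 += alpha * x.T.dot(hidden_delta)

print(np.round(pred, 2))                               # should approximate -x
```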
As you might have figured out by now, only the identity and tanh activation functions can produce negative-valued outputs. However, it's enough to use one of them at the last layer. Let's try another one:
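One possible variant along these lines is a sigmoid hidden layer feeding into a tanh output layer, so the last layer can still produce negative values (this specific combination is an assumption, not necessarily the one shown next):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(1)
x      = 2 * np.random.random((5, 5)) - 1
target = -x
syn0   = 2 * np.random.random((5, 5)) - 1
syn1   = 2 * np.random.random((5, 5)) - 1
alpha  = 0.1

for _ in range(100):
    hidden = sigmoid(x.dot(syn0))                      # sigmoid hidden layer, values in (0, 1)
    pred   = np.tanh(hidden.dot(syn1))                 # tanh output layer, values in (-1, 1)
    pred_delta   = (target - pred) * (1 - pred ** 2)   # output-layer error
    hidden_delta = pred_delta.dot(syn1.T) * hidden * (1 - hidden)  # sigmoid's derivative
    syn1 += alpha * hidden.T.dot(pred_delta)
    syn0 += alpha * x.T.dot(hidden_delta)

print(np.round(pred, 2))
```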