Neural networks are a mystery to most of us. This post tries to address that by using matrix heatmaps to visualize how their inner structures change during supervised learning.
But I just want the code? No problem.
Let us visualize a 5×5 matrix with entries in [-1, 1) and see how it responds to a few popular activation functions. A blue-coloured cell indicates a negative value and a red-coloured cell indicates a positive value. The darker the colour, the larger the magnitude. A transparent cell indicates a value close to zero.
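The heatmaps throughout this post can be reproduced with something along these lines (a minimal sketch, assuming NumPy and Matplotlib; the exact plotting setup used for the figures may differ):

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)

# 5x5 matrix with entries drawn uniformly from [-1, 1)
matrix = 2 * np.random.random((5, 5)) - 1

# A few popular activation functions, applied elementwise
activations = {
    "identity": lambda x: x,
    "sigmoid": lambda x: 1 / (1 + np.exp(-x)),
    "tanh": np.tanh,
    "relu": lambda x: np.maximum(0, x),
}

# Blue = negative, red = positive, white = close to zero
fig, axes = plt.subplots(1, len(activations), figsize=(12, 3))
for ax, (name, f) in zip(axes, activations.items()):
    ax.imshow(f(matrix), cmap="bwr", vmin=-1, vmax=1)
    ax.set_title(name)
    ax.axis("off")
plt.show()
```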
An important characteristic of activation functions is that they are differentiable. Below we apply the derivative of the corresponding activation function to the input matrix.
Relevant: https://en.wikipedia.org/wiki/Activation_function
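The derivative heatmaps can be produced the same way, just with the derivative of each function. A sketch of those derivatives, written directly in terms of the input x and under the same assumptions as the previous snippet:

```python
import numpy as np

def d_identity(x):
    return np.ones_like(x)

def d_sigmoid(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)

def d_tanh(x):
    return 1 - np.tanh(x) ** 2

def d_relu(x):
    # Technically undefined at exactly 0; 0 is used here by convention
    return (x > 0).astype(float)
```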
We will now model regression by backpropagating errors into an intermediary matrix called syn0. To make it a bit easier to evaluate performance, I set our desired output = -input, so that the algebraic solution to the matrix equation input = syn0 * output conveniently becomes syn0 = -I, where I is the identity matrix.
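As a quick sanity check of that claim (a small sketch in NumPy):

```python
import numpy as np

np.random.seed(1)
X = 2 * np.random.random((5, 5)) - 1  # the input matrix, entries in [-1, 1)
y = -X                                # desired output = -input
syn0 = -np.eye(5)                     # the claimed algebraic solution

# input = syn0 * output holds exactly
assert np.allclose(syn0 @ y, X)
```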
Let's see how our models perform (the runs are capped at 100 iterations with a learning rate of α = 0.1):
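The runs being visualized correspond roughly to a loop like the one below, shown here for the tanh variant (a reconstruction; the exact parameterisation in the original code may differ):

```python
import numpy as np

np.random.seed(1)
X = 2 * np.random.random((5, 5)) - 1     # input
y = -X                                   # desired output
alpha = 0.1                              # learning rate

syn0 = 2 * np.random.random((5, 5)) - 1  # randomly initialised weights

for _ in range(100):                     # capped at 100 iterations
    pred = np.tanh(X @ syn0)             # forward pass
    error = y - pred
    delta = error * (1 - pred ** 2)      # error scaled by tanh's derivative
    syn0 += alpha * X.T @ delta          # backpropagate the error into syn0

print(np.round(syn0, 2))                 # ideally close to -I
```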
Pretty cool! Most of the time it converges to the correct solution for syn0, but even when it doesn't, it still predicts the output almost correctly.
The fun part begins! The above regression model can actually be referred to as a neural network without hidden layers.
Let's try extending the above tanh-regression with a hidden tanh layer using the same learning rate as before:
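Concretely, that extension could look like this (again a sketch; the 5×5 hidden layer size and the initialisation are assumptions):

```python
import numpy as np

np.random.seed(1)
X = 2 * np.random.random((5, 5)) - 1
y = -X
alpha = 0.1

# syn0 feeds the hidden tanh layer, syn1 the tanh output layer
syn0 = 2 * np.random.random((5, 5)) - 1
syn1 = 2 * np.random.random((5, 5)) - 1

for _ in range(100):
    hidden = np.tanh(X @ syn0)                           # hidden tanh layer
    pred = np.tanh(hidden @ syn1)                        # tanh output layer
    pred_delta = (y - pred) * (1 - pred ** 2)
    hidden_delta = (pred_delta @ syn1.T) * (1 - hidden ** 2)
    syn1 += alpha * hidden.T @ pred_delta                # update the output weights
    syn0 += alpha * X.T @ hidden_delta                   # ... then the hidden weights
```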
As you might have figured out by now, only the identity and tanh activation functions work with negative-valued output. However, it's enough to use these activations at the last layer. Let's try another one:
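For instance, swapping the hidden layer's activation to a sigmoid while keeping tanh at the output keeps the predictions compatible with negative targets (a sketch; which activation the next experiment actually uses is an assumption here):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(1)
X = 2 * np.random.random((5, 5)) - 1
y = -X
alpha = 0.1

syn0 = 2 * np.random.random((5, 5)) - 1
syn1 = 2 * np.random.random((5, 5)) - 1

for _ in range(100):
    hidden = sigmoid(X @ syn0)                           # sigmoid hidden layer
    pred = np.tanh(hidden @ syn1)                        # tanh output, range (-1, 1)
    pred_delta = (y - pred) * (1 - pred ** 2)
    hidden_delta = (pred_delta @ syn1.T) * hidden * (1 - hidden)
    syn1 += alpha * hidden.T @ pred_delta
    syn0 += alpha * X.T @ hidden_delta
```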