Visualizing neural networks with matrix heatmaps

Neural networks are a mystery to most of us. This post tries to address that by visualizing, with matrix heatmaps, how a network's inner structures change during supervised learning.

But I just want the code? No problem.

Activation functions

Let us visualize a 5×5 matrix with entries in [-1, 1) and see how it responds to a few popular activation functions. A blue coloured cell indicates a negative value and a red coloured cell indicates a positive value; the darker the colour, the larger the magnitude. A transparent cell indicates a value close to zero.
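Something like the following sketch reproduces the idea (numpy and matplotlib are assumptions here; the original interactive heatmaps may be rendered differently): draw a random 5×5 matrix, apply each activation element-wise, and plot the results side by side.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
X = np.random.uniform(-1, 1, size=(5, 5))  # 5x5 matrix with entries in [-1, 1)

# A few popular activation functions, applied element-wise.
activations = {
    "identity": lambda x: x,
    "sigmoid":  lambda x: 1 / (1 + np.exp(-x)),
    "tanh":     np.tanh,
    "ReLU":     lambda x: np.maximum(0, x),
}

fig, axes = plt.subplots(1, len(activations) + 1, figsize=(15, 3))
axes[0].imshow(X, cmap="bwr", vmin=-1, vmax=1)   # blue = negative, red = positive
axes[0].set_title("input")
for ax, (name, f) in zip(axes[1:], activations.items()):
    ax.imshow(f(X), cmap="bwr", vmin=-1, vmax=1)
    ax.set_title(name)
plt.show()
```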

An important characteristic of activation functions is that they are differentiable. Below we apply the derivative of each activation function to the same input matrix.
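Continuing with the matrix X from the sketch above, the derivatives can be visualized the same way:

```python
# Derivatives of the same activations, applied element-wise to X.
# Note: ReLU is not differentiable at exactly 0; its derivative is taken as 0 there.
derivatives = {
    "identity": lambda x: np.ones_like(x),
    "sigmoid":  lambda x: np.exp(-x) / (1 + np.exp(-x)) ** 2,
    "tanh":     lambda x: 1 - np.tanh(x) ** 2,
    "ReLU":     lambda x: (x > 0).astype(float),
}

fig, axes = plt.subplots(1, len(derivatives), figsize=(12, 3))
for ax, (name, df) in zip(axes, derivatives.items()):
    ax.imshow(df(X), cmap="bwr", vmin=-1, vmax=1)
    ax.set_title(name + "'")
plt.show()
```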

Relevant: https://en.wikipedia.org/wiki/Activation_function

Regression

We will now model regression by backpropagating errors into an intermediate weight matrix called syn0. To make performance easier to evaluate, I set the desired output = -input, so that the algebraic solution to the matrix equation output = input · syn0 conveniently becomes syn0 = -I, where I is the identity matrix.
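In code, such a model might look roughly as follows. This is a minimal sketch, assuming a forward pass prediction = input · syn0, a plain squared-error gradient step, and the identity activation (so the optimum is exactly syn0 = -I); apart from syn0, the names and details are assumptions, not necessarily what the original demo does.

```python
import numpy as np

np.random.seed(1)
alpha = 0.1                                    # learning rate
X = np.random.uniform(-1, 1, size=(5, 5))      # input matrix
y = -X                                         # desired output = -input
syn0 = np.random.uniform(-1, 1, size=(5, 5))   # weights; the ideal solution is -I

for _ in range(100):                           # capped at 100 iterations
    pred = X @ syn0                            # forward pass (identity activation)
    error = y - pred                           # how far off are we?
    syn0 += alpha * X.T @ error                # backpropagate the error into syn0

print(np.round(syn0, 2))                       # should approach the negative identity matrix
```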

Let's see how our models perform (each run is capped at 100 iterations with a learning rate of α = 0.1):

Pretty cool! Most of the time it converges to the correct solution for syn0, and even when it doesn't, it still predicts the output almost correctly.

Adding hidden layers

The fun part begins! The regression model above is, in effect, a neural network without hidden layers.

Let's try extending the above tanh-regression with a hidden tanh layer using the same learning rate as before:
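A sketch of that extension, continuing with X, y and alpha from the regression sketch above and assuming a hidden layer of 5 units (the hidden size is an assumption):

```python
hidden = 5
syn0 = np.random.uniform(-1, 1, size=(5, hidden))   # input  -> hidden weights
syn1 = np.random.uniform(-1, 1, size=(hidden, 5))   # hidden -> output weights

for _ in range(100):
    l1 = np.tanh(X @ syn0)                           # hidden layer
    l2 = np.tanh(l1 @ syn1)                          # output layer
    l2_delta = (y - l2) * (1 - l2 ** 2)              # error scaled by tanh'(z) = 1 - tanh(z)^2
    l1_delta = (l2_delta @ syn1.T) * (1 - l1 ** 2)   # backpropagate through syn1
    syn1 += alpha * l1.T @ l2_delta
    syn0 += alpha * X.T @ l1_delta
```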

As you might have figured out by now, of the activations shown here only the identity and tanh can produce negative-valued output. However, it's enough to use one of them at the output layer; the hidden layers are free to use other activations. Let's try another combination:
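For example, a hidden layer whose output is non-negative can still feed a tanh output layer that reaches negative values. This sketch continues from above and picks a sigmoid hidden layer purely as an assumed example, not necessarily the combination shown in the original demo:

```python
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

syn0 = np.random.uniform(-1, 1, size=(5, hidden))
syn1 = np.random.uniform(-1, 1, size=(hidden, 5))

for _ in range(100):
    l1 = sigmoid(X @ syn0)                           # hidden layer: values in (0, 1)
    l2 = np.tanh(l1 @ syn1)                          # output layer: tanh can go negative
    l2_delta = (y - l2) * (1 - l2 ** 2)              # tanh derivative
    l1_delta = (l2_delta @ syn1.T) * l1 * (1 - l1)   # sigmoid derivative: s * (1 - s)
    syn1 += alpha * l1.T @ l2_delta
    syn0 += alpha * X.T @ l1_delta
```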

Experiment on your own