Neural Network : ANN Cheatsheet

First we will work with the piecewise-linear activation function ReLU, which is plain simple: keep the +ve value and clamp the -ve to zero. Later we will show the changes needed for using Sigmoid() or anything similar.
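
As a minimal sketch, ReLU itself can be written as a plain Python function (the name relu is mine, just for illustration):

def relu(x):
    # keep positive values, clamp negatives to zero
    return x if x > 0 else 0.0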

No math here. For proof of the formulas, check here.

    Contents
  1. The Graph
  2. Forward Propagation
  3. Output Layer Error [last layer]
  4. Back Propagation
  5. Correction
  6. With Sigmoid Function
  7. Notes

The Graph

[a1] -- w1 -- [b1]
      \w2\ 
     /w3/ 
[a2] -- w4 -- [b2]

This can be made prettier later with a real image. For now simply remember that a1, a2, b1, b2 are neurons, and w1, w2, w3, w4 are weights connecting each neuron on one layer to every neuron on the next: w1 connects a1 to b1, w2 connects a1 to b2, w3 connects a2 to b1, and w4 connects a2 to b2.

Forward Propagation

output[b1]=output[a1] * w1 + output[a2] * w3 ... + bias[b1]
output[b1]=0 if the above is < 0

A neuron's output is calculated as the sum, over every connector on its left, of that connector's weight multiplied by the output of the neuron at the other end of the connector, plus the neuron's own bias.

A neuron's bias is used to adjust that same neuron's output; the output is written with the bias already applied.
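
A minimal sketch of this forward pass in Python for the 2-2 graph above (all weight, bias, and input values are illustrative, not from the original):

# illustrative weights, biases and input-layer outputs
w1, w2, w3, w4 = 0.1, -0.2, 0.4, 0.3
bias_b1, bias_b2 = 0.05, -0.05
out_a1, out_a2 = 0.5, 0.8

# weighted sum of the left-side neurons plus bias, then ReLU
z_b1 = out_a1 * w1 + out_a2 * w3 + bias_b1
out_b1 = z_b1 if z_b1 > 0 else 0.0

z_b2 = out_a1 * w2 + out_a2 * w4 + bias_b2
out_b2 = z_b2 if z_b2 > 0 else 0.0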

Output Layer Error [last layer]

delta[b1]=expected[b1]-output[b1]
delta[b1]=0 if output[b1]<=0

A neuron's delta indicates how far its output is from the correct value; with this order of subtraction, a positive delta means the output is lower than expected.

If you flip the order of subtraction to output-expected, then the sign at the correction step below must also be flipped from += to -=.
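
Continuing the Python sketch from Forward Propagation (the expected values are illustrative):

expected_b1, expected_b2 = 1.0, 0.0

# delta is expected minus actual; zero it where ReLU was inactive
delta_b1 = (expected_b1 - out_b1) if out_b1 > 0 else 0.0
delta_b2 = (expected_b2 - out_b2) if out_b2 > 0 else 0.0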

Back Propagation

delta[a1]=delta[b1]*w1+delta[b2]*w2
delta[a1]=0 if output[a1]<=0

A neuron's delta equals the sum, over all connectors going out of it, of each connector's weight multiplied by the delta of the neuron at the far end of that connector.
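
Continuing the same sketch, the deltas of the a layer (assuming the a layer's outputs also went through ReLU):

# each neuron's delta: sum over its outgoing connectors
delta_a1 = delta_b1 * w1 + delta_b2 * w2
delta_a2 = delta_b1 * w3 + delta_b2 * w4

# zero the delta where the neuron's own ReLU was inactive
delta_a1 = delta_a1 if out_a1 > 0 else 0.0
delta_a2 = delta_a2 if out_a2 > 0 else 0.0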

Correction

w1+=output[a1]*delta[b1]*increment
bias[a1]+=delta[a1]*increment

Each connector's weight correction equals the output of the neuron on its left multiplied by the delta of the neuron on its right, scaled by the increment.

Each neuron's bias correction is its own delta value, again scaled by the increment.

The increment (learning rate) can be as low as 0.1 or as high as 0.5; 0.1 to 0.2 works best.
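
Continuing the sketch, one correction step for all four weights and the b-layer biases:

increment = 0.1   # learning rate

# weight update: left neuron's output times right neuron's delta
w1 += out_a1 * delta_b1 * increment
w2 += out_a1 * delta_b2 * increment
w3 += out_a2 * delta_b1 * increment
w4 += out_a2 * delta_b2 * increment

# bias update: each neuron's own delta
bias_b1 += delta_b1 * increment
bias_b2 += delta_b2 * increment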

With Sigmoid Function

Basically, apply sigmoid() to the weighted sum before writing the output into any cell; the sigmoid is computed on the value about to be written into that cell.

And during back propagation, apply dSigmoid() before writing the delta value. dSigmoid is calculated on that cell's already-written output value, as before, not on the difference.
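
As a sketch, sigmoid() and dSigmoid() can be written like this in Python (dSigmoid takes the already-written output, as described above):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(out):
    # sigmoid derivative expressed in terms of the output itself
    return out * (1.0 - out)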

Forward Propagation

output[b1]=output[a1] * w1 + output[a2] * w3 ... + bias[b1]
output[b1]=1/(1+exp(-output[b1]))

For Last Layer Error

delta[b1]=expected[b1]-output[b1]
delta[b1]*=output[b1]*(1-output[b1])

For Back Propagation

delta[a1]=delta[b1]*w1+delta[b2]*w2
delta[a1]*=output[a1]*(1-output[a1])

For Correction

w1+=output[a1]*delta[b1]*increment
bias[a1]+=delta[a1]*increment

As before. Nothing changes.
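
Putting the sigmoid variant together, reusing the illustrative values and the sigmoid()/d_sigmoid() helpers from the sketches above:

# forward: same weighted sums, squashed through sigmoid
out_b1 = sigmoid(out_a1 * w1 + out_a2 * w3 + bias_b1)
out_b2 = sigmoid(out_a1 * w2 + out_a2 * w4 + bias_b2)

# last-layer error, scaled by the sigmoid derivative of the output
delta_b1 = (expected_b1 - out_b1) * d_sigmoid(out_b1)
delta_b2 = (expected_b2 - out_b2) * d_sigmoid(out_b2)

# back propagation, again scaled by each cell's own derivative
delta_a1 = (delta_b1 * w1 + delta_b2 * w2) * d_sigmoid(out_a1)
delta_a2 = (delta_b1 * w3 + delta_b2 * w4) * d_sigmoid(out_a2)

# correction is identical to the ReLU case
w1 += out_a1 * delta_b1 * increment
bias_b1 += delta_b1 * increment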

Notes

  1. ReLU is better than Sigmoid() or Tanh() for almost everything, especially for NLP. Therefore don't worry about the other activation functions, even though most of the examples online use them. Those functions add complexity and computation but produce worse results than ReLU.
Published: 15-Sep-2022
Updated: 23-Sep-2022