This code accompanies my Beginning Deep Learning - Understanding Back Propagation article.
No effort is made to do anything special here. I just wanted to understand the basics
and thus have kept it simple enough for me to understand.
I plan to follow another tutorial which creates a neural network from scratch
now that I have a better understanding of the back propagation process. Consider
this version 1.
Author: Nik Alleyne
Author Blog: www.securitynik.com
Blog Article associated with this code:
For this example, the inputs are 2 and 9, and the target output is
92% (0.92).
# Import modules
import numpy as np
'''
Define the sigmoid activation function
Let's also round the value to 4 decimal places
References:
https://en.wikipedia.org/wiki/Sigmoid_function
'''
def my_sigmoid(dot_product_of_neuron):
    print('Applying the sigmoid activation function to the weighted sum / dot product ...')
    return round(1 / (1 + np.exp(-dot_product_of_neuron)), 4)
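Several steps below (Steps 2, 8, 13 and 18) rely on the fact that the derivative of the sigmoid is sigmoid(z) * (1 - sigmoid(z)). As a quick sanity check, here is a small sketch that compares that formula with a central finite-difference estimate. The helper names are my own, chosen for illustration. Unlike my_sigmoid, these helpers neither round nor print.

```python
import numpy as np

def sigmoid(z):
    # Unrounded, silent version of my_sigmoid
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    # Analytic derivative: sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1 - s)

def numerical_derivative(f, z, h=1e-6):
    # Central finite-difference approximation of f'(z)
    return (f(z + h) - f(z - h)) / (2 * h)

# Sample pre-activation values (2.87 is z0 from the worked example)
for z in [2.87, 8.7, 0.15, 0.0]:
    analytic = sigmoid_derivative(z)
    numeric = numerical_derivative(sigmoid, z)
    print(f'z = {z}: analytic = {analytic:.6f}, numeric = {numeric:.6f}')
```

The two columns should agree to several decimal places, which is why the steps below can use O * (1 - O) directly instead of differentiating the exponential.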
# Define my cost function: MSE
# (y_true - y_predicted) ** 2
def my_mse(y_true, y_predicted):
    print('Calculating the cost using MSE ...')
    return round((y_true - y_predicted) ** 2, 4)
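Step 1 further down uses dCost/dO3 = O3 - y_true. Differentiating my_mse's formula, (y_true - y_predicted) ** 2, actually gives 2 * (y_predicted - y_true). The expression O3 - y_true is the exact derivative of half the squared error. The sketch below checks both numerically. The helper names are hypothetical, and 0.6445 is simply the prediction this post reaches.

```python
def squared_error(y_true, y_predicted):
    # Same formula as my_mse, without rounding or printing
    return (y_true - y_predicted) ** 2

def half_squared_error(y_true, y_predicted):
    # Half the squared error; its derivative has no factor of 2
    return 0.5 * (y_true - y_predicted) ** 2

def numerical_derivative(f, x, h=1e-6):
    # Central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

y_true = 0.92
y_predicted = 0.6445  # the prediction reached in this post

d_full = numerical_derivative(lambda p: squared_error(y_true, p), y_predicted)
d_half = numerical_derivative(lambda p: half_squared_error(y_true, p), y_predicted)

print(round(d_full, 4))  # agrees with 2 * (y_predicted - y_true)
print(round(d_half, 4))  # agrees with y_predicted - y_true
```

Either convention works for learning. The only difference is that every gradient computed with the half version is half as large.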
# Define our target value
y_true_output = 0.92
# Define the two inputs (features)
input_layer_features = np.array([2, 9])
# Define the weights from input to hidden layer
weights_input_hidden_neuron0 = np.array([0.15, 0.23])
weights_input_hidden_neuron1 = np.array([0.5, 0.8])
weights_input_hidden_neuron2 = np.array([0.05, -0.05])
# Define the weights from hidden to output
weights_hidden_output = np.array([0.9, -0.5, 0.08])
# Define the bias for the hidden layer neurons
hidden_layer_bias_neuron_0 = 0.5
hidden_layer_bias_neuron_1 = 0.5
hidden_layer_bias_neuron_2 = 0.5
# Define the bias for the output layer neuron
output_layer_bias_neuron_3 = 0.2
'''
We can use this calculation to get the value for the first neuron,
and the same approach works for the other neurons.
However, we are better off taking advantage of matrix multiplication
and obtaining the dot product.
z0 = (x0 * w0) + (x1 * w1) + b0
z0 = (2 * 0.15) + (9 * 0.23) + 0.5
= 0.3 + 2.07 + 0.5
= 2.87
'''
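Here is one way the matrix-multiplication idea could look in practice. Stack the three hidden neurons' weight vectors into one 3 x 2 matrix, so that a single product computes z0, z1 and z2. This is a sketch with my own variable names. It skips the per-step rounding, so its values only match the cells below to about four decimal places.

```python
import numpy as np

# Inputs and parameters copied from the definitions above
input_layer_features = np.array([2, 9])

# Each row holds the weights feeding one hidden neuron (neuron 0, 1, 2)
weights_input_hidden = np.array([
    [0.15, 0.23],
    [0.5, 0.8],
    [0.05, -0.05],
])
hidden_layer_bias = np.array([0.5, 0.5, 0.5])

# One matrix-vector product gives z0, z1 and z2 at once
z_hidden = weights_input_hidden @ input_layer_features + hidden_layer_bias

# The sigmoid, applied element-wise, gives O0, O1 and O2
outputs_hidden = 1 / (1 + np.exp(-z_hidden))

print(np.round(z_hidden, 4))
print(np.round(outputs_hidden, 4))
```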
# Calculating z0, the weighted sum (dot product plus bias) for the first neuron
# Round it to 4 decimals
dot_product_hidden_neuron_z0 = round(np.dot(input_layer_features, weights_input_hidden_neuron0) + hidden_layer_bias_neuron_0, 4)
# Print the value of z0
dot_product_hidden_neuron_z0
# Calculate the value of O0 - Hidden Layer Neuron 0 value after the activation function has been applied
sigmoid_of_hidden_neuron_O0 = my_sigmoid(dot_product_hidden_neuron_z0)
sigmoid_of_hidden_neuron_O0
# Calculating z1, the weighted sum for the second neuron
# Round it to 4 decimals
dot_product_hidden_neuron_z1 = round(np.dot(input_layer_features, weights_input_hidden_neuron1) + hidden_layer_bias_neuron_1, 4)
# Print the value of z1
dot_product_hidden_neuron_z1
# Calculate the value of O1 - Hidden Layer Neuron 1 value after the activation function has been applied
sigmoid_of_hidden_neuron_O1 = my_sigmoid(dot_product_hidden_neuron_z1)
sigmoid_of_hidden_neuron_O1
# Calculating z2, the weighted sum for the third neuron
# Round it to 4 decimals
dot_product_hidden_neuron_z2 = round(np.dot(input_layer_features, weights_input_hidden_neuron2) + hidden_layer_bias_neuron_2, 4)
dot_product_hidden_neuron_z2
# Calculate the value of O2 - Hidden Layer Neuron 2 value after the activation function has been applied
sigmoid_of_hidden_neuron_O2 = my_sigmoid(dot_product_hidden_neuron_z2)
sigmoid_of_hidden_neuron_O2
# Create a new array, using the outputs from the hidden layer
# This array is the input to the output layer
outputs_from_hidden = np.array([sigmoid_of_hidden_neuron_O0, sigmoid_of_hidden_neuron_O1, sigmoid_of_hidden_neuron_O2])
outputs_from_hidden
# Moving on to the output layer.
# Calculating the dot product (z3) by using the output from hidden layer
dot_product_output_neuron_z3 = round(np.dot(outputs_from_hidden , weights_hidden_output) + output_layer_bias_neuron_3, 4)
dot_product_output_neuron_z3
# Apply the activation function to z3
sigmoid_of_output_neuron_O3 = my_sigmoid(dot_product_output_neuron_z3)
sigmoid_of_output_neuron_O3
# To express O3 as a percentage, multiply it by 100
sigmoid_of_output_neuron_O3_percent = sigmoid_of_output_neuron_O3 * 100
sigmoid_of_output_neuron_O3_percent
# At this point, our target output was 92% but the predicted output was 64.45%.
# Time to calculate our loss
current_loss = my_mse(y_true_output, sigmoid_of_output_neuron_O3)
current_loss
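Putting the forward pass together, the cells above boil down to two matrix operations and a loss. The sketch below uses hypothetical names (forward_pass and its arguments). It is meant as an illustration, not a replacement for the step-by-step cells. Because it does not round between steps, its results may differ from the notebook in the fourth decimal place.

```python
import numpy as np

def sigmoid(z):
    # Unrounded sigmoid, applied element-wise to arrays
    return 1 / (1 + np.exp(-z))

def forward_pass(features, w_input_hidden, b_hidden, w_hidden_output, b_output):
    # Hidden layer: z0, z1, z2 in one product, then O0, O1, O2
    outputs_hidden = sigmoid(w_input_hidden @ features + b_hidden)
    # Output layer: z3, then the prediction O3
    prediction = sigmoid(outputs_hidden @ w_hidden_output + b_output)
    return outputs_hidden, prediction

# Same inputs, weights and biases as defined earlier in this post
features = np.array([2, 9])
w_input_hidden = np.array([[0.15, 0.23], [0.5, 0.8], [0.05, -0.05]])
b_hidden = np.array([0.5, 0.5, 0.5])
w_hidden_output = np.array([0.9, -0.5, 0.08])
b_output = 0.2
y_true = 0.92

outputs_hidden, prediction = forward_pass(
    features, w_input_hidden, b_hidden, w_hidden_output, b_output)
loss = (y_true - prediction) ** 2  # same formula as my_mse

print(round(prediction, 4), round(loss, 4))
```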
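# Finding the partial derivative of the cost with respect to the predicted value (O3)
# Starting off by finding dCost/dz3
# dCost/dz3 = dCost/dO3 * dO3/dz3
# First, step 1. Find dCost/dO3
# The factor of 2 from differentiating (y_true - O3) ** 2 is dropped here,
# which is the same as using half the squared error as the cost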
dCost_dO3 = round(sigmoid_of_output_neuron_O3 - y_true_output, 4)
dCost_dO3
# Now finding dO3/dz3
# dO3/dz3 = O3 * (1 - O3), the derivative of the sigmoid
# Step 2
dO3_dz3 = round(sigmoid_of_output_neuron_O3 * (1 - sigmoid_of_output_neuron_O3), 4)
dO3_dz3
# Now finding dCost/dz3
# dCost/dz3 = dCost/dO3 * dO3/dz3
# Step 3
dCost_dz3 = round((dCost_dO3 * dO3_dz3), 4)
# Because dz3/db3 = 1, this value is also dCost/db3
dCost_db3 = dCost_dz3
dCost_dz3, dCost_db3
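To double-check Steps 1 to 3, and the claim that dCost/db3 equals dCost/dz3, one option is to nudge b3 slightly up and down and watch how the cost changes. This sketch assumes the half squared error, which is the cost whose derivative is O3 - y_true as used in Step 1. It computes everything without rounding, and the names are illustrative.

```python
import numpy as np

def sigmoid(z):
    # Unrounded sigmoid
    return 1 / (1 + np.exp(-z))

# O0, O1, O2 recomputed from z0 = 2.87, z1 = 8.7, z2 = 0.15
outputs_hidden = sigmoid(np.array([2.87, 8.7, 0.15]))
weights_hidden_output = np.array([0.9, -0.5, 0.08])
y_true = 0.92
b3 = 0.2

def cost_for_bias(bias):
    # Half squared error as a function of the output bias b3
    prediction = sigmoid(outputs_hidden @ weights_hidden_output + bias)
    return 0.5 * (y_true - prediction) ** 2

# Analytic: dCost/dz3 = (O3 - y_true) * O3 * (1 - O3), and dz3/db3 = 1
O3 = sigmoid(outputs_hidden @ weights_hidden_output + b3)
dCost_dz3 = (O3 - y_true) * O3 * (1 - O3)

# Numerical: central difference on b3
h = 1e-6
dCost_db3_numeric = (cost_for_bias(b3 + h) - cost_for_bias(b3 - h)) / (2 * h)

print(round(dCost_dz3, 6), round(dCost_db3_numeric, 6))
```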
# Calculating dCost/dw6, dCost/dw7, dCost/dw8
# dCost/dw6 = dCost/dz3 * dz3/dw6
# Next, step 4. Find dCost/dw6
dCost_dw6 = round((dCost_dz3 * sigmoid_of_hidden_neuron_O0), 4)
# Step 5
# dCost/dw7 = dCost/dz3 * dz3/dw7
dCost_dw7 = round((dCost_dz3 * sigmoid_of_hidden_neuron_O1), 4)
# Step 6
# dCost/dw8 = dCost/dz3 * dz3/dw8
dCost_dw8 = round((dCost_dz3 * sigmoid_of_hidden_neuron_O2), 4)
# Here are the values from the calculations above.
dCost_dw6, dCost_dw7, dCost_dw8
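Since dz3/dw6, dz3/dw7 and dz3/dw8 are just O0, O1 and O2, Steps 4 to 6 collapse into a single element-wise product. The values below are assumed to be the rounded dCost_dz3 and hidden-layer outputs that the cells above produce. Swap in your own values if they differ.

```python
import numpy as np

# Assumed: rounded values of dCost_dz3 and outputs_from_hidden from the cells above
dCost_dz3 = -0.0631
outputs_from_hidden = np.array([0.9463, 0.9998, 0.5374])

# dz3/dw6, dz3/dw7, dz3/dw8 are O0, O1, O2, so one element-wise product
# gives dCost/dw6, dCost/dw7 and dCost/dw8 together
dCost_dw_hidden_output = np.round(dCost_dz3 * outputs_from_hidden, 4)

print(dCost_dw_hidden_output)
```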
# Calculating dCost with respect to w0, w1, w2, w3, w4 and w5
# First, finding the derivative of the cost as it relates to O0
# dCost/dO0 = dCost/dz3 * dz3/dO0 = dCost/dz3 * w6
# Step 7
dCost_dO0 = round(( dCost_dz3 * weights_hidden_output[0]), 4)
dCost_dO0
# Step 8
# dO0/dz0 = O0 * (1 - O0)
dO0_dz0 = round(sigmoid_of_hidden_neuron_O0 * (1 - sigmoid_of_hidden_neuron_O0), 4)
dO0_dz0
# Step 9 : Finding dCost/dz0
# dCost/dz0 = dCost/dO0 * dO0/dz0
dCost_dz0 = round((dCost_dO0 * dO0_dz0), 4)
# Since dz0/db0 = 1, the value above is also dCost/db0, the gradient for the bias of neuron 0
dCost_db0 = dCost_dz0
# Here are the two values
dCost_dz0, dCost_db0
# Step 10
# dCost/dw0 = dCost/dz0 * dz0/dw0
dCost_dw0 = round((dCost_dz0 * input_layer_features[0]), 4)
dCost_dw0
# Step 11
# dCost/dw1 = dCost/dz0 * dz0/dw1
dCost_dw1 = round((dCost_dz0 * input_layer_features[1]), 4)
dCost_dw1
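Steps 7 to 11 follow a pattern that repeats for every hidden neuron: multiply dCost/dz3 by the neuron's outgoing weight, then by the sigmoid derivative, then by each input. A hypothetical helper like the one below captures that pattern. It is fed the rounded values that the cells above produce for neuron 0 (assumed). Because it does not round between steps, the last digit may differ from the notebook's.

```python
def hidden_neuron_gradients(dCost_dz3, weight_to_output, activation, inputs):
    # Hypothetical helper covering Steps 7 to 11 for one sigmoid hidden neuron
    # Step 7: dCost/dO = dCost/dz3 * dz3/dO, where dz3/dO is the outgoing weight
    dCost_dO = dCost_dz3 * weight_to_output
    # Step 8: dO/dz = O * (1 - O)
    dO_dz = activation * (1 - activation)
    # Step 9: dCost/dz = dCost/dO * dO/dz, which is also dCost/db since dz/db = 1
    dCost_dz = dCost_dO * dO_dz
    # Steps 10 and 11: dCost/dw = dCost/dz times the input that weight multiplies
    dCost_dw = [dCost_dz * x for x in inputs]
    return dCost_dz, dCost_dw

# Neuron 0: dCost/dz3 = -0.0631, w6 = 0.9, O0 = 0.9463, inputs 2 and 9 (assumed rounded values)
dCost_dz0, (dCost_dw0, dCost_dw1) = hidden_neuron_gradients(-0.0631, 0.9, 0.9463, [2, 9])
print(round(dCost_dz0, 4), round(dCost_dw0, 4), round(dCost_dw1, 4))
```

The same call, with w7 and O1 or with w8 and O2, reproduces the pattern of Steps 12 to 21 for the other two neurons.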
# Step 12 : Finding dCost/dO1
# dCost/dO1 = dCost/dz3 * dz3/dO1
# dCost/dz3 * w7
dCost_dO1 = round((dCost_dz3 * weights_hidden_output[1]), 4)
dCost_dO1
# Step 13 : Finding dO1/dz1
# dO1/dz1 = O1 * (1 - O1)
dO1_dz1 = round(sigmoid_of_hidden_neuron_O1 * (1 - sigmoid_of_hidden_neuron_O1), 4)
dO1_dz1
# Step 14 : Finding dCost/dz1
# dCost/dz1 = dCost/dO1 * dO1/dz1
dCost_dz1 = round((dCost_dO1 * dO1_dz1), 4)
# Since dz1/db1 = 1, the value above also represents dCost/db1
dCost_db1 = dCost_dz1
dCost_dz1, dCost_db1
# Step 15 : Finding dCost/dw2
# dCost/dw2 = dCost/dz1 * dz1/dw2
# dCost/dw2 = dCost/dz1 * Input0
dCost_dw2 = round((dCost_dz1 * input_layer_features[0]), 4)
dCost_dw2
# Step 16 : Finding dCost/dw3
# dCost/dw3 = dCost/dz1 * dz1/dw3
# dCost/dw3 = dCost/dz1 * Input1
dCost_dw3 = round((dCost_dz1 * input_layer_features[1]), 4)
dCost_dw3
# Step 17 : Finding dCost/dO2
# dCost/dO2 = dCost/dz3 * dz3/dO2
# dCost/dO2 = dCost/dz3 * w8
dCost_dO2 = round((dCost_dz3 * weights_hidden_output[2]), 4)
dCost_dO2
# Step 18 : Finding dO2/dz2
# dO2/dz2 = O2 * (1 - O2)
dO2_dz2 = round(sigmoid_of_hidden_neuron_O2 * (1 - sigmoid_of_hidden_neuron_O2), 4)
dO2_dz2
# Step 19 : Finding dCost/dz2
# dCost/dz2 = dCost/dO2 * dO2/dz2
dCost_dz2 = round((dCost_dO2 * dO2_dz2), 4)
# Since dz2/db2 = 1, the value above also represents dCost/db2
dCost_db2 = dCost_dz2
dCost_dz2, dCost_db2
# Step 20 : Finding dCost/dw4
# dCost/dw4 = dCost/dz2 * dz2/dw4
# dCost/dw4 = dCost/dz2 * Input0
dCost_dw4 = round((dCost_dz2 * input_layer_features[0]), 4)
dCost_dw4
# Step 21 : Finding dCost/dw5
# dCost/dw5 = dCost/dz2 * dz2/dw5
# dCost/dw5 = dCost/dz2 * Input1
dCost_dw5 = round((dCost_dz2 * input_layer_features[1]), 4)
dCost_dw5
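A common way to check hand-derived gradients is a numerical gradient check. The sketch below uses my own variable names. It computes all six input-to-hidden gradients (w0 to w5) at once with np.outer, then compares them with central finite differences. It assumes the half squared error cost, to match dCost/dO3 = O3 - y_true from Step 1, and it skips the per-step rounding. Its values may therefore differ from the notebook's in the fourth decimal place.

```python
import numpy as np

def sigmoid(z):
    # Unrounded sigmoid, applied element-wise
    return 1 / (1 + np.exp(-z))

# Network from this post: rows of weights_input_hidden are hidden neurons 0, 1, 2
features = np.array([2.0, 9.0])
weights_input_hidden = np.array([[0.15, 0.23], [0.5, 0.8], [0.05, -0.05]])
bias_hidden = np.array([0.5, 0.5, 0.5])
weights_hidden_output = np.array([0.9, -0.5, 0.08])  # w6, w7, w8
bias_output = 0.2
y_true = 0.92

def cost(w_input_hidden):
    # Half squared error, so that dCost/dO3 = O3 - y_true as in Step 1
    hidden = sigmoid(w_input_hidden @ features + bias_hidden)
    prediction = sigmoid(hidden @ weights_hidden_output + bias_output)
    return 0.5 * (y_true - prediction) ** 2

# Forward pass
hidden = sigmoid(weights_input_hidden @ features + bias_hidden)
prediction = sigmoid(hidden @ weights_hidden_output + bias_output)

# Backward pass for all three hidden neurons at once
dCost_dz3 = (prediction - y_true) * prediction * (1 - prediction)
# dCost/dz0, dCost/dz1, dCost/dz2 (also dCost/db0, dCost/db1, dCost/db2)
dCost_dz_hidden = dCost_dz3 * weights_hidden_output * hidden * (1 - hidden)
# Row i holds the gradients for the two weights feeding hidden neuron i:
# (w0, w1), (w2, w3), (w4, w5)
dCost_dw_input_hidden = np.outer(dCost_dz_hidden, features)

# Numerical gradient: nudge each weight up and down by h
h = 1e-6
numeric = np.zeros_like(weights_input_hidden)
for i in range(weights_input_hidden.shape[0]):
    for j in range(weights_input_hidden.shape[1]):
        w_plus = weights_input_hidden.copy()
        w_minus = weights_input_hidden.copy()
        w_plus[i, j] += h
        w_minus[i, j] -= h
        numeric[i, j] = (cost(w_plus) - cost(w_minus)) / (2 * h)

print(np.round(dCost_dw_input_hidden, 4))
print('Largest difference:', np.max(np.abs(dCost_dw_input_hidden - numeric)))
```

If the largest difference is tiny, the chain-rule steps above are consistent with the cost they assume.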
The code above represents my basic understanding of back propagation.
I will work on another post later, where I build a more meaningful network. That should at least help to solidify my knowledge.