These are my notes from the book Grokking Deep Learning by Andrew Trask. It’s a pretty good introduction to the concepts used in Deep Learning. His writing style and content level are meant to provide a solid intuition of what’s happening “under the hood”, without overwhelming the reader with overly complex math equations.
Whenever I learn a new subject, I find it overwhelming to try and tackle everything at once. This book allows me to work through the mechanics of what is happening in a neural network using simple equations first. Then I can go back and apply more complicated math once I have the context of how it is being used.
Most of the code is taken directly from the book and can be found on Andrew’s GitHub.
This is the first of a series of posts where I will be documenting my notes as I follow along with the book. It’s meant primarily as a study guide for myself. I would recommend getting the book if you’d like a more comprehensive explanation of each chapter!
The rest of my notes for this book can be found here
What is a neural network?
At its most basic, a neural network takes an input, applies some sort of weight, and then gives a prediction.
- Input data is usually something in the real world that is easily knowable.
- Prediction tells us something given our input data.
- The prediction is not always right; the network learns through trial and error.
- It adjusts the weights up and down to get more accuracy the next time it sees the input (see the sketch after this list).
- A neural network’s weight is a measure of sensitivity between the input and the output, like a volume knob.
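The book covers this learning process in a later chapter, but here is a minimal sketch of the trial-and-error idea, sometimes called hot and cold learning. The goal value, step size, and loop below are my own illustration, not code from the book:

# A rough sketch of trial-and-error weight adjustment (hot and cold learning)
# goal is a made-up "true" value we want the network to predict
weight = 0.5
my_input = 8.5
goal = 0.8
step = 0.001

for _ in range(1000):
    error = ((my_input * weight) - goal) ** 2
    # Nudge the weight both ways and keep whichever direction lowers the error
    up_error = ((my_input * (weight + step)) - goal) ** 2
    down_error = ((my_input * (weight - step)) - goal) ** 2
    if down_error < error:
        weight -= step
    elif up_error < error:
        weight += step

print(weight)  # settles near goal / my_input, about 0.094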
Simple neural network example
Let’s try to predict whether or not a baseball team will win based on the average number of toes their players have. The code below will:
- Take a single input
- Apply a weight
- Provide prediction (an output)
# Making prediction with single input
weight = 0.1 #arbitrary for now
def neural_network(my_input, weight):
    prediction = my_input * weight
    return prediction
# Takes number of toes, feeds into neural_network function
# Prints prediction
number_of_toes = [8.5, 9.5, 10, 9]
my_input = number_of_toes[0]
pred = neural_network(my_input, weight)
print(pred)
0.8500000000000001
What did this neural network do?
- It multiplies the input by a weight; it ‘scales’ the input by a certain amount.
- In its simplest form, it uses multiplication. But multiplication alone can only scale the input linearly.
- I think this is where sigmoid and other activation functions come in (see the quick peek below). The book covers them later; let’s use multiplication for now.
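As a quick peek, here’s what sigmoid does; this snippet is just for context, not from this chapter of the book. It squashes any weighted sum into the range (0, 1):

import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

print(sigmoid(0.85))  # our earlier prediction of 0.85 becomes ~0.70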
Making predictions with multiple inputs
What if we had more inputs than just the number of toes? In this example, we have the inputs:
- number of toes
- win/loss record
- number of fans
Each input will need to have its own weight.
We will need to multiply each input by its respective weight, then add them together to get a single output prediction.
# Making prediction with multiple inputs
# Empty network with multiple inputs
weights = [0.1, 0.2, 0]
def neural_network(my_input, weights):
    pred = w_sum(my_input, weights)
    return pred
# Define w_sum to perform a weighted sum of inputs
def w_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
        print(str(a[i]) + ' x ' + str(b[i]) + ' = ' + str(a[i] * b[i]))
    print('Weighted sum = ' + str(output))
    return output
# Goes through each element in lists a and b, multiplies the elements,
# then sums the products.
# Also known as a dot product
# Inserting one input datapoint
"""Data for current status at beginning of each game for first 4 games of season
toes = current number of toes
wlrec = current games won (percent)
nfans = fan count in millions
"""
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]
# This is the input for the first game of the season
my_input = [toes[0], wlrec[0], nfans[0]]
pred = neural_network(my_input, weights)
8.5 x 0.1 = 0.8500000000000001
0.65 x 0.2 = 0.13
1.2 x 0 = 0.0
Weighted sum = 0.9800000000000001
The weighted sum multiplies the my_input list with the weights list elementwise, then adds up the results. These lists can be considered vectors, and an elementwise multiply followed by a sum is exactly the dot product from linear algebra.
The dot product gives us a notion of similarity between two vectors.
The dot product is what you get when you take two vectors with the same number of dimensions and apply them to each other to get a single scalar. Think of ‘applying’ the vectors as each element giving its counterpart either a boost or some cancelation.
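To make that concrete, here is a small illustration with made-up vectors:

import numpy as np

a = np.array([1, 1, 0])
print(np.dot(a, np.array([1, 1, 0])))    # 2: same direction, pure boost
print(np.dot(a, np.array([0, 0, 1])))    # 0: nothing in common
print(np.dot(a, np.array([-1, -1, 0])))  # -2: opposite direction, cancelation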
# We can also use numpy to calculate the dot product.
import numpy as np
vector1 = [1,2,3,4]
vector2 = [5,6,7,8]
np.dot(vector1, vector2)
70
# That is the same as computing w_sum
output = 0
for i in range(len(vector1)):
    output += (vector1[i] * vector2[i])
    print(str(vector1[i]) + ' x ' + str(vector2[i]) + ' = ' + str(vector1[i] * vector2[i]))
print('Weighted sum = ' + str(output))
1 x 5 = 5
2 x 6 = 12
3 x 7 = 21
4 x 8 = 32
Weighted sum = 70
Here’s what our basic neural network would look like using numpy. The weights and my_input lists have been converted to numpy arrays.
import numpy as np
weights = np.array([0.1, 0.2, 0])
def neural_network(my_input, weights):
    pred = my_input.dot(weights)
    return pred
toes = np.array([8.5, 9.5, 9.9, 9.0])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])
# This is the input for the first game of the season
my_input = np.array([toes[0], wlrec[0], nfans[0]])
pred = neural_network(my_input, weights)
print(pred)
0.9800000000000001
Predictions with multiple outputs
If we have three completely independent outputs, meaning the predictions are completely separate, then the implementation is easy.
For example, instead of just predicting win/loss, what if we also predict what % of the team is sad, and what % of the team is hurt?
# Performs scalar multiplication between the input number
# and each weight
def ele_mul(number, vector):
    output = [0, 0, 0]
    assert(len(output) == len(vector))
    for i in range(len(vector)):
        output[i] = number * vector[i]
    return output

def neural_network(my_input, weights):
    pred = ele_mul(my_input, weights)
    return pred
weights = [0.3, 0.2, 0.9] #represents weight for %hurt, %win, % sad
wlrec = [0.65, 0.8, 0.9]
my_input = wlrec[0]
pred = neural_network(my_input, weights)
output_labels = ['% hurt', '% win', '% sad']
# Prints output with labels associated
for i in range(len(output_labels)):
    print(output_labels[i], pred[i])
% hurt 0.195
% win 0.13
% sad 0.5850000000000001
We took a single input, in this case the win/loss record, and multiplied it with each weight associated with the outputs of % hurt, % win, % sad.
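Written out, the calculation looks like:
0.65 * 0.3 = 0.195 = % hurt prediction
0.65 * 0.2 = 0.13 = % win prediction
0.65 * 0.9 = 0.585 = % sad prediction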
# Can also be done via numpy, but the vector must be an array, not a list
def scalar_mul(number, vector):
    output = number * vector
    return output

def neural_network(my_input, weights):
    pred = scalar_mul(my_input, weights)
    return pred
weights = np.array([0.3, 0.2, 0.9])
wlrec = [0.65, 0.8, 0.9]
my_input = wlrec[0]
pred = neural_network(my_input, weights)
print(pred)
[0.195 0.13 0.585]
Predictions with multiple inputs and outputs
This is starting to look more like a proper neural network, where multiple inputs feed multiple outputs.
# For each output, perform a weighted sum of inputs
# This is a helper function
def w_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output
# Applies w_sum against each row in weights (hurt, win, sad)
# This is a helper function
def vect_mat_mul(vect, matrix):
    assert(len(vect) == len(matrix))
    output = [0, 0, 0]
    for i in range(len(vect)):
        output[i] = w_sum(vect, matrix[i])
    return output
# Takes the input vector and the weights matrix, returns one prediction per weight row
def neural_network(my_input, weights):
    pred = vect_mat_mul(my_input, weights)
    return pred
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]
#toes %win #fans
weights = [ [0.1, 0.1, -0.3], #hurt?
[0.1, 0.2, 0.0], #win?
[0.0, 1.3, 0.1] ] #sad?
# This is the input for the first game of the season
my_input = [toes[0], wlrec[0], nfans[0]]
pred = neural_network(my_input, weights)
print(pred)
[0.555, 0.9800000000000001, 0.9650000000000001]
The intuition here is the following:
- Take the first element of each list of inputs in toes, wlrec, and nfans
- That is the my_input vector
- The list of weight vectors in weights is called a matrix
- Perform a vector-matrix multiplication of the my_input vector against the weights matrix
Looks like:
(8.5 * 0.1) + (0.65 * 0.1) + (1.2 * -0.3) = 0.555 = hurt prediction
(8.5 * 0.1) + (0.65 * 0.2) + (1.2 * 0.0) = 0.98 = win prediction
(8.5 * 0.0) + (0.65 * 1.3) + (1.2 * 0.1) = 0.965 = sad prediction
You can think of it as 3 independent dot products, i.e. 3 independent weighted sums of the input.
# my_input dotted with hurt
print(np.dot(my_input, weights[0]))
# my_input dotted with win
print(np.dot(my_input, weights[1]))
# my_input dotted with sad
print(np.dot(my_input, weights[2]))
0.555
0.9800000000000001
0.9650000000000001
Predicting on Predictions
The output of one prediction can be used as the input to another prediction; the intermediate layer this creates is known as a hidden layer.
# An empty network with multiple inputs & outputs
# ih_wgt are the weights going from the input layer to the hidden layer
# toes %win #fans
ih_wgt = [ [0.1, 0.2, -0.1], #hid[0]
[-0.1, 0.1, 0.9], #hid[1]
[0.1, 0.4, 0.1] ] #hid[2]
# hp_wgt are the weights going from the hidden layer to the prediction
# hid[0] hid[1] hid[2]
hp_wgt = [ [0.3, 1.1, -0.3], #hurt?
[0.1, 0.2, 0.0], #win?
[0.0, 1.3, 0.1] ] #sad?
weights = [ih_wgt, hp_wgt]
def neural_network(my_input, weights):
    hid = vect_mat_mul(my_input, weights[0])
    pred = vect_mat_mul(hid, weights[1])
    return pred
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]
my_input = [toes[0], wlrec[0], nfans[0]]
pred = neural_network(my_input, weights)
print(pred)
[0.21350000000000002, 0.14500000000000002, 0.5065]
What just happened?
- Vector-matrix multiply my_input with the ih_wgt matrix to get the hidden layer predictions
- Then vector-matrix multiply the hidden layer output against the hp_wgt matrix for the final prediction (worked out below)
- If we had another layer, we could do yet another vector-matrix multiplication of the final prediction against the next set of weights
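Worked out with the numbers from the first game:
Hidden layer:
(8.5 * 0.1) + (0.65 * 0.2) + (1.2 * -0.1) = 0.86 = hid[0]
(8.5 * -0.1) + (0.65 * 0.1) + (1.2 * 0.9) = 0.295 = hid[1]
(8.5 * 0.1) + (0.65 * 0.4) + (1.2 * 0.1) = 1.23 = hid[2]
Final prediction:
(0.86 * 0.3) + (0.295 * 1.1) + (1.23 * -0.3) = 0.2135 = hurt prediction
(0.86 * 0.1) + (0.295 * 0.2) + (1.23 * 0.0) = 0.145 = win prediction
(0.86 * 0.0) + (0.295 * 1.3) + (1.23 * 0.1) = 0.5065 = sad prediction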
We can simplify by using numpy dot products in place of the vect_mat_mul helper function.
def neural_network(my_input, weights):
    # np.dot(matrix, vector) dots each row of the matrix with the vector,
    # which matches what vect_mat_mul does above
    hid = np.dot(weights[0], my_input)
    pred = np.dot(weights[1], hid)
    return pred
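As a quick check, calling this with the same weights list and my_input from above reproduces the earlier predictions (np.dot accepts plain Python lists, so ih_wgt and hp_wgt can stay as they are):

my_input = [toes[0], wlrec[0], nfans[0]]
pred = neural_network(my_input, weights)
print(pred)  # [0.2135 0.145  0.5065]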