Thursday, November 14, 2019

Beginning Machine Learning - Rebuilding the Linear Regression Algorithm

Over the last few months, I've been caught up with expanding my knowledge of machine learning. As a result, these next few posts document that learning. As stated in many of my previous posts, this is mostly about making it easier for me to refresh my memory in the future.

While there are many great tutorials online that I've used, this series is based mostly on the "Machine Learning Full Course - Learn Machine Learning 10 Hours | Machine Learning Tutorial | Edureka" video on YouTube. Some of the other sites I've used are also in the references.

In this post I'm rebuilding the Linear Regression algorithm, and in the next post we use scikit-learn's LinearRegression.


#!/usr/bin/env python3

'''
    This code is based on me learning more about Linear Regression 
    This is part of me expanding my knowledge on machine learning
    In this version I'm rebuilding the algorithm 

    Author: Nik Alleyne
    blog: www.securitynik.com
    filename: linearRegresAlgo_v2.py

'''


import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = (20.0, 10.0)



def main():
    print('[*] Beginning Linear Regression ...')
    
    # Reading Data - This file was downloaded from GitHub.
    # See the reference section for the URL
    df = pd.read_csv('./headbrain.csv',sep=',', dtype='int64', verbose=True)

    # Gather information on the shape of the dataset
    print('[*] {} (rows, columns) in the training dataset'.format(df.shape))
    print('[*] First 10 records of the training dataset')
    print(df.head(10))

    # Let's now create the X and Y arrays from the head size and brain weight columns
    X = df['Head Size(cm^3)'].values
    Y = df['Brain Weight(grams)'].values

    #Find the mean of X and Y
    mean_x = np.mean(X)
    mean_y = np.mean(Y)
    print('[*] The mean of X is {} || The mean of Y is {} '.format(mean_x, mean_y))
    
    # Calculating the coefficients
    # See formula here https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/modeling-statistics/regression/how-to/multiple-regression/methods-and-formulas/methods-and-formulas/#coefficient-coef
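    # In plain terms, the least-squares estimates computed below are:
    #   b1 = sum((X[i] - mean_x) * (Y[i] - mean_y)) / sum((X[i] - mean_x) ** 2)
    #   b0 = mean_y - (b1 * mean_x)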
    numerator = 0
    denominator = 0

    for i in range(len(X)):
        numerator += ((X[i] - mean_x) * (Y[i] - mean_y))
        denominator += (X[i] - mean_x) ** 2
    b1 = numerator / denominator
    b0 = mean_y - (b1 * mean_x)
    print('[*] Coefficients:-> slope (b1): {} || intercept (b0): {}'.format(b1, b0))

    # When compared to the equation y = mx+c, we can say m = b1 & c = b0
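    # A prediction for any head size x is then simply: b0 + b1 * x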

    # create the graph
    max_x = np.max(X) + 100
    min_x = np.min(X) - 100

    # Calculating line values x and y
    x = np.linspace(min_x, max_x, 1000)
    y = b0 + b1 * x

    #plotting the line
    plt.plot(x,y, color='r', label='Regression Line')
    plt.scatter(X, Y, c='b', label='Scatter Plot')

    plt.xlabel('Head Size(cm^3)')
    plt.ylabel('Brain Weight(grams)')
    plt.legend()
    plt.show()

    # Let's now use R-squared (the coefficient of determination) to determine how good the model is
    # Formula can be found here
    # https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/modeling-statistics/regression/how-to/multiple-regression/methods-and-formulas/methods-and-formulas/#coefficient-coef
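    # R^2 = 1 - (ss_error / ss_total), where
    #   ss_total = sum((Y[i] - mean_y) ** 2)  -> total variation in Y
    #   ss_error = sum((Y[i] - y_pred) ** 2)  -> variation left unexplained by the line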
    ss_total = 0
    ss_error = 0

    for i in range(len(X)):
        y_pred = b0 + b1 * X[i]
        ss_total += (Y[i] - mean_y) ** 2
        ss_error += (Y[i] - y_pred) ** 2
    r_sq = 1 - (ss_error/ss_total)
    print('[*] The R-squared value is: {}'.format(r_sq))




if __name__ == '__main__':
    main()


When we run the code, we get:



root@securitynik:~/ML# ./linearRegresAlgo_v2.py | more
[*] Beginning Linear Regression ...
Tokenization took: 0.06 ms
Type conversion took: 0.23 ms
Parser memory cleanup took: 0.00 ms
[*] (237, 4) (rows, columns) in the training dataset
[*] First 10 records of the training dataset
   Gender  Age Range  Head Size(cm^3)  Brain Weight(grams)
0       1          1             4512                 1530
1       1          1             3738                 1297
2       1          1             4261                 1335
3       1          1             3777                 1282
4       1          1             4177                 1590
5       1          1             3585                 1300
6       1          1             3785                 1400
7       1          1             3559                 1255
8       1          1             3613                 1355
9       1          1             3982                 1375
[*] The mean of X is 3633.9915611814345 || The mean of Y is 1282.873417721519
[*] Coefficients:-> slope (b1): 0.26342933948939945 || intercept (b0): 325.57342104944223
[*] The R-squared value is: 0.6393117199570003



That's it, my first shot at machine learning. In the next post, we use scikit-learn rather than building the algorithm ourselves.
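
As a quick preview of that next post, here is a minimal sketch of what the same fit might look like with scikit-learn's LinearRegression. This assumes the same headbrain.csv file and column names used above; it is just a sketch, not the full walkthrough.

#!/usr/bin/env python3
# Minimal preview sketch - fitting the same headbrain.csv data with scikit-learn

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv('./headbrain.csv')

# scikit-learn expects a 2D array of features, hence the double brackets
X = df[['Head Size(cm^3)']].values
Y = df['Brain Weight(grams)'].values

model = LinearRegression()
model.fit(X, Y)

print('Slope (b1): {}'.format(model.coef_[0]))
print('Intercept (b0): {}'.format(model.intercept_))
print('R-squared: {}'.format(model.score(X, Y)))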


References:
"Machine Learning Full Course - Learn Machine Learning 10 Hours | Machine Learning Tutorial | Edureka" - https://www.youtube.com/watch?v=GwIo3gDZCVQ&list=PL9ooVrP1hQOHUfd-g8GUpKI3hHOwM_9Dn&index=1
Customizing Matplotlib - https://matplotlib.org/3.1.1/tutorials/introductory/customizing.html#sphx-glr-tutorials-introductory-customizing-py
headbrain.csv dataset
pandas read_csv documentation
Calculating the coefficients - https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/modeling-statistics/regression/how-to/multiple-regression/methods-and-formulas/methods-and-formulas/#coefficient-coef
R-squared
