Simple Linear Regression without Gradient Descent Optimization

January 1, 2019   

Linear Regression in a different way

An attempt to implement linear regression for a simple dataset without using existing regression library functions or the gradient descent technique.

Theory

Regression

A method of establishing the relationship between one or more independent variables and one dependent variable.

Linear Regression

Establishing a linear relationship between the independent and dependent variables.
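For a single independent variable, a linear relationship is commonly written as a straight line. The symbols m (slope) and c (intercept) are notation assumed here for illustration; the code in this post only names the slope, as slope_m.

```latex
y = m x + c
```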

Let’s try implementing the same:

Mount Google Drive to access your dataset

from google.colab import drive

drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
import numpy as np
import pandas as pd

Reading the data

df = pd.read_csv('/content/drive/My Drive/slr_data.csv')
df.head()
x y
0 77 79.775152
1 21 23.177279
2 22 25.609262
3 20 17.857388
4 36 41.849864
df.describe()
x y
count 300.000000 300.000000
mean 50.936667 51.205051
std 28.504286 29.071481
min 0.000000 -3.467884
25% 27.000000 25.676502
50% 53.000000 52.170557
75% 73.000000 74.303007
max 100.000000 105.591837
import matplotlib.pyplot as plt
import seaborn as sns
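# relplot creates its own figure, so set the size with height and aspect
# instead of plt.figure, which would only produce an extra empty figure
sns.relplot(x = 'x', y = 'y', data = df, height = 10, aspect = 2)

plt.show()

[Figure: scatter plot of y against x]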

Linear regression is about finding the straight line with the least root mean square error for the given data points. As a working assumption, such a line should lie between the extreme y values, so that it passes through the data points rather than outside them.
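The error measure mentioned above can be sketched as a small helper. The function name rmse and the toy values are assumptions made for illustration; they are not part of the original notebook or of slr_data.csv.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

# Hypothetical toy values, not taken from slr_data.csv
y_obs = [1.0, 2.0, 3.0]
y_hat = [1.0, 2.0, 5.0]
print(rmse(y_obs, y_hat))
```

Once a prediction column exists, the same helper could be applied to the observed and predicted columns of the DataFrame.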

y_max = df.y.max()
y_min = df.y.min()
y_max
105.5918375
y_min
-3.4678837889999996
y_mid = (y_max + y_min) / 2
y_mid
51.0619768555
x2 = df.x.max()
x1 = df.x.min()
df.x.idxmax()
87
df.x.idxmin()
55
y2 = df.y[df.x.idxmax()]
y1 = df.y[df.x.idxmin()]
y2
105.5918375
y1
-1.040114209
slope_m = (y2 - y1) / (x2 - x1)
slope_m
1.06631951709
def lin_equ(x):
    # Point-slope form through the midpoint of the two points with the extreme y values
    x_mid = (df.x[df.y.idxmax()] + df.x[df.y.idxmin()]) / 2
    return slope_m * (x - x_mid) + y_mid
df.head()
x y
0 77 79.775152
1 21 23.177279
2 22 25.609262
3 20 17.857388
4 36 41.849864
lin_equ(77)
79.85260381693
lin_equ(21)
20.13871085989
lin_equ(22)
21.20503037698
lin_equ(20)
19.0723913428
lin_equ(36)
36.13350361624
df['y_pred'] = lin_equ(df.x)
df.head()
x y y_pred
0 77 79.775152 79.852604
1 21 23.177279 20.138711
2 22 25.609262 21.205030
3 20 17.857388 19.072391
4 36 41.849864 36.133504
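The steps above can be gathered into one function, which makes the heuristic easier to reuse and test. This is a minimal sketch of the approach taken in this post, not a least-squares fit. The function name fit_extremes_line and the toy DataFrame are hypothetical; only the column names x and y come from the dataset.

```python
import pandas as pd

def fit_extremes_line(df):
    """Return (slope, intercept) of the heuristic line used in this post.

    Slope: taken from the rows holding the smallest and largest x.
    Anchor: midpoint of the rows holding the smallest and largest y.
    """
    i_xmin, i_xmax = df["x"].idxmin(), df["x"].idxmax()
    i_ymin, i_ymax = df["y"].idxmin(), df["y"].idxmax()
    slope = (df.loc[i_xmax, "y"] - df.loc[i_xmin, "y"]) / (
        df.loc[i_xmax, "x"] - df.loc[i_xmin, "x"]
    )
    x_mid = (df.loc[i_ymax, "x"] + df.loc[i_ymin, "x"]) / 2
    y_mid = (df.loc[i_ymax, "y"] + df.loc[i_ymin, "y"]) / 2
    # Rewrite y = slope * (x - x_mid) + y_mid as y = slope * x + intercept
    intercept = y_mid - slope * x_mid
    return slope, intercept

# Hypothetical toy data, not taken from slr_data.csv
toy = pd.DataFrame({"x": [0, 1, 2, 3, 4], "y": [1.0, 2.5, 5.5, 6.5, 9.0]})
slope, intercept = fit_extremes_line(toy)
toy["y_pred"] = slope * toy["x"] + intercept
print(slope, intercept)
```

Because the line depends on only a handful of extreme rows, a single outlier can shift it considerably.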
# relplot creates its own figure, so set the size with height and aspect
sns.relplot(x = 'x', y = 'y_pred', data = df, height = 15, aspect = 4/3)

plt.show()

[Figure: scatter plot of y_pred against x]

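# relplot creates its own figure, so set the size with height and aspect
sns.relplot(x = 'x', y = 'y', data = df, height = 15, aspect = 4/3)

# Draw the predicted line on the same axes as the scatter plot
sns.lineplot(x = 'x', y = 'y_pred', data = df)

plt.show()

[Figure: scatter plot of y against x with the predicted line overlaid]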

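# Sum of squared errors where the prediction falls below the observed value
neg_err_sum = sum([i*i for i in df.y_pred-df.y if i < 0])
# Sum of squared errors where the prediction is at or above the observed value
pos_err_sum = sum([i*i for i in df.y_pred-df.y if i >= 0])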
neg_err_sum
1228.4737143905872
pos_err_sum
2396.9837025596657
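To judge how close the heuristic line is to the least-squares line, a closed-form fit can serve as a baseline. It also needs no gradient descent and no regression library. The sketch below is a comparison point and not the method used in this post; the function name least_squares_line and the toy values are assumptions for illustration.

```python
import numpy as np

def least_squares_line(x, y):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum of co-deviations of x and y / sum of squared deviations of x
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return float(slope), float(intercept)

# Hypothetical toy data, not taken from slr_data.csv
x_toy = [0, 1, 2, 3, 4]
y_toy = [1.0, 2.5, 5.5, 6.5, 9.0]
m, c = least_squares_line(x_toy, y_toy)
print(m, c)
```

Running the same function on df.x and df.y and comparing the resulting slope with slope_m would show how far the extreme-points heuristic sits from the least-squares fit.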


