CS460 Assignment
Lecture 7
Submitted by:
Aman Upadhyay
| Roll number 1711017 | aman.upadhyay@niser.ac.in |
Assigned by Dr. Subhankar Mishra | School of Computer Sciences | NISER,
Bhubaneswar
Regression
Introduction
Regression is a supervised machine learning technique that uses statistical methods to fit a model to data [1]. It is used for the modeling and analysis of numerical data, exploiting the relationship between two or more variables so that information about one of them can be gained from the values of the others. When there is only one independent variable, the model is termed a simple linear regression model; when there is more than one independent variable, it is termed a multiple linear regression model [2].
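For concreteness, the two cases can be written as follows (standard textbook forms, not reproduced from the lecture notes):

$$y = b + wx + \varepsilon \quad \text{(simple)}, \qquad y = b + w_1 x_1 + w_2 x_2 + \dots + w_p x_p + \varepsilon \quad \text{(multiple)},$$

where $\varepsilon$ denotes the random error term.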
The Linear Model
Consider a simple linear regression model:
$$f_{w,b}(x) = wx + b$$

Here $f_{w,b}$ is the model and $w$, $b$ are its parameters. The model $f_{w,b}$ is trained on inputs $\{x_i, y_i\}_{i=1}^{N}$ such that, given an $x_i$, it produces a value close to $y_i$. The optimized parameters are denoted by $w^*$ and $b^*$ [1].
This model can be used for interpolation and extrapolation of data; both are ways of predicting values from the fitted model. The training inputs span the range $[x_{\min}, x_{\max}]$, where $x$ is the independent variable of the training data.
- Interpolation is when the query point lies inside this range.
- Extrapolation is when the query point lies outside this range; such predictions are generally less reliable. A small sketch distinguishing the two cases is given below.
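A minimal sketch of this distinction, assuming a fitted slope w and intercept b; the helper predict_and_label is hypothetical and not part of the assignment code:

def predict_and_label(x_query, w, b, x_min, x_max):
    # Predict with the fitted line and report whether the query point lies
    # inside the training range (interpolation) or outside it (extrapolation).
    y_hat = w * x_query + b
    mode = "interpolation" if x_min <= x_query <= x_max else "extrapolation"
    return y_hat, mode

# Example with a training range of [1, 5]:
# predict_and_label(3.5, w, b, 1, 5)  -> interpolation
# predict_and_label(7.0, w, b, 1, 5)  -> extrapolation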

Loss Function
All algorithms in machine learning rely on minimizing or maximizing a function, which we call the objective function; the functions that are minimized are called loss functions. A loss function is a measure of how well a prediction model is able to predict the expected outcome. A common loss function is the mean squared error [3]:

$$\ell(w,b) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - (wx_i + b)\bigr)^2$$

The loss function is a sum of penalties for the prediction errors on the examples. Squaring is useful because we do not need the sign of the error, only its magnitude, hence an even power is used. The absolute value is not used because it is not differentiable at every point, and higher even powers magnify the effect of outliers.
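A quick worked example (with made-up numbers, not from the lecture): for targets $y = (3, 2, 5)$ and predictions $\hat{y} = (3.2, 2.6, 4.4)$,

$$\ell = \frac{(3-3.2)^2 + (2-2.6)^2 + (5-4.4)^2}{3} = \frac{0.04 + 0.36 + 0.36}{3} \approx 0.253.$$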

Closed-Form Solution
The optimal parameters $w^*$ and $b^*$ are found by minimizing the loss function, and a closed-form solution is available for simple linear regression:

$$w^* = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}, \qquad b^* = \bar{y} - w^*\bar{x},$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the $x_i$ and $y_i$.
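These expressions follow from setting the partial derivatives of the loss to zero (a standard derivation, sketched here):

$$\frac{\partial \ell}{\partial b} = -\frac{2}{N}\sum_{i}\bigl(y_i - wx_i - b\bigr) = 0 \;\Rightarrow\; b = \bar{y} - w\bar{x},$$
$$\frac{\partial \ell}{\partial w} = -\frac{2}{N}\sum_{i} x_i\bigl(y_i - wx_i - b\bigr) = 0 \;\Rightarrow\; w = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},$$

where the second implication uses the expression for $b$ from the first.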
Why Linear Regression?
There are several advantages of simple linear regression over non-linear regression:
- Simplicity: the model is simple and can be easily interpreted from a plot of the fitted line.
- The solution can be found in closed form, without iterative optimization.
- With only two parameters, the model carries a low risk of overfitting.
Implementation & Graphical Interpretation
We use a small dataset, which is tabulated below:
| x | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| y | 3 | 2 | 5 | 4 | 3 |
import numpy as np
import matplotlib.pyplot as plt  # to visualize the fit

def mean(z):  # sample mean, used for the slope, intercept and MSE
    return sum(z) / len(z)

def w_func(x_, y_):  # closed-form slope
    xy = [i * j for i, j in zip(x_, y_)]
    y_mean_x = [mean(y_) * i for i in x_]
    x2 = np.square(x_)
    x_mean_x = [mean(x_) * i for i in x_]
    numerator = [i - j for i, j in zip(xy, y_mean_x)]
    denominator = [i - j for i, j in zip(x2, x_mean_x)]
    return sum(numerator) / sum(denominator)

def b_func(x_, y_, w_):  # closed-form intercept
    x_mean = sum(x_) / len(x_)
    y_mean = sum(y_) / len(y_)
    return y_mean - (w_ * x_mean)

def y_pred_func(b_, w_, x_):  # predicted values of the fitted line
    return [b_ + (w_ * i) for i in x_]

def mse_func(y_, y_pred_):  # mean squared error of the fit
    error = [(i - j) ** 2 for i, j in zip(y_, y_pred_)]
    return mean(error)
Then we call these functions to obtain the linear regression model:
if __name__ == '__main__':
    x = [1, 2, 3, 4, 5]  # dataset
    y = [3, 2, 5, 4, 3]
    print("X ::", x)
    print("Y ::", y)
    w = w_func(x, y)  # calculate the slope
    print("w ::", w)
    b = b_func(x, y, w)  # calculate the intercept
    print("b ::", b)
    y_pred = y_pred_func(b, w, x)  # sanity check: calculate the predicted values
    print("Y Predicted :", y_pred)
    mse = mse_func(y, y_pred)  # calculate the MSE
    print("Mean square error :", mse)
    plt.scatter(x, y)  # data points
    plt.plot(x, y_pred, color='red')  # fitted line
    plt.show()
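As a sanity check (my own addition, not part of the assignment), the hand-coded solution can be compared against NumPy's built-in least-squares fit; np.polyfit(x, y, 1) returns the slope and intercept of a degree-1 fit. Assuming the arrays x and y defined above:

w_np, b_np = np.polyfit(x, y, 1)   # degree-1 least-squares fit: slope, intercept
print("NumPy slope ::", w_np)      # should match w from w_func
print("NumPy intercept ::", b_np)  # should match b from b_func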
References
[1] Mishra, S. "Linear Regression". Lecture 7. CS460 Class Lecture (Lecture Notes, NISER, Bhubaneswar, India)
[2] Shalabh. Simple Linear Regression Analysis (Lecture Notes, IIT Kanpur, India). Retrieved from http://home.iitk.ac.in/~shalab/regression/Chapter2-Regression-SimpleLinearRegressionAnalysis.pdf
[3] Probability Course. Mean Squared Error. Retrieved from https://www.probabilitycourse.com/chapter9/9_1_5_mean_squared_error_MSE.php