Linear Regression Model


Simple Linear Regression:


Gradient Descent


The objective of gradient descent is to minimize the cost function, which is defined as \begin{equation} J(m,c) = {1 \over n}\sum_{i=1}^{n}{(y_{i}(predicted)- y_{i}(actual))^2} \end{equation}
y(predicted) is also termed the hypothesis function for linear regression: $$ y(predicted) = mx + c$$ Here x is the independent variable, which we obtain through experiments, and y is the dependent variable, which we obtain through two methods.
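As a minimal sketch of these two definitions, assuming NumPy arrays x and y_actual holding the data (the function names are illustrative, not from the text):

```python
import numpy as np

def predict(x, m, c):
    """Hypothesis function: y(predicted) = m*x + c."""
    return m * x + c

def cost(x, y_actual, m, c):
    """Cost J(m, c): mean squared difference between predicted and actual y."""
    return np.mean((predict(x, m, c) - y_actual) ** 2)
```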

How Does Gradient Descent Work?

  1. First, we choose initial parameter values \(m_0, c_0\).
  2. Compute the predicted values with these parameters: $$y_{i}^{p} = m_{0} x_{i} + c_0$$
  3. Now iteratively update m and c in such a way that J decreases after each iteration:
  4. $$ m = m - \alpha \Delta m $$ $$ c = c - \alpha \Delta c $$
    where \(\alpha\) is the learning parameter.
  5. One needs to calculate \(\Delta m\) and \(\Delta c\); we define them in the following way: $$ \Delta m = {\partial J(m,c) \over \partial m} $$ $$ \Delta c = {\partial J(m,c) \over \partial c} $$
  6. We have defined the cost function as: $$J(m,c) = {1 \over n}\sum_{i=1}^{n}{(y_{i}^{p}- y_{i}^{a})^2}$$ Using the definition of \(y_{i}^{p} = m x_{i} + c \) $$ \Rightarrow J(m,c) = {1 \over n} \sum_{i=1}^{n}{(m x_{i} + c- y_{i}^{a})^2}$$ $$ {\partial J(m,c) \over \partial m} = {2 \over n} \sum_{i=1}^{n}{(m x_{i} + c- y_{i}^{a}) x_{i}} $$ Similarly $$ {\partial J(m,c) \over \partial c} = {2 \over n} \sum_{i=1}^{n}{(m x_{i} + c- y_{i}^{a})}$$ Now we substitute these values in Step (4) and iterate until we get the best fit; a minimal code sketch of these steps is given just after this list.
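Below is a minimal implementation sketch of Steps 1-6, assuming NumPy; the function name, the synthetic data, the learning parameter value, and the iteration count are illustrative choices, not taken from the text:

```python
import numpy as np

def gradient_descent(x, y_actual, alpha=0.01, n_iters=1000):
    """Sketch of Steps 1-6: start from m0 = c0 = 0 and repeatedly
    move (m, c) against the gradient of the cost J(m, c)."""
    m, c = 0.0, 0.0                               # Step 1: initial parameters m0, c0
    n = len(x)
    for _ in range(n_iters):
        y_predicted = m * x + c                   # Step 2: current predictions
        error = y_predicted - y_actual
        delta_m = (2.0 / n) * np.sum(error * x)   # Step 6: dJ/dm
        delta_c = (2.0 / n) * np.sum(error)       # Step 6: dJ/dc
        m -= alpha * delta_m                      # Step 4: update m
        c -= alpha * delta_c                      # Step 4: update c
    return m, c

# Illustrative usage on synthetic data (assumed for the example):
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + np.random.normal(0, 1, size=x.shape)
m_fit, c_fit = gradient_descent(x, y)   # m_fit ~ 3, c_fit ~ 2
```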

Learning parameter


In Step (4) we introduced the learning parameter \(\alpha\). It is quite crucial to understand this parameter: as its name suggests, it decides the rate at which the algorithm learns. The partial derivatives of J (w.r.t. m or c) give us the values of \(\Delta m\) and \(\Delta c\) along which J decreases, and gradient descent takes iterative steps in that direction so that we reach the minimum. The size of each step is determined by the parameter \(\alpha\). Conditions on \(\alpha\):
In the above image, \(\theta\) denotes a parameter such as m or c.
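As a rough illustration of these conditions (a sketch with an assumed synthetic dataset and example \(\alpha\) values, none taken from the text), running a fixed number of gradient descent steps shows that a very small \(\alpha\) makes only slow progress, a moderate \(\alpha\) decreases J steadily, and a too-large \(\alpha\) makes J grow instead of shrink:

```python
import numpy as np

x = np.linspace(0, 10, 50)      # assumed synthetic data for the example
y = 3.0 * x + 2.0

def cost_after(alpha, n_iters=50):
    """Run n_iters gradient descent steps with step size alpha and return the final cost J."""
    m, c, n = 0.0, 0.0, len(x)
    for _ in range(n_iters):
        error = m * x + c - y
        m -= alpha * (2.0 / n) * np.sum(error * x)
        c -= alpha * (2.0 / n) * np.sum(error)
    return np.mean((m * x + c - y) ** 2)

print(cost_after(0.0001))   # very small alpha: J decreases only slowly
print(cost_after(0.02))     # moderate alpha: J drops to a small value
print(cost_after(0.1))      # too-large alpha: J grows at each step (diverges)
```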

Where can Gradient Descent work?

Try the gradient descent technique through this link by placing points at random and observing how the fitted line changes.
