By Manabputra
A perceptron is a single-layer neural network used for supervised learning of binary classifiers. The job of a binary classifier is to decide whether an input belongs to a specific class. A perceptron has four major parts: the input values, the weights and bias, the weighted sum, and the activation function.
A perceptron first takes the input values and multiplies each of them by its corresponding weight. All these products are then summed together to form a weighted sum. When this weighted sum is passed to the activation function, the perceptron yields an output.
The perceptron applies the weighted sum of the input data ($x$), plus the bias, to form the activation: $$activation \;=\; \sum_i \left( weight_i \;\cdot\; x_i \right) \;+\; bias$$ The activation is then turned into a prediction by a step transfer function: $$prediction \;=\; 1.0 \;\; \text{if} \;\; activation \;\geq\; 0.0 \;\; \text{else} \;\; 0.0$$ Using gradient descent, we now estimate the weights of the perceptron from the data.
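As a concrete illustration, here is a minimal sketch of one forward pass, with made-up numbers for the input, weights, and bias:

import numpy as np

x = np.array([2.0, 3.0])                        # input values (made up)
w = np.array([0.4, -0.2])                       # weights (made up)
bias = 0.1
activation = np.dot(w, x) + bias                # 0.4*2.0 + (-0.2)*3.0 + 0.1 = 0.3
prediction = 1.0 if activation >= 0.0 else 0.0  # 0.3 >= 0.0, so the prediction is 1.0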
Gradient descent minimizes a function by following the gradients of a cost function. In machine learning algorithms, gradient descent is used to evaluate and update the weights at every iteration so as to minimize the error of the model. For the perceptron, we use it to update the weights ($w$) at each iteration: $$w \;=\; w \;+\; learning\_rate \;\cdot\; (label \;-\; prediction) \;\cdot\; x$$ Here, $learning\_rate$ is configured manually, depending on the nature of the data set, and the weights ($w$) are adjusted using the prediction error ($label \;-\; prediction$) of the model.
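To make the update rule concrete, here is a self-contained sketch of a single weight update, again with made-up numbers:

import numpy as np

learning_rate = 0.01
x = np.array([2.0, 3.0])       # input values (made up)
w = np.array([0.4, -0.2])      # current weights (made up)
label, prediction = 0.0, 1.0   # suppose the model predicted 1.0 but the true class is 0.0
w = w + learning_rate * (label - prediction) * x
print(w)                       # [ 0.38 -0.23]: each weight moved against the error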
For the implementation of the algorithm, we use the data set provided in class.
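That data set is not reproduced here. For readers following along, a synthetic stand-in with the same shape (two features per row, the first 50 rows in one class and the last 50 in the other), exposed as $data\_x$ and $data\_y$, can be generated as follows; the cluster centers and seed are arbitrary:

import numpy as np

rng = np.random.default_rng(0)
# Two linearly separable clusters of 50 points each, standing in for the class data set.
class0 = rng.normal(loc=[-1.0, 2.0], scale=0.5, size=(50, 2))
class1 = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(50, 2))
data_x = np.vstack([class0, class1])
data_y = np.array([0] * 50 + [1] * 50)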
First, we develop a function to make predictions. The function $predict()$ predicts an output for the given input data and a set of weights. The bias is folded into the weight vector by appending a constant 1 to every input, so it is learned like any other weight.
def predict(self, x):
    # Append a constant 1 so the bias acts as the last weight.
    x = np.append(x, 1)
    return 1 if np.dot(x, self.weights) >= 0 else 0
For training the network, we estimate the weight values using gradient descent. The perceptron class and its required parameters are:
import numpy as np

class Perceptron:
    def __init__(self, inputs, epochs=100, learning_rate=0.01):
        self.epochs = epochs
        self.learning_rate = learning_rate
        # One weight per input feature, plus one more for the bias.
        self.weights = np.zeros(inputs + 1)

    def predict(self, x):
        # Append a constant 1 so the bias acts as the last weight.
        x = np.append(x, 1)
        return 1 if np.dot(x, self.weights) >= 0 else 0
These two parameters, together with the data, act as the arguments of the training function. In each epoch, the weights are updated once per row of the training data, based on the error the model made on that row; since updates happen row by row rather than once per epoch, this is the stochastic (online) flavor of gradient descent. The function $train(self, x, y)$ estimates the weight values for the data set in this way.
    def train(self, x, y):
        for _ in range(self.epochs):
            for xi, label in zip(x, y):
                prediction = self.predict(xi)
                # Augment the input with a constant 1, matching predict().
                xi = np.append(xi, 1)
                # Move each weight against the prediction error.
                self.weights += self.learning_rate * (label - prediction) * xi
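Before turning to the class data set, a quick sanity check on a toy problem is useful; the AND gate below is a hypothetical example, not part of the original data:

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])          # AND gate: 1 only when both inputs are 1
p = Perceptron(2)
p.train(X, y)
print([p.predict(xi) for xi in X])  # expect [0, 0, 0, 1] after training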
The weights can then be obtained by training the perceptron on the data:
p = Perceptron(len(data_x[1]), 100, 0.1)
p.train(data_x, data_y)
print("weights=",p.weights)
weights= [ 0.50277562 -0.28875358 0.2 ]
Here, the first two entries are the feature weights and the last is the bias. Using $matplotlib$, one can plot the data together with the learned decision boundary.
import matplotlib.pyplot as plt

# Decision boundary: w0*x + w1*y + bias = 0, so y = -(w0*x + bias) / w1.
l = np.linspace(-3, 6, 200)
g = -((p.weights[0] * l) + p.weights[2]) / p.weights[1]
plt.plot(l, g)
# The first 50 rows belong to one class and the last 50 to the other.
plt.scatter(data_x[:, 0][:50], data_x[:, 1][:50])
plt.scatter(data_x[:, 0][50:], data_x[:, 1][50:])
plt.title("Perceptron")
plt.xlabel("x")
plt.ylabel("y")
plt.savefig("plot.png", dpi=600)
The line that separates the two regions is where the activation is zero, i.e. $weight_0 \, x + weight_1 \, y + bias = 0$. Substituting the learned weights gives: $$0.50277562\,x \;-\; 0.28875358\,y \;+\; 0.2 \;=\; 0$$