Unsupervised Learning and Auto-encoder
Autoencoders are an unsupervised learning technique that uses a specific type of feedforward neural network. The goal of an autoencoder is to produce a compressed representation of the input data; in other words, autoencoders learn to reproduce the input data from a representation with reduced dimensionality. Autoencoders first appeared in the 1980s, when Hinton and the PDP group (Rumelhart et al., 1986) tried to address the problem of “backpropagation without a teacher” by using the input data itself as the teacher.
There are 3 major properties of autoencoders:
· Autoencoders are very data-specific. Since the network learns to reproduce a near-exact representation of its training data, the model is difficult to generalize. For example, a model trained on handwritten digits won’t work on landscape photos.
· The output of an autoencoder is not exactly the same as the input. In other words, autoencoders are lossy, and the input data does degrade to a certain degree.
· Autoencoders don’t require any labels, as the input data itself is the label.
A typical autoencoder has 3 main layers:
1. Input layer
2. Hidden layer
3. Output layer
The number of nodes is the same in the input and output layers, since the input also serves as the label, but the dimensionality is reduced in the hidden layer so as to obtain a compressed representation of the input data. In other words, by introducing a bottleneck at the hidden layer, the neural network is forced to learn a compressed representation of the input data. However, if the feature components of the input data are completely independent of each other, it will be very difficult to obtain an accurate compressed representation.
The two processes, encoding and decoding, are defined as follows:
· Encoding is the process of converting the original input representation to the compressed representation.
· Decoding is the process of converting the compressed representation back to the input representation.
The function that encodes data into the latent representation is called the encoder, and the function that decodes data from the hidden layer is called the decoder. To construct the model, we want to balance two opposing factors: sensitivity towards the input data, and regularization to avoid overfitting. This trade-off requires a loss function that encourages sensitivity to the input data plus a regularization term that avoids overfitting:
Loss = L(x, x′) + regularizer, where x is the input data and x′ is the reconstructed output.
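In a minimal single-hidden-layer setup (an illustrative formulation, not taken verbatim from the original text), the encoder maps the input to the code as h = f(x) and the decoder maps the code back as x′ = g(h), so training amounts to minimizing L(x, g(f(x))) plus the regularization term.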
For the implementation of autoencoders, we need to deal with 4 hyperparameters before training the model:
1. Number of layers: We can make autoencoders as deep as we want, but in general the depth should be decided based on the sophistication of the data. Apart from the input and output layers, we can use a stack of layers in which the number of nodes first decreases and then increases.
2. Number of nodes per layer: The input and output layers must have the same number of nodes. In between, we can use trial and error to get the most compact representation with minimal loss of useful information.
3. Number of nodes in the hidden layer (code size): In this layer, we would like to have as few nodes as possible.
4. Loss function and regularizer: For the loss function we can use mean squared error or cross-entropy. For regularization, we can add a sparsity penalty on the hidden layer, for example an L1 penalty on its activations:
L(x, x′) + λ Σ_k |a_k^(h)|, where the second term penalizes the absolute value of the vector of activations a in layer h for observation k, scaled by a tuning parameter λ.
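As a rough illustration of this penalty, assuming Keras is available, the L1 term on the hidden activations can be expressed with an activity regularizer on the bottleneck layer. The layer sizes and the value of λ below are arbitrary choices for the sketch, not values taken from the original experiment:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Sketch of a sparse autoencoder: the L1 activity regularizer adds
# lambda * sum(|activations|) of the bottleneck layer to the loss.
inputs = keras.Input(shape=(784,))
encoded = layers.Dense(
    32,
    activation="relu",
    activity_regularizer=regularizers.l1(1e-5),  # tuning parameter lambda
)(inputs)
decoded = layers.Dense(784, activation="sigmoid")(encoded)

sparse_autoencoder = keras.Model(inputs, decoded)
sparse_autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```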
Here I have used a Jupyter notebook for implementing the autoencoder. I have used the online MNIST dataset, which contains black-and-white images of handwritten digits. The images are 28×28 pixels, and in vector form each image gives 784 values in [0, 1]. We use Keras to implement the autoencoder. The hyperparameter values are as follows:
1. Hidden layer- 128 nodes
2. Loss function- Binary cross entropy
3. Code size = 32
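Since the notebook code itself is not reproduced here, the following is a minimal Keras sketch consistent with the hyperparameters above (784-dimensional input, 128-node hidden layers, a 32-node code, binary cross-entropy loss); the optimizer, epoch count, and batch size are my own assumptions:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST and flatten 28x28 images to 784-dimensional vectors in [0, 1]
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Encoder: 784 -> 128 -> 32 (code size); decoder: 32 -> 128 -> 784
inputs = keras.Input(shape=(784,))
h = layers.Dense(128, activation="relu")(inputs)
code = layers.Dense(32, activation="relu")(h)
h = layers.Dense(128, activation="relu")(code)
outputs = layers.Dense(784, activation="sigmoid")(h)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# The input itself is the target: the input data is the label
autoencoder.fit(x_train, x_train, epochs=10, batch_size=256,
                validation_data=(x_test, x_test))

reconstructions = autoencoder.predict(x_test)
```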
The reconstructed images produced by this model are quite accurate with respect to the original images.
Now let’s dive into some of the main applications of autoencoders.
1. Image compression and denoising: Autoencoders are usually not used for general-purpose image compression, as they perform poorly at it, but for specific image types they can outperform existing compression algorithms. Autoencoders can also be used for denoising images; the literature contains very good examples of image denoising with autoencoders. Denoising is possible because these models are very good at feature extraction.
2. Feature extraction and dimensionality reduction: By using an undercomplete autoencoder, we can extract hidden feature relations or dependencies, which can then be used in feature engineering. We can also use the latent representation from the hidden layer directly as reduced-dimensionality data.
3. Recommendation and new data generation: Deep autoencoders can be used to recommend goods and services by learning the sophisticated feature relations found in consumer markets. For example, YouTube typically recommends videos to a user based on their watch and search history, looking at features such as watch time, video content, comments, likes and dislikes, and so on. Such input data can reveal sophisticated feature relationships, which can then be used for better video recommendations. In the same way, these models can be used to generate new data, such as synthetic human faces.
4. Anomaly detection: Autoencoders are very useful for anomaly detection. Simpler methods such as PCA can also be used, but PCA is restricted to linear transformations, which break down on the complex problems often found in the real world. By using non-linear activation functions and many layers, autoencoders can handle highly non-linear problems; a minimal sketch of reconstruction-error-based anomaly detection follows this list.
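As a rough sketch of the reconstruction-error idea, reusing the `autoencoder` and `x_test` arrays from the MNIST example above, one can flag samples the model reconstructs poorly; the three-standard-deviation threshold below is an illustrative assumption, not part of the original write-up:

```python
import numpy as np

# Inputs the autoencoder reconstructs poorly (high reconstruction error)
# are flagged as anomalies.
reconstructions = autoencoder.predict(x_test)
errors = np.mean(np.square(x_test - reconstructions), axis=1)

# Example threshold: mean error plus three standard deviations (an assumption)
threshold = errors.mean() + 3 * errors.std()
anomalies = errors > threshold
print(f"Flagged {anomalies.sum()} of {len(x_test)} samples as anomalies")
```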
Q. How to choose the number of dimensions in the hidden layer?
While choosing the number of dimensions in the hidden layer, we can first visually inspect the input data and try to estimate the number of truly independent features. Using that number of hidden-layer nodes, we can then compare the results. We can further decrease the number of hidden-layer nodes and compare the output again; if the reconstructions are still acceptably accurate, we can keep the lower value.
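One rough way to make this comparison concrete, assuming the same flattened MNIST arrays as in the sketch above, is to sweep over a few candidate code sizes and compare the validation reconstruction loss; the specific sizes and epoch count are illustrative choices:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Compare validation reconstruction loss for several candidate code sizes
for code_size in (8, 16, 32, 64):
    inputs = keras.Input(shape=(784,))
    code = layers.Dense(code_size, activation="relu")(inputs)
    outputs = layers.Dense(784, activation="sigmoid")(code)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    history = model.fit(x_train, x_train, epochs=5, batch_size=256,
                        validation_data=(x_test, x_test), verbose=0)
    print(code_size, history.history["val_loss"][-1])
```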
[1] https://towardsdatascience.com
[3] Baldi, P. (2012). Autoencoders, Unsupervised Learning, and Deep Architectures. Proceedings of ICML Workshop on Unsupervised and Transfer Learning, PMLR 27:37–49.
[4] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning internal representations by error propagation. In Parallel Distributed Processing, Vol. 1: Foundations. MIT Press, Cambridge, MA, 1986.
[5] https://www.tensorflow.org/tutorials
Q. What is unsupervised learning?
Unsupervised learning is training the machine on unlabeled or unclassified data. In this setting, the task of the machine is to recognize similarities, patterns, and differences in the provided dataset.
Q. What is a feedforward neural network?
Feedforward neural networks are neural networks in which information travels in the forward direction only; in other words, there are no loops present in the system. For example, information flows from the input layer to the hidden layers and then strictly forward to the output layer.