CS460 Machine Learning
We all know the value of money. Everyone wants to earn enough to live a stable life, but we also want to become rich with little effort and great advantage. We all have some kind of wish list in mind, and we need a lot of money to fulfil those desires. The stock market is one of the best platforms for this, but it comes with risks that nobody wants. Or do you want to take a risk and lose your money? No, right? Even with these risks, one can forecast stocks by visualising past stock values and some statistical factors.
In this project we plot financial data of a specific company using tabular data provided by the "yfinance" Python library. We then build a Support Vector Regression (SVR) model to predict upcoming stock prices, which should reduce the risk factor for investors.
We are going to use Yahoo Finance, Quandl and, if possible, other platforms from which we can get a sufficient dataset for stock prediction. We will create an SVR model, split the dataset into training and testing data, and after training check the performance of the model using different metrics (e.g. mean squared error, mean absolute error). We have already studied the Support Vector Machine, and SVR uses the same principles as SVM. A rough sketch of this pipeline is given below.
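The following is only an illustrative sketch of the planned pipeline, not our final code: the ticker symbol, the date range and the use of the day index as the single feature are placeholder choices, and the hyperparameters are example values.

```python
# Illustrative sketch of the planned pipeline; ticker, dates, feature choice and
# hyperparameters are placeholders, not final project decisions.
import yfinance as yf
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Download historical data for one company (ticker is just an example)
data = yf.download("AAPL", start="2020-01-01", end="2021-01-01")

# Use the day index as the single feature and the closing price as the target
X = np.arange(len(data)).reshape(-1, 1)
y = data["Close"].to_numpy().ravel()

# Hold out the last part of the series for testing (no shuffling for time series)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Fit an SVR model (kernel and hyperparameters to be tuned later)
model = SVR(kernel="rbf", C=100, gamma=0.1)
model.fit(X_train, y_train)

# Evaluate with the metrics mentioned above
pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, pred))
print("MAE:", mean_absolute_error(y_test, pred))
```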
No team is complete without teamwork and coordination, and we will discuss every part of our work together. Roughly speaking, Ajaya will give the first and second presentations and create the slides, while Prateek will give the final presentation and provide the summary of all the theory for the report. Creating the ML model in Python and maintaining the website for reports will be joint tasks.
Since we are taking the support vector machine as our baseline, our main focus is to understand how the SVR algorithm works theoretically. After that we will build the model, test the performance of the trained model, and forecast the stock for a certain date by plotting the exponential moving average against the date. At a later stage we plan to compare the result with results obtained by other algorithms, for example a Long Short-Term Memory (LSTM) model.
There are some difficulties we might face: the SVR model may not work well for large datasets, and choosing a good kernel and fine-tuning the hyperparameters are not easy tasks. So the final goal of the project is to build a model that overcomes some of these obstacles.
1. Theoretical Analysis (using different sources/papers)
Support Vector Regression (SVR):
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. The algorithm outputs an optimal hyperplane which categorizes new examples, and it is considered one of the most suitable algorithms available for time series prediction. It can be used for both classification and regression problems.
SVM involves plotting the data as points in a multidimensional space, where the dimensions represent the attributes (or parameters) of the given data. The algorithm places a boundary on the dataset, called a hyperplane, which separates the data points into separate classes. Our goal is to find the best hyperplane (or decision boundary), where "best" means the decision boundary with the maximum margin.
Let $\vec{u}$ be some unknown data point and $\vec{w}$ be the vector perpendicular to the hyperplane. Our decision rule will then be

$\vec{w} \cdot \vec{u} + b \geq 0 \;\Rightarrow\; \text{classify as positive}$ ………………………………….. (1)
Width of the margin of the hyperplane must be maximized to get a good hyperplane.
Width $= \dfrac{2}{\lVert \vec{w} \rVert}$ ................................................... (2)

$\max \dfrac{2}{\lVert \vec{w} \rVert}$, which is equivalent to $\min \dfrac{1}{2}\lVert \vec{w} \rVert^{2}$ ..........................................(3)
Applying Lagrange multipliers,

$L = \dfrac{1}{2}\lVert \vec{w} \rVert^{2} - \sum_{i} \alpha_{i}\left[ y_{i}\left( \vec{w} \cdot \vec{x}_{i} + b \right) - 1 \right]$ ……………………………… (4)

where $\alpha_{i}$ are the Lagrange multipliers and $y_{i} \in \{+1, -1\}$ are the class labels.
$L = \sum_{i} \alpha_{i} - \dfrac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \, \vec{x}_{i} \cdot \vec{x}_{j}$ ……………………………………….. (5)
By finding the extremum of the above Lagrangian L, we get our desired result.
Now our decision rule will be,
$\sum_{i} \alpha_{i} y_{i} \, \vec{x}_{i} \cdot \vec{u} + b \geq 0 \;\Rightarrow\; \text{classify as positive}$ ……………………………………… (6)
If we get a dataset which is non-linear in the current dimensional space, we can map them to a new space with greater dimension than before.
The hyperplane in our new dimension is given by the equation,
$\vec{w} \cdot \varphi(\vec{x}) + b = 0$ ………………………………. (7)
And the hyperplane must satisfy,
$\vec{w} \cdot \varphi(\vec{x}_{i}) + b \geq +1$ for positive samples, i.e. when $y_{i} = +1$ ………………………..(8)

$\vec{w} \cdot \varphi(\vec{x}_{i}) + b \leq -1$ for negative samples, i.e. when $y_{i} = -1$ ……………….(9)
Here $\varphi$ maps our independent values to the new space with greater dimension, in which our dataset becomes linearly separable.
To summarize the above two inequalities we can write,
$y_{i}\left( \vec{w} \cdot \varphi(\vec{x}_{i}) + b \right) \geq 1$ ……………………………………………………. (10)
In real-world problems it can happen that even in the new space the dataset is not linearly separable. To counter this we introduce a slack variable $\xi_{i}$ as a tolerance margin in the classification thresholds, making the classifier more flexible in accepting possible errors. The hyperplane condition in Eq. (10) then becomes Eq. (11), and the problem of finding the optimal hyperplane becomes the convex optimization problem given by Eq. (12). In this equation, $C$ is the adjustment parameter trading off the width of the margin against the smallest possible misclassification, under the conditions of Eq. (11).
$y_{i}\left( \vec{w} \cdot \varphi(\vec{x}_{i}) + b \right) \geq 1 - \xi_{i}, \qquad \xi_{i} \geq 0$ ………………………………. (11)

$\min_{\vec{w},\, b,\, \xi} \; \dfrac{1}{2}\lVert \vec{w} \rVert^{2} + C \sum_{i} \xi_{i}$ ………………………………….... (12)
Now we come to SVR. It uses principles similar to SVM, but the response variable $y$ is a continuous value. Instead of seeking the hyperplane of Eq. (11), SVR seeks the linear regression function given by Eq. (13). To achieve this, a threshold error $\varepsilon$ is defined for the expression in Eq. (14), called the $\varepsilon$-insensitive loss function. The SVR regression process therefore seeks to minimize the $\varepsilon$-insensitive errors in Eq. (14) through the expression $R$ defined in Eq. (15).
$f(\vec{x}) = \vec{w} \cdot \varphi(\vec{x}) + b$ ……………………………………………… (13)

$\lvert y - f(\vec{x}) \rvert_{\varepsilon} = \max\left( 0,\; \lvert y - f(\vec{x}) \rvert - \varepsilon \right)$ ……. (14)

$R = \dfrac{1}{2}\lVert \vec{w} \rVert^{2} + C \sum_{i} \lvert y_{i} - f(\vec{x}_{i}) \rvert_{\varepsilon}$ ……………………………………. (15)
Again we introduce tolerance (slack) variables here as well, defining $\xi_{i}$ as the value in excess of $\varepsilon$ above the regression function and $\xi_{i}^{*}$ as the value in excess of $\varepsilon$ below it, to limit the deviation from the regression target. Thus, the minimization of Eq. (15) becomes Eq. (16), under the conditions of Eqs. (17) and (18) for $\xi_{i}$ and $\xi_{i}^{*}$.

$\min_{\vec{w},\, b,\, \xi,\, \xi^{*}} \; \dfrac{1}{2}\lVert \vec{w} \rVert^{2} + C \sum_{i} \left( \xi_{i} + \xi_{i}^{*} \right)$ ........................................................... (16)

$y_{i} - \vec{w} \cdot \varphi(\vec{x}_{i}) - b \leq \varepsilon + \xi_{i}, \qquad \xi_{i} \geq 0$ …………………………………………… (17)

$\vec{w} \cdot \varphi(\vec{x}_{i}) + b - y_{i} \leq \varepsilon + \xi_{i}^{*}, \qquad \xi_{i}^{*} \geq 0$ ………………………………………….. (18)
Now let's focus on the kernel function. It is

$K(\vec{x}_{i}, \vec{x}_{j}) = \varphi(\vec{x}_{i}) \cdot \varphi(\vec{x}_{j})$ ………………………………….. (19)

So it is the dot product of the images of the vectors in the higher-dimensional space. The advantage of kernels is that we can obtain these dot products without knowing anything about the map $\varphi$ itself.
There are mainly four types of kernel function in the SVM algorithm, namely the linear, radial basis function (RBF) (Eq. (20)), polynomial (Eq. (21)) and sigmoid (Eq. (22)) kernels. In this project we have used the RBF, polynomial and sigmoid kernels and compared the results.
$K(\vec{x}_{i}, \vec{x}_{j}) = \exp\left( -\gamma \lVert \vec{x}_{i} - \vec{x}_{j} \rVert^{2} \right)$, where $\gamma$ is a parameter ……………………….. (20)

$K(\vec{x}_{i}, \vec{x}_{j}) = \left( \vec{x}_{i} \cdot \vec{x}_{j} + r \right)^{d}$, where $d$ and $r$ are parameters …………………….. (21)

$K(\vec{x}_{i}, \vec{x}_{j}) = \tanh\left( \gamma \, \vec{x}_{i} \cdot \vec{x}_{j} + r \right)$, where $\gamma$ and $r$ are parameters ……………... (22)
In the RBF kernel, $\lVert \vec{x}_{i} - \vec{x}_{j} \rVert^{2}$ is the squared Euclidean distance between two feature vectors and $\gamma$ is a parameter which determines how much influence a single training data point has. The RBF kernel is a function whose value depends on the distance from the origin or from some reference point.
In the polynomial kernel, $d$ is the degree of the kernel and $r$ is a constant term. Here we simply raise the (shifted) dot product to the power $d$.
An interesting fact to note is that the shape of the kernel function directly influences the values obtained by the SVR. Similarly, the constant $C$ in Eq. (16) and the parameters $\gamma$, $r$ and $d$ in Eqs. (20)–(22) should be optimized.
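As a small illustration of Eqs. (20)–(22), the three kernels can be written directly in a few lines of Python; the parameter values and input vectors below are arbitrary examples, not values used in our experiments.

```python
import numpy as np

def rbf_kernel(x_i, x_j, gamma=0.1):
    # Eq. (20): exp(-gamma * ||x_i - x_j||^2)
    return np.exp(-gamma * np.sum((x_i - x_j) ** 2))

def poly_kernel(x_i, x_j, d=2, r=1.0):
    # Eq. (21): (x_i . x_j + r)^d
    return (np.dot(x_i, x_j) + r) ** d

def sigmoid_kernel(x_i, x_j, gamma=0.1, r=0.0):
    # Eq. (22): tanh(gamma * x_i . x_j + r)
    return np.tanh(gamma * np.dot(x_i, x_j) + r)

# Example evaluation on two arbitrary feature vectors
x_i = np.array([1.0, 2.0])
x_j = np.array([2.0, 3.0])
print(rbf_kernel(x_i, x_j), poly_kernel(x_i, x_j), sigmoid_kernel(x_i, x_j))
```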
2. Experiment and Results:
Using plotly we obtained a figure of the opening and closing prices of the actual data against the date. After fitting the model we predict the close price using different kernels; here it is shown for the next 30 days.
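A minimal plotly sketch of such a figure is shown below; it assumes `data` is the DataFrame returned by yfinance in the earlier sketch, so the column names are an assumption tied to that example.

```python
# Minimal sketch of the open/close price plot; assumes `data` is the yfinance DataFrame.
import plotly.graph_objects as go

fig = go.Figure()
fig.add_trace(go.Scatter(x=data.index, y=data["Open"].to_numpy().ravel(), name="Open"))
fig.add_trace(go.Scatter(x=data.index, y=data["Close"].to_numpy().ravel(), name="Close"))
fig.update_layout(title="Opening and closing price vs date",
                  xaxis_title="Date", yaxis_title="Price")
fig.show()
```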
Now, the question is which kernel we should use for a better result. Let's look at a statistical measure called the R-squared ($R^2$) score. It represents the proportion of the variance of the dependent variable that is explained by the independent variable(s) in a regression model. We have calculated the $R^2$ value for each kernel. For $R^2 = 1$ the model is considered a perfect fit, while for $R^2 \le 0$ it is considered a poor model. Based on this measure we can clearly see that the sigmoid kernel is a very bad option for our dataset, which is why we have not considered it for the 30-day prediction. Comparing the scores for the RBF and polynomial kernels, we conclude that for our experiment the RBF kernel is the best model.
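A sketch of this comparison with scikit-learn is given below; it assumes the `X_train`, `X_test`, `y_train`, `y_test` split from the earlier pipeline sketch, and the hyperparameter values are example choices rather than our exact settings.

```python
from sklearn.svm import SVR
from sklearn.metrics import r2_score

# One SVR model per kernel; parameter values are illustrative
kernels = {
    "rbf": SVR(kernel="rbf", C=100, gamma=0.1),
    "poly": SVR(kernel="poly", degree=2, C=100),
    "sigmoid": SVR(kernel="sigmoid", C=100, gamma=0.1),
}

# Fit each model and compare R^2 on the held-out test data
for name, model in kernels.items():
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {score:.3f}")
```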
Final Report:

Review:

Let's first go through briefly what we have done so far. We described how the SVR algorithm works by giving a theoretical explanation. We then moved on to our model and created three different models using three different kernels: RBF (radial basis function), polynomial (degree = 2) and sigmoid. The graphical representation of close price vs. date was presented for these different models.

We used a specific set of values for the parameters of the RBF kernel: C = 100 and gamma = 0.1. For this model we got almost 93% accuracy, while the polynomial kernel model gave us almost 74% accuracy. [Here the term "accuracy" is used for the percentage value of the R-squared score. For example, if the R-squared value of model 1 is 0.78, we say the model is 78% accurate.]

We created another model for the RBF kernel with the default parameters, i.e. C = 1 and gamma = 1. For this new model we got almost 83% accuracy. Here we present the graphs for the two different RBF models.

Fig.: RBF model with default parameters
Fig.: RBF model with C = 100 and gamma = 0.1
Ideas to improve accuracy:

First, we tried to improve the accuracy of the RBF model with default parameters by tuning the hyperparameters. To find a good set of values for C and gamma, we used the grid search method: we took a number of candidate values of C and gamma as parameters with 5-fold cross-validation, and grid search returned the best pair of C and gamma from those candidates. For this set of values, the RBF model gave almost 96.5% accuracy.
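A sketch of this grid search with scikit-learn is shown below; the candidate grid values are examples and not necessarily the exact grid we used, and the earlier `X_train`, `y_train` split is assumed.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Candidate values for C and gamma; the exact grid used in the project may differ
param_grid = {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.01, 0.1, 1]}

# 5-fold cross-validation over the grid, scored by R^2
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated R^2:", search.best_score_)
```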
Another idea is to create a hybrid model. Instead of searching for an optimal set of hyperparameters for the RBF kernel, we tried a hybrid model so that we can get better accuracy than the polynomial model. To create the hybrid model, we propose a mixed kernel function: let "A" be the RBF kernel function and "B" the polynomial kernel function, and define a kernel K = aA + (1 - a)B, where "a" is a positive constant with 0 < a < 1. In this way we can use the mixed kernel and benefit from both the RBF and the polynomial kernel. We propose that the value of "a" should be taken small so that the impact of the RBF part is less. Unfortunately, we couldn't provide an experimental proof or any result to validate this idea due to some issues while running the Python code; the last paper in the reference list gives some theoretical support for it. A sketch of how such a mixed kernel could be implemented is given below.
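One possible implementation route is scikit-learn's support for callable kernels. The sketch below only illustrates the idea; the values of `a`, `gamma`, `degree` and `coef0` are assumed example values, and this is not the code we ran in the project.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def mixed_kernel(X1, X2, a=0.2, gamma=0.1, degree=2, coef0=1.0):
    # K = a * RBF + (1 - a) * polynomial; a small 'a' gives the RBF part less weight
    A = rbf_kernel(X1, X2, gamma=gamma)
    B = polynomial_kernel(X1, X2, degree=degree, coef0=coef0)
    return a * A + (1 - a) * B

# SVR accepts a callable kernel that returns the Gram matrix between two sets of samples
model = SVR(kernel=mixed_kernel, C=100)
model.fit(X_train, y_train)
```

Since a convex combination of two valid (positive semi-definite) kernels is itself a valid kernel, this construction is mathematically well defined.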
So, we provide another idea. Before discussing this last idea, let's explain what the polynomial and RBF kernels do. Polynomial and radial basis function (RBF) kernels have complementary strengths: polynomial kernels perform better for extrapolation, while RBF kernels give a better fit in the region covered by the training data. The polynomial kernel has strong generalizability but weak learning capacity; by contrast, the Gaussian RBF kernel has strong learning capacity but weak generalizability. Based on these ideas, we propose a hybrid model in which we first train an RBF kernel model on the original training dataset and then predict the values for the training inputs. We then create a second, polynomial kernel model, trained on the original x values of the training data and the predicted y values from the RBF model. This is our hybrid model, and it gives us 82% accuracy, which is better than the polynomial kernel alone. Note: we are simply proposing this idea without a strong theoretical proof, but the 82% accuracy above is the result we got for this hybrid kernel; a sketch of the two-stage pipeline follows.
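The sketch below illustrates the two-stage idea, again assuming the earlier training/testing split; the hyperparameter values are example choices rather than the exact ones from our experiment.

```python
from sklearn.svm import SVR
from sklearn.metrics import r2_score

# Stage 1: fit an RBF model on the original training data
rbf_model = SVR(kernel="rbf", C=100, gamma=0.1)
rbf_model.fit(X_train, y_train)

# Stage 2: fit a polynomial model on the RBF model's predictions for the training inputs
y_train_rbf = rbf_model.predict(X_train)
hybrid_model = SVR(kernel="poly", degree=2, C=100)
hybrid_model.fit(X_train, y_train_rbf)

# Evaluate the hybrid model on the held-out data
print("Hybrid R^2:", r2_score(y_test, hybrid_model.predict(X_test)))
```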
Limitations and Conclusion:

· Finding optimal hyperparameters using grid search is computationally expensive.

· We still need a strong theoretical proof to verify the effectiveness of the hybrid model for different datasets. It also depends on the user how much of the RBF and polynomial parts are needed to create a good hybrid model for the data.

· Along with high prediction accuracy, SVR is easy to implement and robust to outliers, but it is not suitable for very large datasets. Since our dataset is small, the results might be a consequence of overfitting.

· Hybrid Model-2 gives a good accuracy of 82%.

Reference:

*Remark:
The proposed ideas for hybrid models 1 and 2 may not work for every dataset; we are simply proposing ideas that, in our view, might work, without any rigorous proof. Also, these kinds of simple models may not work for real-life stock market prediction, as the real stock market depends on a lot of factors. Thank you.